Training: 2022-04-10 23:14:04,187-rank_id: 0
Training: 2022-04-10 23:14:18,205-: margin_list [1.0, 0.5, 0.0]
Training: 2022-04-10 23:14:18,205-: network mbf
Training: 2022-04-10 23:14:18,205-: resume False
Training: 2022-04-10 23:14:18,205-: output work_dirs/ms1mv2_mbf
Training: 2022-04-10 23:14:18,205-: embedding_size 512
Training: 2022-04-10 23:14:18,205-: sample_rate 1.0
Training: 2022-04-10 23:14:18,205-: interclass_filtering_threshold0
Training: 2022-04-10 23:14:18,205-: fp16 True
Training: 2022-04-10 23:14:18,205-: batch_size 128
Training: 2022-04-10 23:14:18,206-: optimizer sgd
Training: 2022-04-10 23:14:18,206-: lr 0.1
Training: 2022-04-10 23:14:18,206-: momentum 0.9
Training: 2022-04-10 23:14:18,206-: weight_decay 0.0001
Training: 2022-04-10 23:14:18,206-: verbose 2000
Training: 2022-04-10 23:14:18,208-: frequent 10
Training: 2022-04-10 23:14:18,208-: dali False
Training: 2022-04-10 23:14:18,208-: rec /train_tmp/faces_emore
Training: 2022-04-10 23:14:18,208-: num_classes 85742
Training: 2022-04-10 23:14:18,208-: num_image 5822653
Training: 2022-04-10 23:14:18,208-: num_epoch 40
Training: 2022-04-10 23:14:18,208-: warmup_epoch 0
Training: 2022-04-10 23:14:18,208-: val_targets ['lfw', 'cfp_fp', 'agedb_30']
Training: 2022-04-10 23:14:18,210-: total_batch_size 1024
Training: 2022-04-10 23:14:18,210-: warmup_step 0
Training: 2022-04-10 23:14:18,210-: total_step 227440
Training: 2022-04-10 23:15:26,699-Reducer buckets have been rebuilt in this iteration.
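The configuration values logged above can be cross-checked against each other. The sketch below is not part of the training code; it assumes that total_batch_size is the per-process batch_size multiplied by the number of processes, and that steps per epoch are computed with floor division (the last partial batch is dropped). Under those assumptions it reproduces the logged total_step.

```python
# Values copied from the configuration dump in the log.
num_image = 5822653
batch_size = 128          # per-process batch size
total_batch_size = 1024   # global batch size
num_epoch = 40
logged_total_step = 227440

# Assumption: total_batch_size = batch_size * number of processes.
world_size = total_batch_size // batch_size

# Assumption: floor division, i.e. the final partial batch of each epoch is dropped.
steps_per_epoch = num_image // total_batch_size
total_step = steps_per_epoch * num_epoch

print(world_size, steps_per_epoch, total_step)
print(total_step == logged_total_step)
```

With these assumptions the run would span 8 processes and 5686 steps per epoch, which matches the logged total of 227440 steps.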
Training: 2022-04-10 23:15:28,458-Speed 10543.27 samples/sec Loss 46.0198 LearningRate 0.1000 Epoch: 0 Global Step: 20 Fp16 Grad Scale: 4096 Required: 37 hours Training: 2022-04-10 23:15:29,424-Speed 10602.88 samples/sec Loss 46.2574 LearningRate 0.1000 Epoch: 0 Global Step: 30 Fp16 Grad Scale: 4096 Required: 27 hours Training: 2022-04-10 23:15:30,385-Speed 10663.85 samples/sec Loss 46.3898 LearningRate 0.1000 Epoch: 0 Global Step: 40 Fp16 Grad Scale: 4096 Required: 22 hours Training: 2022-04-10 23:15:31,327-Speed 10878.69 samples/sec Loss 46.8636 LearningRate 0.1000 Epoch: 0 Global Step: 50 Fp16 Grad Scale: 4096 Required: 19 hours Training: 2022-04-10 23:15:32,303-Speed 10500.90 samples/sec Loss 47.2565 LearningRate 0.0999 Epoch: 0 Global Step: 60 Fp16 Grad Scale: 4096 Required: 17 hours Training: 2022-04-10 23:15:33,257-Speed 10742.10 samples/sec Loss 46.9308 LearningRate 0.0999 Epoch: 0 Global Step: 70 Fp16 Grad Scale: 4096 Required: 15 hours Training: 2022-04-10 23:15:34,173-Speed 11194.52 samples/sec Loss 47.2892 LearningRate 0.0999 Epoch: 0 Global Step: 80 Fp16 Grad Scale: 4096 Required: 14 hours Training: 2022-04-10 23:15:35,112-Speed 10906.93 samples/sec Loss 47.0307 LearningRate 0.0999 Epoch: 0 Global Step: 90 Fp16 Grad Scale: 4096 Required: 13 hours Training: 2022-04-10 23:15:36,045-Speed 10980.51 samples/sec Loss 46.9070 LearningRate 0.0999 Epoch: 0 Global Step: 100 Fp16 Grad Scale: 4096 Required: 12 hours Training: 2022-04-10 23:15:37,054-Speed 10152.49 samples/sec Loss 47.0158 LearningRate 0.0999 Epoch: 0 Global Step: 110 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-04-10 23:15:38,011-Speed 10707.21 samples/sec Loss 46.8671 LearningRate 0.0999 Epoch: 0 Global Step: 120 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-10 23:15:38,974-Speed 10648.53 samples/sec Loss 46.8866 LearningRate 0.0999 Epoch: 0 Global Step: 130 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-10 23:15:39,954-Speed 10451.27 samples/sec Loss 46.7577 
LearningRate 0.0999 Epoch: 0 Global Step: 140 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-04-10 23:15:40,889-Speed 10963.09 samples/sec Loss 46.7778 LearningRate 0.0999 Epoch: 0 Global Step: 150 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-10 23:15:41,819-Speed 11022.17 samples/sec Loss 46.5865 LearningRate 0.0999 Epoch: 0 Global Step: 160 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-10 23:15:42,755-Speed 10949.23 samples/sec Loss 46.5775 LearningRate 0.0999 Epoch: 0 Global Step: 170 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-10 23:15:43,697-Speed 10878.42 samples/sec Loss 46.4093 LearningRate 0.0998 Epoch: 0 Global Step: 180 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-04-10 23:15:44,648-Speed 10774.58 samples/sec Loss 46.1201 LearningRate 0.0998 Epoch: 0 Global Step: 190 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-10 23:15:45,589-Speed 10887.39 samples/sec Loss 46.0862 LearningRate 0.0998 Epoch: 0 Global Step: 200 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-04-10 23:15:46,506-Speed 11184.66 samples/sec Loss 45.9573 LearningRate 0.0998 Epoch: 0 Global Step: 210 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-10 23:15:47,437-Speed 10998.45 samples/sec Loss 45.8031 LearningRate 0.0998 Epoch: 0 Global Step: 220 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-10 23:15:48,368-Speed 11007.89 samples/sec Loss 45.8436 LearningRate 0.0998 Epoch: 0 Global Step: 230 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-10 23:15:49,320-Speed 10763.67 samples/sec Loss 45.5999 LearningRate 0.0998 Epoch: 0 Global Step: 240 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-10 23:15:50,283-Speed 10640.24 samples/sec Loss 45.4648 LearningRate 0.0998 Epoch: 0 Global Step: 250 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-10 23:15:51,263-Speed 10463.35 samples/sec Loss 45.3852 LearningRate 0.0998 Epoch: 0 Global Step: 260 Fp16 Grad Scale: 16384 
Required: 8 hours Training: 2022-04-10 23:15:52,236-Speed 10528.29 samples/sec Loss 45.3099 LearningRate 0.0998 Epoch: 0 Global Step: 270 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-10 23:15:53,163-Speed 11058.72 samples/sec Loss 45.1063 LearningRate 0.0998 Epoch: 0 Global Step: 280 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-10 23:15:54,123-Speed 10675.12 samples/sec Loss 45.0613 LearningRate 0.0997 Epoch: 0 Global Step: 290 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-10 23:15:55,104-Speed 10442.21 samples/sec Loss 44.9033 LearningRate 0.0997 Epoch: 0 Global Step: 300 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-10 23:15:56,052-Speed 10807.64 samples/sec Loss 44.8152 LearningRate 0.0997 Epoch: 0 Global Step: 310 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-10 23:15:56,964-Speed 11241.83 samples/sec Loss 44.6596 LearningRate 0.0997 Epoch: 0 Global Step: 320 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-10 23:15:57,850-Speed 11562.14 samples/sec Loss 44.5656 LearningRate 0.0997 Epoch: 0 Global Step: 330 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-10 23:15:58,787-Speed 10939.91 samples/sec Loss 44.5126 LearningRate 0.0997 Epoch: 0 Global Step: 340 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-10 23:15:59,701-Speed 11215.89 samples/sec Loss 44.4524 LearningRate 0.0997 Epoch: 0 Global Step: 350 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-10 23:16:00,633-Speed 10994.20 samples/sec Loss 44.3111 LearningRate 0.0997 Epoch: 0 Global Step: 360 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-10 23:16:01,597-Speed 10619.71 samples/sec Loss 44.2873 LearningRate 0.0997 Epoch: 0 Global Step: 370 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-10 23:16:02,526-Speed 11030.07 samples/sec Loss 44.1827 LearningRate 0.0997 Epoch: 0 Global Step: 380 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-10 23:16:03,455-Speed 11037.67 
samples/sec Loss 44.0019 LearningRate 0.0997 Epoch: 0 Global Step: 390 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-10 23:16:04,405-Speed 10783.96 samples/sec Loss 43.9439 LearningRate 0.0996 Epoch: 0 Global Step: 400 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-10 23:16:05,379-Speed 10518.96 samples/sec Loss 43.8849 LearningRate 0.0996 Epoch: 0 Global Step: 410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:16:06,331-Speed 10767.05 samples/sec Loss 43.8769 LearningRate 0.0996 Epoch: 0 Global Step: 420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:16:07,286-Speed 10726.57 samples/sec Loss 43.7174 LearningRate 0.0996 Epoch: 0 Global Step: 430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:16:08,198-Speed 11249.95 samples/sec Loss 43.6118 LearningRate 0.0996 Epoch: 0 Global Step: 440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:16:09,137-Speed 10908.93 samples/sec Loss 43.5380 LearningRate 0.0996 Epoch: 0 Global Step: 450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:16:10,053-Speed 11192.96 samples/sec Loss 43.5015 LearningRate 0.0996 Epoch: 0 Global Step: 460 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-10 23:16:10,950-Speed 11424.79 samples/sec Loss 43.4218 LearningRate 0.0996 Epoch: 0 Global Step: 470 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-10 23:16:11,883-Speed 10989.18 samples/sec Loss 43.3084 LearningRate 0.0996 Epoch: 0 Global Step: 480 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-10 23:16:12,826-Speed 10859.88 samples/sec Loss 43.1210 LearningRate 0.0996 Epoch: 0 Global Step: 490 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-10 23:16:13,779-Speed 10759.42 samples/sec Loss 43.1444 LearningRate 0.0996 Epoch: 0 Global Step: 500 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-10 23:16:14,716-Speed 10930.60 samples/sec Loss 43.1206 LearningRate 0.0996 Epoch: 0 Global Step: 
510 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-10 23:16:15,658-Speed 10873.98 samples/sec Loss 43.0411 LearningRate 0.0995 Epoch: 0 Global Step: 520 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-10 23:16:16,626-Speed 10590.84 samples/sec Loss 42.8270 LearningRate 0.0995 Epoch: 0 Global Step: 530 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-10 23:16:17,609-Speed 10427.56 samples/sec Loss 42.7960 LearningRate 0.0995 Epoch: 0 Global Step: 540 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-10 23:16:18,581-Speed 10531.14 samples/sec Loss 42.6310 LearningRate 0.0995 Epoch: 0 Global Step: 550 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-10 23:16:19,534-Speed 10757.27 samples/sec Loss 42.6029 LearningRate 0.0995 Epoch: 0 Global Step: 560 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-10 23:16:20,488-Speed 10737.82 samples/sec Loss 42.4651 LearningRate 0.0995 Epoch: 0 Global Step: 570 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-10 23:16:21,418-Speed 11022.54 samples/sec Loss 42.4277 LearningRate 0.0995 Epoch: 0 Global Step: 580 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-10 23:16:22,379-Speed 10661.80 samples/sec Loss 42.2484 LearningRate 0.0995 Epoch: 0 Global Step: 590 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-10 23:16:23,300-Speed 11124.32 samples/sec Loss 42.1756 LearningRate 0.0995 Epoch: 0 Global Step: 600 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-10 23:16:24,233-Speed 10990.16 samples/sec Loss 42.1469 LearningRate 0.0995 Epoch: 0 Global Step: 610 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-10 23:16:25,207-Speed 10513.23 samples/sec Loss 42.0039 LearningRate 0.0995 Epoch: 0 Global Step: 620 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-10 23:16:26,172-Speed 10619.15 samples/sec Loss 41.8657 LearningRate 0.0994 Epoch: 0 Global Step: 630 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-10 
23:16:27,119-Speed 10826.24 samples/sec Loss 41.8719 LearningRate 0.0994 Epoch: 0 Global Step: 640 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-10 23:16:28,105-Speed 10393.68 samples/sec Loss 41.8239 LearningRate 0.0994 Epoch: 0 Global Step: 650 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-10 23:16:29,082-Speed 10489.79 samples/sec Loss 41.5241 LearningRate 0.0994 Epoch: 0 Global Step: 660 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-10 23:16:30,037-Speed 10729.14 samples/sec Loss 41.4833 LearningRate 0.0994 Epoch: 0 Global Step: 670 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-10 23:16:31,005-Speed 10586.38 samples/sec Loss 41.4201 LearningRate 0.0994 Epoch: 0 Global Step: 680 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-10 23:16:31,937-Speed 10998.98 samples/sec Loss 41.3406 LearningRate 0.0994 Epoch: 0 Global Step: 690 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-10 23:16:32,863-Speed 11056.73 samples/sec Loss 41.1557 LearningRate 0.0994 Epoch: 0 Global Step: 700 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-10 23:16:33,805-Speed 10875.95 samples/sec Loss 41.0744 LearningRate 0.0994 Epoch: 0 Global Step: 710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:16:34,706-Speed 11378.46 samples/sec Loss 41.0317 LearningRate 0.0994 Epoch: 0 Global Step: 720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:16:35,665-Speed 10682.03 samples/sec Loss 40.9912 LearningRate 0.0994 Epoch: 0 Global Step: 730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:16:36,628-Speed 10640.62 samples/sec Loss 40.8473 LearningRate 0.0994 Epoch: 0 Global Step: 740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:16:37,565-Speed 10940.96 samples/sec Loss 40.7847 LearningRate 0.0993 Epoch: 0 Global Step: 750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:16:38,496-Speed 11007.28 samples/sec Loss 40.5621 LearningRate 
0.0993 Epoch: 0 Global Step: 760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:16:39,412-Speed 11179.73 samples/sec Loss 40.5647 LearningRate 0.0993 Epoch: 0 Global Step: 770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:16:40,375-Speed 10646.68 samples/sec Loss 40.4136 LearningRate 0.0993 Epoch: 0 Global Step: 780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:16:41,331-Speed 10719.84 samples/sec Loss 40.3909 LearningRate 0.0993 Epoch: 0 Global Step: 790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:16:42,280-Speed 10795.86 samples/sec Loss 40.1544 LearningRate 0.0993 Epoch: 0 Global Step: 800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:16:43,230-Speed 10788.51 samples/sec Loss 40.0934 LearningRate 0.0993 Epoch: 0 Global Step: 810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:16:44,140-Speed 11269.10 samples/sec Loss 40.0757 LearningRate 0.0993 Epoch: 0 Global Step: 820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:16:45,031-Speed 11504.00 samples/sec Loss 39.9811 LearningRate 0.0993 Epoch: 0 Global Step: 830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:16:45,978-Speed 10819.23 samples/sec Loss 39.7942 LearningRate 0.0993 Epoch: 0 Global Step: 840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:16:46,900-Speed 11113.23 samples/sec Loss 39.7634 LearningRate 0.0993 Epoch: 0 Global Step: 850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:16:47,847-Speed 10818.27 samples/sec Loss 39.5683 LearningRate 0.0992 Epoch: 0 Global Step: 860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:16:48,757-Speed 11261.58 samples/sec Loss 39.5379 LearningRate 0.0992 Epoch: 0 Global Step: 870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:16:49,666-Speed 11268.35 samples/sec Loss 39.3776 LearningRate 0.0992 Epoch: 0 Global Step: 880 Fp16 Grad Scale: 65536 Required: 
7 hours Training: 2022-04-10 23:16:50,582-Speed 11188.16 samples/sec Loss 39.4039 LearningRate 0.0992 Epoch: 0 Global Step: 890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:16:51,508-Speed 11068.42 samples/sec Loss 39.2147 LearningRate 0.0992 Epoch: 0 Global Step: 900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:16:52,467-Speed 10689.02 samples/sec Loss 39.1451 LearningRate 0.0992 Epoch: 0 Global Step: 910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:16:53,395-Speed 11035.61 samples/sec Loss 39.0101 LearningRate 0.0992 Epoch: 0 Global Step: 920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:16:54,317-Speed 11111.37 samples/sec Loss 38.9232 LearningRate 0.0992 Epoch: 0 Global Step: 930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:16:55,263-Speed 10841.83 samples/sec Loss 38.8492 LearningRate 0.0992 Epoch: 0 Global Step: 940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:16:56,206-Speed 10865.88 samples/sec Loss 38.7692 LearningRate 0.0992 Epoch: 0 Global Step: 950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:16:57,142-Speed 10942.49 samples/sec Loss 38.6193 LearningRate 0.0992 Epoch: 0 Global Step: 960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:16:58,067-Speed 11074.61 samples/sec Loss 38.5540 LearningRate 0.0991 Epoch: 0 Global Step: 970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:16:58,983-Speed 11196.19 samples/sec Loss 38.3976 LearningRate 0.0991 Epoch: 0 Global Step: 980 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-10 23:16:59,917-Speed 10966.64 samples/sec Loss 38.4129 LearningRate 0.0991 Epoch: 0 Global Step: 990 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-10 23:17:00,835-Speed 11161.58 samples/sec Loss 38.1844 LearningRate 0.0991 Epoch: 0 Global Step: 1000 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-10 23:17:01,804-Speed 10580.73 
samples/sec Loss 38.1530 LearningRate 0.0991 Epoch: 0 Global Step: 1010 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-10 23:17:02,734-Speed 11018.26 samples/sec Loss 38.0094 LearningRate 0.0991 Epoch: 0 Global Step: 1020 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-10 23:17:03,709-Speed 10509.46 samples/sec Loss 38.0176 LearningRate 0.0991 Epoch: 0 Global Step: 1030 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-10 23:17:04,646-Speed 10930.90 samples/sec Loss 37.8222 LearningRate 0.0991 Epoch: 0 Global Step: 1040 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-10 23:17:05,587-Speed 10900.26 samples/sec Loss 37.6907 LearningRate 0.0991 Epoch: 0 Global Step: 1050 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-10 23:17:06,509-Speed 11113.32 samples/sec Loss 37.5986 LearningRate 0.0991 Epoch: 0 Global Step: 1060 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-10 23:17:07,486-Speed 10490.10 samples/sec Loss 37.5011 LearningRate 0.0991 Epoch: 0 Global Step: 1070 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-10 23:17:08,437-Speed 10768.53 samples/sec Loss 37.3912 LearningRate 0.0991 Epoch: 0 Global Step: 1080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:17:09,363-Speed 11068.94 samples/sec Loss 37.2082 LearningRate 0.0990 Epoch: 0 Global Step: 1090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:17:10,276-Speed 11239.77 samples/sec Loss 37.1829 LearningRate 0.0990 Epoch: 0 Global Step: 1100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:17:11,178-Speed 11356.77 samples/sec Loss 37.0756 LearningRate 0.0990 Epoch: 0 Global Step: 1110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:17:12,084-Speed 11307.93 samples/sec Loss 36.9488 LearningRate 0.0990 Epoch: 0 Global Step: 1120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:17:13,027-Speed 10872.33 samples/sec Loss 36.9109 LearningRate 0.0990 Epoch: 0 
Global Step: 1130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-10 23:17:13,976-Speed 10793.33 samples/sec Loss 36.8492 LearningRate 0.0990 Epoch: 0 Global Step: 1140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-10 23:17:14,884-Speed 11289.70 samples/sec Loss 36.6199 LearningRate 0.0990 Epoch: 0 Global Step: 1150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-10 23:17:15,825-Speed 10887.33 samples/sec Loss 36.4547 LearningRate 0.0990 Epoch: 0 Global Step: 1160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-10 23:17:16,772-Speed 10818.20 samples/sec Loss 36.4893 LearningRate 0.0990 Epoch: 0 Global Step: 1170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-10 23:17:17,709-Speed 10932.81 samples/sec Loss 36.3526 LearningRate 0.0990 Epoch: 0 Global Step: 1180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:17:18,660-Speed 10773.14 samples/sec Loss 36.1563 LearningRate 0.0990 Epoch: 0 Global Step: 1190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:17:19,590-Speed 11016.47 samples/sec Loss 36.1767 LearningRate 0.0989 Epoch: 0 Global Step: 1200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:17:20,534-Speed 10864.16 samples/sec Loss 35.9623 LearningRate 0.0989 Epoch: 0 Global Step: 1210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:17:21,425-Speed 11494.31 samples/sec Loss 35.9823 LearningRate 0.0989 Epoch: 0 Global Step: 1220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:17:22,352-Speed 11050.77 samples/sec Loss 35.7949 LearningRate 0.0989 Epoch: 0 Global Step: 1230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:17:23,306-Speed 10744.25 samples/sec Loss 35.7284 LearningRate 0.0989 Epoch: 0 Global Step: 1240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:17:24,214-Speed 11292.54 samples/sec Loss 35.5329 LearningRate 0.0989 Epoch: 0 Global Step: 1250 Fp16 Grad Scale: 131072 
Required: 6 hours Training: 2022-04-10 23:17:25,179-Speed 10614.69 samples/sec Loss 35.5171 LearningRate 0.0989 Epoch: 0 Global Step: 1260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:17:26,107-Speed 11041.14 samples/sec Loss 35.2939 LearningRate 0.0989 Epoch: 0 Global Step: 1270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:17:27,077-Speed 10579.25 samples/sec Loss 35.1477 LearningRate 0.0989 Epoch: 0 Global Step: 1280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:17:28,034-Speed 10708.21 samples/sec Loss 35.1316 LearningRate 0.0989 Epoch: 0 Global Step: 1290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:17:28,979-Speed 10844.24 samples/sec Loss 35.0547 LearningRate 0.0989 Epoch: 0 Global Step: 1300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:17:29,918-Speed 10916.32 samples/sec Loss 34.7894 LearningRate 0.0989 Epoch: 0 Global Step: 1310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:17:30,848-Speed 11016.36 samples/sec Loss 34.7167 LearningRate 0.0988 Epoch: 0 Global Step: 1320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:17:31,768-Speed 11145.06 samples/sec Loss 34.6510 LearningRate 0.0988 Epoch: 0 Global Step: 1330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:17:32,705-Speed 10933.28 samples/sec Loss 34.5987 LearningRate 0.0988 Epoch: 0 Global Step: 1340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:17:33,640-Speed 10963.45 samples/sec Loss 34.4396 LearningRate 0.0988 Epoch: 0 Global Step: 1350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:17:34,561-Speed 11121.97 samples/sec Loss 34.3139 LearningRate 0.0988 Epoch: 0 Global Step: 1360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:17:35,474-Speed 11227.72 samples/sec Loss 34.1571 LearningRate 0.0988 Epoch: 0 Global Step: 1370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 
23:17:36,404-Speed 11020.95 samples/sec Loss 34.1375 LearningRate 0.0988 Epoch: 0 Global Step: 1380 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-10 23:17:37,306-Speed 11357.96 samples/sec Loss 34.0777 LearningRate 0.0988 Epoch: 0 Global Step: 1390 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-10 23:17:38,253-Speed 10821.70 samples/sec Loss 33.9037 LearningRate 0.0988 Epoch: 0 Global Step: 1400 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-10 23:17:39,176-Speed 11100.79 samples/sec Loss 33.7162 LearningRate 0.0988 Epoch: 0 Global Step: 1410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:17:40,103-Speed 11046.89 samples/sec Loss 33.6661 LearningRate 0.0988 Epoch: 0 Global Step: 1420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:17:41,047-Speed 10857.67 samples/sec Loss 33.6121 LearningRate 0.0987 Epoch: 0 Global Step: 1430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:17:41,965-Speed 11162.68 samples/sec Loss 33.4231 LearningRate 0.0987 Epoch: 0 Global Step: 1440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:17:42,925-Speed 10675.35 samples/sec Loss 33.3826 LearningRate 0.0987 Epoch: 0 Global Step: 1450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:17:43,875-Speed 10781.86 samples/sec Loss 33.3026 LearningRate 0.0987 Epoch: 0 Global Step: 1460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:17:44,789-Speed 11214.37 samples/sec Loss 33.0571 LearningRate 0.0987 Epoch: 0 Global Step: 1470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:17:45,710-Speed 11131.10 samples/sec Loss 32.9768 LearningRate 0.0987 Epoch: 0 Global Step: 1480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:17:46,639-Speed 11019.25 samples/sec Loss 33.0197 LearningRate 0.0987 Epoch: 0 Global Step: 1490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:17:47,602-Speed 10650.53 samples/sec 
Loss 32.7610 LearningRate 0.0987 Epoch: 0 Global Step: 1500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:17:48,550-Speed 10801.48 samples/sec Loss 32.6713 LearningRate 0.0987 Epoch: 0 Global Step: 1510 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-10 23:17:49,544-Speed 10311.19 samples/sec Loss 32.5735 LearningRate 0.0987 Epoch: 0 Global Step: 1520 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-10 23:17:50,485-Speed 10899.15 samples/sec Loss 32.4923 LearningRate 0.0987 Epoch: 0 Global Step: 1530 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-10 23:17:51,433-Speed 10808.31 samples/sec Loss 32.3009 LearningRate 0.0987 Epoch: 0 Global Step: 1540 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-10 23:17:52,393-Speed 10669.81 samples/sec Loss 32.2197 LearningRate 0.0986 Epoch: 0 Global Step: 1550 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-10 23:17:53,354-Speed 10665.25 samples/sec Loss 32.0972 LearningRate 0.0986 Epoch: 0 Global Step: 1560 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-10 23:17:54,292-Speed 10925.23 samples/sec Loss 31.8310 LearningRate 0.0986 Epoch: 0 Global Step: 1570 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-10 23:17:55,224-Speed 10985.97 samples/sec Loss 32.0763 LearningRate 0.0986 Epoch: 0 Global Step: 1580 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-10 23:17:56,155-Speed 11031.10 samples/sec Loss 31.7812 LearningRate 0.0986 Epoch: 0 Global Step: 1590 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-10 23:17:57,093-Speed 10922.05 samples/sec Loss 31.7189 LearningRate 0.0986 Epoch: 0 Global Step: 1600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:17:58,014-Speed 11118.62 samples/sec Loss 31.6706 LearningRate 0.0986 Epoch: 0 Global Step: 1610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:17:58,923-Speed 11276.34 samples/sec Loss 31.4949 LearningRate 0.0986 Epoch: 0 
Global Step: 1620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:17:59,833-Speed 11259.82 samples/sec Loss 31.4687 LearningRate 0.0986 Epoch: 0 Global Step: 1630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:18:00,771-Speed 10929.82 samples/sec Loss 31.2289 LearningRate 0.0986 Epoch: 0 Global Step: 1640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:18:01,717-Speed 10829.68 samples/sec Loss 31.2877 LearningRate 0.0986 Epoch: 0 Global Step: 1650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:18:02,658-Speed 10887.38 samples/sec Loss 31.0178 LearningRate 0.0985 Epoch: 0 Global Step: 1660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:18:03,636-Speed 10472.17 samples/sec Loss 31.0734 LearningRate 0.0985 Epoch: 0 Global Step: 1670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:18:04,580-Speed 10853.13 samples/sec Loss 30.7767 LearningRate 0.0985 Epoch: 0 Global Step: 1680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:18:05,523-Speed 10869.93 samples/sec Loss 30.7526 LearningRate 0.0985 Epoch: 0 Global Step: 1690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:18:06,447-Speed 11096.25 samples/sec Loss 30.5958 LearningRate 0.0985 Epoch: 0 Global Step: 1700 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-10 23:18:07,383-Speed 10937.22 samples/sec Loss 30.5789 LearningRate 0.0985 Epoch: 0 Global Step: 1710 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-10 23:18:08,354-Speed 10559.35 samples/sec Loss 30.5017 LearningRate 0.0985 Epoch: 0 Global Step: 1720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:18:09,284-Speed 11019.13 samples/sec Loss 30.4033 LearningRate 0.0985 Epoch: 0 Global Step: 1730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:18:10,201-Speed 11169.80 samples/sec Loss 30.1204 LearningRate 0.0985 Epoch: 0 Global Step: 1740 Fp16 Grad Scale: 131072 
Required: 6 hours Training: 2022-04-10 23:18:11,106-Speed 11322.57 samples/sec Loss 30.2344 LearningRate 0.0985 Epoch: 0 Global Step: 1750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:18:12,049-Speed 10872.16 samples/sec Loss 29.9923 LearningRate 0.0985 Epoch: 0 Global Step: 1760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:18:13,039-Speed 10348.84 samples/sec Loss 29.9632 LearningRate 0.0984 Epoch: 0 Global Step: 1770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:18:14,011-Speed 10547.92 samples/sec Loss 29.8454 LearningRate 0.0984 Epoch: 0 Global Step: 1780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:18:14,928-Speed 11173.17 samples/sec Loss 29.7790 LearningRate 0.0984 Epoch: 0 Global Step: 1790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:18:15,874-Speed 10825.19 samples/sec Loss 29.6317 LearningRate 0.0984 Epoch: 0 Global Step: 1800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:18:16,845-Speed 10559.48 samples/sec Loss 29.6274 LearningRate 0.0984 Epoch: 0 Global Step: 1810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:18:17,762-Speed 11175.68 samples/sec Loss 29.4018 LearningRate 0.0984 Epoch: 0 Global Step: 1820 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-10 23:18:18,696-Speed 10963.73 samples/sec Loss 29.4098 LearningRate 0.0984 Epoch: 0 Global Step: 1830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:18:19,610-Speed 11219.79 samples/sec Loss 29.1282 LearningRate 0.0984 Epoch: 0 Global Step: 1840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:18:20,551-Speed 10884.61 samples/sec Loss 29.0880 LearningRate 0.0984 Epoch: 0 Global Step: 1850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:18:21,442-Speed 11502.78 samples/sec Loss 29.0271 LearningRate 0.0984 Epoch: 0 Global Step: 1860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 
23:18:22,369-Speed 11051.03 samples/sec Loss 29.0078 LearningRate 0.0984 Epoch: 0 Global Step: 1870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:18:23,284-Speed 11199.12 samples/sec Loss 28.7544 LearningRate 0.0984 Epoch: 0 Global Step: 1880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:18:24,205-Speed 11127.04 samples/sec Loss 28.5803 LearningRate 0.0983 Epoch: 0 Global Step: 1890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:18:25,138-Speed 10986.65 samples/sec Loss 28.6463 LearningRate 0.0983 Epoch: 0 Global Step: 1900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:18:26,084-Speed 10835.99 samples/sec Loss 28.4154 LearningRate 0.0983 Epoch: 0 Global Step: 1910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:18:27,014-Speed 11012.21 samples/sec Loss 28.5000 LearningRate 0.0983 Epoch: 0 Global Step: 1920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:18:27,931-Speed 11182.13 samples/sec Loss 28.4204 LearningRate 0.0983 Epoch: 0 Global Step: 1930 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-10 23:18:28,894-Speed 10634.56 samples/sec Loss 28.1706 LearningRate 0.0983 Epoch: 0 Global Step: 1940 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-10 23:18:29,833-Speed 10917.37 samples/sec Loss 28.0983 LearningRate 0.0983 Epoch: 0 Global Step: 1950 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-10 23:18:30,826-Speed 10319.37 samples/sec Loss 28.1369 LearningRate 0.0983 Epoch: 0 Global Step: 1960 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-10 23:18:31,788-Speed 10656.23 samples/sec Loss 27.9173 LearningRate 0.0983 Epoch: 0 Global Step: 1970 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-10 23:18:32,696-Speed 11283.78 samples/sec Loss 27.8816 LearningRate 0.0983 Epoch: 0 Global Step: 1980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:18:33,602-Speed 11307.19 samples/sec 
Loss 27.8805 LearningRate 0.0983 Epoch: 0 Global Step: 1990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:18:34,530-Speed 11049.67 samples/sec Loss 27.8193 LearningRate 0.0982 Epoch: 0 Global Step: 2000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-10 23:18:56,638-[lfw][2000]XNorm: 20.481429 Training: 2022-04-10 23:18:56,638-[lfw][2000]Accuracy-Flip: 0.96000+-0.00853 Training: 2022-04-10 23:18:56,641-[lfw][2000]Accuracy-Highest: 0.96000 Training: 2022-04-10 23:19:21,795-[cfp_fp][2000]XNorm: 18.461583 Training: 2022-04-10 23:19:21,795-[cfp_fp][2000]Accuracy-Flip: 0.75300+-0.02095 Training: 2022-04-10 23:19:21,796-[cfp_fp][2000]Accuracy-Highest: 0.75300 Training: 2022-04-10 23:19:43,475-[agedb_30][2000]XNorm: 19.645099 Training: 2022-04-10 23:19:43,476-[agedb_30][2000]Accuracy-Flip: 0.81717+-0.02153 Training: 2022-04-10 23:19:43,476-[agedb_30][2000]Accuracy-Highest: 0.81717 Training: 2022-04-10 23:19:44,426-Speed 146.50 samples/sec Loss 27.5708 LearningRate 0.0982 Epoch: 0 Global Step: 2010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:19:45,332-Speed 11310.93 samples/sec Loss 27.3407 LearningRate 0.0982 Epoch: 0 Global Step: 2020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:19:46,245-Speed 11226.37 samples/sec Loss 27.2468 LearningRate 0.0982 Epoch: 0 Global Step: 2030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:19:47,164-Speed 11148.78 samples/sec Loss 27.3823 LearningRate 0.0982 Epoch: 0 Global Step: 2040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:19:48,120-Speed 10720.67 samples/sec Loss 27.1761 LearningRate 0.0982 Epoch: 0 Global Step: 2050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:19:49,078-Speed 10703.99 samples/sec Loss 27.2561 LearningRate 0.0982 Epoch: 0 Global Step: 2060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:19:50,052-Speed 10516.70 samples/sec Loss 27.3607 LearningRate 0.0982 Epoch: 0 
Global Step: 2070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:19:50,979-Speed 11056.07 samples/sec Loss 27.1024 LearningRate 0.0982 Epoch: 0 Global Step: 2080 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:19:51,921-Speed 10877.63 samples/sec Loss 26.8659 LearningRate 0.0982 Epoch: 0 Global Step: 2090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:19:52,871-Speed 10780.40 samples/sec Loss 26.8969 LearningRate 0.0982 Epoch: 0 Global Step: 2100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:19:53,829-Speed 10704.61 samples/sec Loss 26.6717 LearningRate 0.0982 Epoch: 0 Global Step: 2110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:19:54,767-Speed 10926.47 samples/sec Loss 26.4773 LearningRate 0.0981 Epoch: 0 Global Step: 2120 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:19:55,708-Speed 10883.79 samples/sec Loss 26.3842 LearningRate 0.0981 Epoch: 0 Global Step: 2130 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:19:56,623-Speed 11201.21 samples/sec Loss 26.3156 LearningRate 0.0981 Epoch: 0 Global Step: 2140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:19:57,533-Speed 11257.37 samples/sec Loss 26.1981 LearningRate 0.0981 Epoch: 0 Global Step: 2150 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:19:58,430-Speed 11428.72 samples/sec Loss 26.1893 LearningRate 0.0981 Epoch: 0 Global Step: 2160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:19:59,366-Speed 10943.61 samples/sec Loss 26.1705 LearningRate 0.0981 Epoch: 0 Global Step: 2170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:00,287-Speed 11129.52 samples/sec Loss 25.9350 LearningRate 0.0981 Epoch: 0 Global Step: 2180 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:01,209-Speed 11109.26 samples/sec Loss 25.9544 LearningRate 0.0981 Epoch: 0 Global Step: 2190 Fp16 Grad Scale: 262144 
Required: 8 hours Training: 2022-04-10 23:20:02,169-Speed 10670.91 samples/sec Loss 25.8819 LearningRate 0.0981 Epoch: 0 Global Step: 2200 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:20:03,068-Speed 11401.17 samples/sec Loss 25.9274 LearningRate 0.0981 Epoch: 0 Global Step: 2210 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:03,990-Speed 11117.63 samples/sec Loss 25.6536 LearningRate 0.0981 Epoch: 0 Global Step: 2220 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:04,923-Speed 10985.27 samples/sec Loss 25.6081 LearningRate 0.0980 Epoch: 0 Global Step: 2230 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:05,829-Speed 11300.55 samples/sec Loss 25.6061 LearningRate 0.0980 Epoch: 0 Global Step: 2240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:06,753-Speed 11094.83 samples/sec Loss 25.4452 LearningRate 0.0980 Epoch: 0 Global Step: 2250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:07,703-Speed 10782.76 samples/sec Loss 25.5250 LearningRate 0.0980 Epoch: 0 Global Step: 2260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:08,618-Speed 11204.74 samples/sec Loss 25.3001 LearningRate 0.0980 Epoch: 0 Global Step: 2270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:09,558-Speed 10898.88 samples/sec Loss 25.0850 LearningRate 0.0980 Epoch: 0 Global Step: 2280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:10,498-Speed 10900.12 samples/sec Loss 25.0645 LearningRate 0.0980 Epoch: 0 Global Step: 2290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:11,429-Speed 11005.09 samples/sec Loss 24.9268 LearningRate 0.0980 Epoch: 0 Global Step: 2300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:12,376-Speed 10825.15 samples/sec Loss 24.9798 LearningRate 0.0980 Epoch: 0 Global Step: 2310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 
23:20:13,317-Speed 10890.22 samples/sec Loss 24.7456 LearningRate 0.0980 Epoch: 0 Global Step: 2320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:14,290-Speed 10529.64 samples/sec Loss 24.8045 LearningRate 0.0980 Epoch: 0 Global Step: 2330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:15,217-Speed 11053.78 samples/sec Loss 24.6188 LearningRate 0.0980 Epoch: 0 Global Step: 2340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:16,186-Speed 10579.43 samples/sec Loss 24.5758 LearningRate 0.0979 Epoch: 0 Global Step: 2350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:17,172-Speed 10390.20 samples/sec Loss 24.5382 LearningRate 0.0979 Epoch: 0 Global Step: 2360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:18,170-Speed 10269.24 samples/sec Loss 24.5839 LearningRate 0.0979 Epoch: 0 Global Step: 2370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:19,081-Speed 11243.99 samples/sec Loss 24.5336 LearningRate 0.0979 Epoch: 0 Global Step: 2380 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:19,977-Speed 11435.91 samples/sec Loss 24.4150 LearningRate 0.0979 Epoch: 0 Global Step: 2390 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:20,899-Speed 11114.71 samples/sec Loss 24.1421 LearningRate 0.0979 Epoch: 0 Global Step: 2400 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:21,832-Speed 10988.76 samples/sec Loss 24.2337 LearningRate 0.0979 Epoch: 0 Global Step: 2410 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:20:22,773-Speed 10889.75 samples/sec Loss 23.9969 LearningRate 0.0979 Epoch: 0 Global Step: 2420 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:20:23,646-Speed 11734.17 samples/sec Loss 24.1027 LearningRate 0.0979 Epoch: 0 Global Step: 2430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:24,613-Speed 10604.12 samples/sec 
Loss 24.1901 LearningRate 0.0979 Epoch: 0 Global Step: 2440 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:25,520-Speed 11287.15 samples/sec Loss 24.0469 LearningRate 0.0979 Epoch: 0 Global Step: 2450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:26,479-Speed 10692.05 samples/sec Loss 23.8012 LearningRate 0.0978 Epoch: 0 Global Step: 2460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:27,444-Speed 10620.12 samples/sec Loss 23.6179 LearningRate 0.0978 Epoch: 0 Global Step: 2470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:28,405-Speed 10657.83 samples/sec Loss 23.4793 LearningRate 0.0978 Epoch: 0 Global Step: 2480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:29,361-Speed 10714.95 samples/sec Loss 23.5527 LearningRate 0.0978 Epoch: 0 Global Step: 2490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:30,367-Speed 10190.34 samples/sec Loss 23.4519 LearningRate 0.0978 Epoch: 0 Global Step: 2500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:31,310-Speed 10867.47 samples/sec Loss 23.5678 LearningRate 0.0978 Epoch: 0 Global Step: 2510 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:32,254-Speed 10847.84 samples/sec Loss 23.4063 LearningRate 0.0978 Epoch: 0 Global Step: 2520 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:33,181-Speed 11060.26 samples/sec Loss 23.3174 LearningRate 0.0978 Epoch: 0 Global Step: 2530 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:20:34,098-Speed 11169.57 samples/sec Loss 23.1846 LearningRate 0.0978 Epoch: 0 Global Step: 2540 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:20:35,031-Speed 10987.17 samples/sec Loss 23.3670 LearningRate 0.0978 Epoch: 0 Global Step: 2550 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:20:35,976-Speed 10838.14 samples/sec Loss 23.1317 LearningRate 0.0978 Epoch: 0 
Global Step: 2560 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:20:36,926-Speed 10791.48 samples/sec Loss 23.0679 LearningRate 0.0978 Epoch: 0 Global Step: 2570 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:37,938-Speed 10126.59 samples/sec Loss 23.0613 LearningRate 0.0977 Epoch: 0 Global Step: 2580 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:38,914-Speed 10502.34 samples/sec Loss 22.7545 LearningRate 0.0977 Epoch: 0 Global Step: 2590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:39,904-Speed 10345.30 samples/sec Loss 22.9585 LearningRate 0.0977 Epoch: 0 Global Step: 2600 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:40,881-Speed 10498.93 samples/sec Loss 22.9356 LearningRate 0.0977 Epoch: 0 Global Step: 2610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:41,837-Speed 10713.78 samples/sec Loss 22.7768 LearningRate 0.0977 Epoch: 0 Global Step: 2620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:42,774-Speed 10930.89 samples/sec Loss 22.7247 LearningRate 0.0977 Epoch: 0 Global Step: 2630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:43,715-Speed 10892.67 samples/sec Loss 22.6825 LearningRate 0.0977 Epoch: 0 Global Step: 2640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:44,684-Speed 10580.17 samples/sec Loss 22.5896 LearningRate 0.0977 Epoch: 0 Global Step: 2650 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:45,606-Speed 11117.31 samples/sec Loss 22.3644 LearningRate 0.0977 Epoch: 0 Global Step: 2660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:46,498-Speed 11477.34 samples/sec Loss 22.4759 LearningRate 0.0977 Epoch: 0 Global Step: 2670 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:47,407-Speed 11285.27 samples/sec Loss 22.3904 LearningRate 0.0977 Epoch: 0 Global Step: 2680 Fp16 Grad Scale: 131072 
Required: 8 hours Training: 2022-04-10 23:20:48,318-Speed 11240.20 samples/sec Loss 22.3759 LearningRate 0.0976 Epoch: 0 Global Step: 2690 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:49,252-Speed 10978.65 samples/sec Loss 22.5277 LearningRate 0.0976 Epoch: 0 Global Step: 2700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:50,190-Speed 10920.35 samples/sec Loss 22.1310 LearningRate 0.0976 Epoch: 0 Global Step: 2710 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:51,120-Speed 11016.90 samples/sec Loss 22.1801 LearningRate 0.0976 Epoch: 0 Global Step: 2720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:52,075-Speed 10731.25 samples/sec Loss 22.0631 LearningRate 0.0976 Epoch: 0 Global Step: 2730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:53,031-Speed 10719.20 samples/sec Loss 21.9636 LearningRate 0.0976 Epoch: 0 Global Step: 2740 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:54,045-Speed 10103.60 samples/sec Loss 21.9038 LearningRate 0.0976 Epoch: 0 Global Step: 2750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:54,966-Speed 11128.57 samples/sec Loss 21.8507 LearningRate 0.0976 Epoch: 0 Global Step: 2760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:55,895-Speed 11024.03 samples/sec Loss 21.9299 LearningRate 0.0976 Epoch: 0 Global Step: 2770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:56,812-Speed 11178.05 samples/sec Loss 21.7875 LearningRate 0.0976 Epoch: 0 Global Step: 2780 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:57,740-Speed 11040.60 samples/sec Loss 21.9423 LearningRate 0.0976 Epoch: 0 Global Step: 2790 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:20:58,675-Speed 10965.30 samples/sec Loss 21.8179 LearningRate 0.0976 Epoch: 0 Global Step: 2800 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 
23:20:59,635-Speed 10667.30 samples/sec Loss 21.7171 LearningRate 0.0975 Epoch: 0 Global Step: 2810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:21:00,609-Speed 10518.88 samples/sec Loss 21.6514 LearningRate 0.0975 Epoch: 0 Global Step: 2820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:21:01,580-Speed 10557.31 samples/sec Loss 21.6203 LearningRate 0.0975 Epoch: 0 Global Step: 2830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:21:02,517-Speed 10942.26 samples/sec Loss 21.7242 LearningRate 0.0975 Epoch: 0 Global Step: 2840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:21:03,456-Speed 10914.80 samples/sec Loss 21.4616 LearningRate 0.0975 Epoch: 0 Global Step: 2850 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:21:04,388-Speed 10996.03 samples/sec Loss 21.3240 LearningRate 0.0975 Epoch: 0 Global Step: 2860 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:21:05,347-Speed 10676.32 samples/sec Loss 21.2684 LearningRate 0.0975 Epoch: 0 Global Step: 2870 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:21:06,290-Speed 10872.80 samples/sec Loss 21.2545 LearningRate 0.0975 Epoch: 0 Global Step: 2880 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:21:07,227-Speed 10933.32 samples/sec Loss 21.0585 LearningRate 0.0975 Epoch: 0 Global Step: 2890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:21:08,170-Speed 10865.62 samples/sec Loss 21.2101 LearningRate 0.0975 Epoch: 0 Global Step: 2900 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:21:09,067-Speed 11427.54 samples/sec Loss 21.3151 LearningRate 0.0975 Epoch: 0 Global Step: 2910 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:21:09,984-Speed 11170.58 samples/sec Loss 20.9015 LearningRate 0.0974 Epoch: 0 Global Step: 2920 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:21:10,913-Speed 11028.74 samples/sec 
Loss 21.0200 LearningRate 0.0974 Epoch: 0 Global Step: 2930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:21:11,855-Speed 10880.29 samples/sec Loss 21.0400 LearningRate 0.0974 Epoch: 0 Global Step: 2940 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:21:12,772-Speed 11178.75 samples/sec Loss 20.9728 LearningRate 0.0974 Epoch: 0 Global Step: 2950 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:21:13,692-Speed 11134.94 samples/sec Loss 20.7652 LearningRate 0.0974 Epoch: 0 Global Step: 2960 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:21:14,626-Speed 10970.29 samples/sec Loss 20.7472 LearningRate 0.0974 Epoch: 0 Global Step: 2970 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:21:15,545-Speed 11148.87 samples/sec Loss 20.8759 LearningRate 0.0974 Epoch: 0 Global Step: 2980 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:21:16,472-Speed 11058.69 samples/sec Loss 20.7956 LearningRate 0.0974 Epoch: 0 Global Step: 2990 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:21:17,433-Speed 10667.39 samples/sec Loss 20.7977 LearningRate 0.0974 Epoch: 0 Global Step: 3000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:21:18,427-Speed 10309.68 samples/sec Loss 20.5907 LearningRate 0.0974 Epoch: 0 Global Step: 3010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:21:19,352-Speed 11078.39 samples/sec Loss 20.7482 LearningRate 0.0974 Epoch: 0 Global Step: 3020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:21:20,289-Speed 10926.40 samples/sec Loss 20.7045 LearningRate 0.0974 Epoch: 0 Global Step: 3030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:21:21,226-Speed 10936.48 samples/sec Loss 20.7463 LearningRate 0.0973 Epoch: 0 Global Step: 3040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:21:22,165-Speed 10920.41 samples/sec Loss 20.5811 LearningRate 0.0973 Epoch: 0 
Global Step: 3050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:21:23,093-Speed 11035.89 samples/sec Loss 20.4691 LearningRate 0.0973 Epoch: 0 Global Step: 3060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:21:24,035-Speed 10872.50 samples/sec Loss 20.2367 LearningRate 0.0973 Epoch: 0 Global Step: 3070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:21:25,020-Speed 10414.62 samples/sec Loss 20.2297 LearningRate 0.0973 Epoch: 0 Global Step: 3080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:21:25,997-Speed 10484.09 samples/sec Loss 20.2981 LearningRate 0.0973 Epoch: 0 Global Step: 3090 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:21:26,939-Speed 10874.12 samples/sec Loss 20.2262 LearningRate 0.0973 Epoch: 0 Global Step: 3100 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:21:27,879-Speed 10901.65 samples/sec Loss 20.3053 LearningRate 0.0973 Epoch: 0 Global Step: 3110 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:21:28,841-Speed 10655.41 samples/sec Loss 20.2407 LearningRate 0.0973 Epoch: 0 Global Step: 3120 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:21:29,857-Speed 10089.33 samples/sec Loss 19.9950 LearningRate 0.0973 Epoch: 0 Global Step: 3130 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:21:30,832-Speed 10515.00 samples/sec Loss 20.1070 LearningRate 0.0973 Epoch: 0 Global Step: 3140 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:21:31,749-Speed 11175.15 samples/sec Loss 19.8892 LearningRate 0.0972 Epoch: 0 Global Step: 3150 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:21:32,661-Speed 11237.65 samples/sec Loss 19.9122 LearningRate 0.0972 Epoch: 0 Global Step: 3160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:21:33,577-Speed 11179.26 samples/sec Loss 19.9738 LearningRate 0.0972 Epoch: 0 Global Step: 3170 Fp16 Grad Scale: 131072 
Required: 7 hours Training: 2022-04-10 23:21:34,510-Speed 10980.43 samples/sec Loss 20.0226 LearningRate 0.0972 Epoch: 0 Global Step: 3180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:21:35,446-Speed 10953.19 samples/sec Loss 19.8891 LearningRate 0.0972 Epoch: 0 Global Step: 3190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:21:36,348-Speed 11360.90 samples/sec Loss 20.0095 LearningRate 0.0972 Epoch: 0 Global Step: 3200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:21:37,253-Speed 11329.89 samples/sec Loss 19.7639 LearningRate 0.0972 Epoch: 0 Global Step: 3210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:21:38,222-Speed 10566.73 samples/sec Loss 19.7560 LearningRate 0.0972 Epoch: 0 Global Step: 3220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:21:39,141-Speed 11152.54 samples/sec Loss 19.6971 LearningRate 0.0972 Epoch: 0 Global Step: 3230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:21:40,078-Speed 10942.44 samples/sec Loss 19.5947 LearningRate 0.0972 Epoch: 0 Global Step: 3240 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:21:41,014-Speed 10937.60 samples/sec Loss 19.8236 LearningRate 0.0972 Epoch: 0 Global Step: 3250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:21:41,944-Speed 11028.42 samples/sec Loss 19.6025 LearningRate 0.0972 Epoch: 0 Global Step: 3260 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:21:42,893-Speed 10790.98 samples/sec Loss 19.4682 LearningRate 0.0971 Epoch: 0 Global Step: 3270 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:21:43,834-Speed 10892.27 samples/sec Loss 19.5707 LearningRate 0.0971 Epoch: 0 Global Step: 3280 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:21:44,764-Speed 11018.45 samples/sec Loss 19.6234 LearningRate 0.0971 Epoch: 0 Global Step: 3290 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 
23:21:45,686-Speed 11115.69 samples/sec Loss 19.5347 LearningRate 0.0971 Epoch: 0 Global Step: 3300 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:21:46,591-Speed 11318.76 samples/sec Loss 19.4536 LearningRate 0.0971 Epoch: 0 Global Step: 3310 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:21:47,529-Speed 10924.31 samples/sec Loss 19.4304 LearningRate 0.0971 Epoch: 0 Global Step: 3320 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:21:48,471-Speed 10871.69 samples/sec Loss 19.2527 LearningRate 0.0971 Epoch: 0 Global Step: 3330 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:21:49,452-Speed 10449.35 samples/sec Loss 19.3077 LearningRate 0.0971 Epoch: 0 Global Step: 3340 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:21:50,429-Speed 10489.75 samples/sec Loss 19.2908 LearningRate 0.0971 Epoch: 0 Global Step: 3350 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:21:51,388-Speed 10689.64 samples/sec Loss 19.0994 LearningRate 0.0971 Epoch: 0 Global Step: 3360 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:21:52,344-Speed 10716.94 samples/sec Loss 19.3428 LearningRate 0.0971 Epoch: 0 Global Step: 3370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:21:53,286-Speed 10881.21 samples/sec Loss 19.1666 LearningRate 0.0970 Epoch: 0 Global Step: 3380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:21:54,260-Speed 10521.16 samples/sec Loss 19.0904 LearningRate 0.0970 Epoch: 0 Global Step: 3390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:21:55,226-Speed 10600.15 samples/sec Loss 18.9020 LearningRate 0.0970 Epoch: 0 Global Step: 3400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:21:56,155-Speed 11037.51 samples/sec Loss 19.0612 LearningRate 0.0970 Epoch: 0 Global Step: 3410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:21:57,065-Speed 11254.98 samples/sec 
Loss 19.0173 LearningRate 0.0970 Epoch: 0 Global Step: 3420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:21:57,985-Speed 11137.44 samples/sec Loss 18.7901 LearningRate 0.0970 Epoch: 0 Global Step: 3430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:21:58,898-Speed 11223.10 samples/sec Loss 19.0096 LearningRate 0.0970 Epoch: 0 Global Step: 3440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:21:59,812-Speed 11215.59 samples/sec Loss 18.9734 LearningRate 0.0970 Epoch: 0 Global Step: 3450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:22:00,774-Speed 10652.22 samples/sec Loss 18.7628 LearningRate 0.0970 Epoch: 0 Global Step: 3460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:22:01,700-Speed 11062.87 samples/sec Loss 19.0209 LearningRate 0.0970 Epoch: 0 Global Step: 3470 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:22:02,617-Speed 11180.72 samples/sec Loss 18.8221 LearningRate 0.0970 Epoch: 0 Global Step: 3480 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:22:03,578-Speed 10655.53 samples/sec Loss 18.7494 LearningRate 0.0970 Epoch: 0 Global Step: 3490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:22:04,489-Speed 11248.36 samples/sec Loss 18.9363 LearningRate 0.0969 Epoch: 0 Global Step: 3500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:22:05,415-Speed 11070.57 samples/sec Loss 18.9040 LearningRate 0.0969 Epoch: 0 Global Step: 3510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:22:06,361-Speed 10833.74 samples/sec Loss 18.7586 LearningRate 0.0969 Epoch: 0 Global Step: 3520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:22:07,303-Speed 10877.98 samples/sec Loss 18.7388 LearningRate 0.0969 Epoch: 0 Global Step: 3530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:22:08,301-Speed 10263.57 samples/sec Loss 18.6254 LearningRate 0.0969 Epoch: 0 
Global Step: 3540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:22:09,238-Speed 10936.81 samples/sec Loss 18.5589 LearningRate 0.0969 Epoch: 0 Global Step: 3550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:22:10,166-Speed 11041.79 samples/sec Loss 18.5938 LearningRate 0.0969 Epoch: 0 Global Step: 3560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:22:11,136-Speed 10565.00 samples/sec Loss 18.6466 LearningRate 0.0969 Epoch: 0 Global Step: 3570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:22:12,061-Speed 11079.27 samples/sec Loss 18.4884 LearningRate 0.0969 Epoch: 0 Global Step: 3580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:22:12,997-Speed 10945.41 samples/sec Loss 18.2262 LearningRate 0.0969 Epoch: 0 Global Step: 3590 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:22:13,940-Speed 10866.40 samples/sec Loss 18.2865 LearningRate 0.0969 Epoch: 0 Global Step: 3600 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:22:14,907-Speed 10591.72 samples/sec Loss 18.5358 LearningRate 0.0969 Epoch: 0 Global Step: 3610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:22:15,862-Speed 10736.45 samples/sec Loss 18.5794 LearningRate 0.0968 Epoch: 0 Global Step: 3620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:22:16,823-Speed 10662.65 samples/sec Loss 18.4073 LearningRate 0.0968 Epoch: 0 Global Step: 3630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:22:17,797-Speed 10525.32 samples/sec Loss 18.2519 LearningRate 0.0968 Epoch: 0 Global Step: 3640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:22:18,718-Speed 11123.09 samples/sec Loss 18.3900 LearningRate 0.0968 Epoch: 0 Global Step: 3650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:22:19,641-Speed 11111.41 samples/sec Loss 18.2835 LearningRate 0.0968 Epoch: 0 Global Step: 3660 Fp16 Grad Scale: 131072 
Required: 7 hours Training: 2022-04-10 23:22:20,547-Speed 11301.38 samples/sec Loss 18.2155 LearningRate 0.0968 Epoch: 0 Global Step: 3670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:22:21,471-Speed 11102.09 samples/sec Loss 18.3848 LearningRate 0.0968 Epoch: 0 Global Step: 3680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:22:22,402-Speed 11004.56 samples/sec Loss 18.1830 LearningRate 0.0968 Epoch: 0 Global Step: 3690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:22:23,351-Speed 10795.78 samples/sec Loss 18.1599 LearningRate 0.0968 Epoch: 0 Global Step: 3700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:22:24,259-Speed 11286.28 samples/sec Loss 18.2118 LearningRate 0.0968 Epoch: 0 Global Step: 3710 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:22:25,156-Speed 11419.45 samples/sec Loss 18.2148 LearningRate 0.0968 Epoch: 0 Global Step: 3720 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:22:26,056-Speed 11386.46 samples/sec Loss 17.9091 LearningRate 0.0967 Epoch: 0 Global Step: 3730 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:22:26,989-Speed 10987.20 samples/sec Loss 18.0779 LearningRate 0.0967 Epoch: 0 Global Step: 3740 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:22:27,947-Speed 10693.38 samples/sec Loss 17.9876 LearningRate 0.0967 Epoch: 0 Global Step: 3750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:22:28,870-Speed 11108.74 samples/sec Loss 18.2553 LearningRate 0.0967 Epoch: 0 Global Step: 3760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:22:29,785-Speed 11201.49 samples/sec Loss 18.2621 LearningRate 0.0967 Epoch: 0 Global Step: 3770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:22:30,698-Speed 11218.24 samples/sec Loss 18.0780 LearningRate 0.0967 Epoch: 0 Global Step: 3780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 
23:22:31,631-Speed 10980.02 samples/sec Loss 17.8203 LearningRate 0.0967 Epoch: 0 Global Step: 3790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:22:32,607-Speed 10504.85 samples/sec Loss 17.8236 LearningRate 0.0967 Epoch: 0 Global Step: 3800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:22:33,547-Speed 10898.13 samples/sec Loss 17.7676 LearningRate 0.0967 Epoch: 0 Global Step: 3810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:22:34,455-Speed 11287.82 samples/sec Loss 17.8464 LearningRate 0.0967 Epoch: 0 Global Step: 3820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:22:35,401-Speed 10821.96 samples/sec Loss 17.6126 LearningRate 0.0967 Epoch: 0 Global Step: 3830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:22:36,382-Speed 10452.35 samples/sec Loss 17.8613 LearningRate 0.0967 Epoch: 0 Global Step: 3840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:22:37,340-Speed 10692.91 samples/sec Loss 17.8365 LearningRate 0.0966 Epoch: 0 Global Step: 3850 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:22:38,286-Speed 10834.12 samples/sec Loss 17.8335 LearningRate 0.0966 Epoch: 0 Global Step: 3860 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:22:39,234-Speed 10808.76 samples/sec Loss 17.8029 LearningRate 0.0966 Epoch: 0 Global Step: 3870 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:22:40,165-Speed 11012.67 samples/sec Loss 17.8268 LearningRate 0.0966 Epoch: 0 Global Step: 3880 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:22:41,115-Speed 10776.85 samples/sec Loss 17.4753 LearningRate 0.0966 Epoch: 0 Global Step: 3890 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:22:42,108-Speed 10326.13 samples/sec Loss 17.6058 LearningRate 0.0966 Epoch: 0 Global Step: 3900 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:22:42,998-Speed 11508.22 samples/sec 
Loss 17.6966 LearningRate 0.0966 Epoch: 0 Global Step: 3910 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:22:43,902-Speed 11345.42 samples/sec Loss 17.4710 LearningRate 0.0966 Epoch: 0 Global Step: 3920 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:22:44,812-Speed 11260.08 samples/sec Loss 17.5581 LearningRate 0.0966 Epoch: 0 Global Step: 3930 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:22:45,716-Speed 11329.45 samples/sec Loss 17.5966 LearningRate 0.0966 Epoch: 0 Global Step: 3940 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:22:46,647-Speed 11008.51 samples/sec Loss 17.6164 LearningRate 0.0966 Epoch: 0 Global Step: 3950 Fp16 Grad Scale: 524288 Required: 7 hours Training: 2022-04-10 23:22:47,595-Speed 10811.21 samples/sec Loss 17.7170 LearningRate 0.0965 Epoch: 0 Global Step: 3960 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:22:48,520-Speed 11074.48 samples/sec Loss 17.4221 LearningRate 0.0965 Epoch: 0 Global Step: 3970 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:22:49,489-Speed 10574.13 samples/sec Loss 17.4914 LearningRate 0.0965 Epoch: 0 Global Step: 3980 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:22:50,431-Speed 10881.76 samples/sec Loss 17.5564 LearningRate 0.0965 Epoch: 0 Global Step: 3990 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:22:51,370-Speed 10911.07 samples/sec Loss 17.3606 LearningRate 0.0965 Epoch: 0 Global Step: 4000 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:23:12,931-[lfw][4000]XNorm: 18.325449 Training: 2022-04-10 23:23:12,931-[lfw][4000]Accuracy-Flip: 0.98350+-0.00603 Training: 2022-04-10 23:23:12,932-[lfw][4000]Accuracy-Highest: 0.98350 Training: 2022-04-10 23:23:37,504-[cfp_fp][4000]XNorm: 15.378337 Training: 2022-04-10 23:23:37,505-[cfp_fp][4000]Accuracy-Flip: 0.83957+-0.02225 Training: 2022-04-10 23:23:37,516-[cfp_fp][4000]Accuracy-Highest: 0.83957 
Training: 2022-04-10 23:23:58,673-[agedb_30][4000]XNorm: 17.597513 Training: 2022-04-10 23:23:58,673-[agedb_30][4000]Accuracy-Flip: 0.88267+-0.02421 Training: 2022-04-10 23:23:58,673-[agedb_30][4000]Accuracy-Highest: 0.88267 Training: 2022-04-10 23:23:59,604-Speed 150.07 samples/sec Loss 17.4014 LearningRate 0.0965 Epoch: 0 Global Step: 4010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:00,563-Speed 10678.99 samples/sec Loss 17.4300 LearningRate 0.0965 Epoch: 0 Global Step: 4020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:01,500-Speed 10934.27 samples/sec Loss 17.3540 LearningRate 0.0965 Epoch: 0 Global Step: 4030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:02,440-Speed 10902.05 samples/sec Loss 17.3425 LearningRate 0.0965 Epoch: 0 Global Step: 4040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:03,350-Speed 11264.99 samples/sec Loss 17.3454 LearningRate 0.0965 Epoch: 0 Global Step: 4050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:04,273-Speed 11109.90 samples/sec Loss 17.3037 LearningRate 0.0965 Epoch: 0 Global Step: 4060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:05,173-Speed 11380.07 samples/sec Loss 17.2096 LearningRate 0.0965 Epoch: 0 Global Step: 4070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:06,092-Speed 11148.32 samples/sec Loss 17.0627 LearningRate 0.0964 Epoch: 0 Global Step: 4080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:07,067-Speed 10513.60 samples/sec Loss 17.2719 LearningRate 0.0964 Epoch: 0 Global Step: 4090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:08,002-Speed 10953.06 samples/sec Loss 17.2197 LearningRate 0.0964 Epoch: 0 Global Step: 4100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:08,891-Speed 11526.25 samples/sec Loss 17.2349 LearningRate 0.0964 Epoch: 0 Global Step: 4110 Fp16 Grad Scale: 262144 
Required: 8 hours Training: 2022-04-10 23:24:09,847-Speed 10719.59 samples/sec Loss 17.3399 LearningRate 0.0964 Epoch: 0 Global Step: 4120 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:10,775-Speed 11042.38 samples/sec Loss 17.1687 LearningRate 0.0964 Epoch: 0 Global Step: 4130 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:11,705-Speed 11016.98 samples/sec Loss 17.2762 LearningRate 0.0964 Epoch: 0 Global Step: 4140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:12,677-Speed 10543.00 samples/sec Loss 17.0445 LearningRate 0.0964 Epoch: 0 Global Step: 4150 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:13,652-Speed 10518.11 samples/sec Loss 17.1647 LearningRate 0.0964 Epoch: 0 Global Step: 4160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:14,632-Speed 10448.50 samples/sec Loss 17.0870 LearningRate 0.0964 Epoch: 0 Global Step: 4170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:15,580-Speed 10815.39 samples/sec Loss 16.8945 LearningRate 0.0964 Epoch: 0 Global Step: 4180 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:16,518-Speed 10917.60 samples/sec Loss 16.9051 LearningRate 0.0963 Epoch: 0 Global Step: 4190 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:17,490-Speed 10544.70 samples/sec Loss 17.0247 LearningRate 0.0963 Epoch: 0 Global Step: 4200 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:18,457-Speed 10594.75 samples/sec Loss 17.0316 LearningRate 0.0963 Epoch: 0 Global Step: 4210 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:19,359-Speed 11357.09 samples/sec Loss 16.5981 LearningRate 0.0963 Epoch: 0 Global Step: 4220 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:24:20,293-Speed 10972.04 samples/sec Loss 16.9438 LearningRate 0.0963 Epoch: 0 Global Step: 4230 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 
23:24:21,212-Speed 11160.04 samples/sec Loss 17.0296 LearningRate 0.0963 Epoch: 0 Global Step: 4240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:22,169-Speed 10703.10 samples/sec Loss 16.8428 LearningRate 0.0963 Epoch: 0 Global Step: 4250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:23,117-Speed 10817.37 samples/sec Loss 16.7733 LearningRate 0.0963 Epoch: 0 Global Step: 4260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:24,034-Speed 11166.42 samples/sec Loss 16.8093 LearningRate 0.0963 Epoch: 0 Global Step: 4270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:24,970-Speed 10950.96 samples/sec Loss 16.9053 LearningRate 0.0963 Epoch: 0 Global Step: 4280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:25,926-Speed 10716.44 samples/sec Loss 16.8924 LearningRate 0.0963 Epoch: 0 Global Step: 4290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:26,860-Speed 10976.35 samples/sec Loss 16.9847 LearningRate 0.0963 Epoch: 0 Global Step: 4300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:27,843-Speed 10419.36 samples/sec Loss 16.6503 LearningRate 0.0962 Epoch: 0 Global Step: 4310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:28,775-Speed 10998.06 samples/sec Loss 16.8025 LearningRate 0.0962 Epoch: 0 Global Step: 4320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:29,748-Speed 10532.16 samples/sec Loss 16.6242 LearningRate 0.0962 Epoch: 0 Global Step: 4330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:30,697-Speed 10793.51 samples/sec Loss 16.6819 LearningRate 0.0962 Epoch: 0 Global Step: 4340 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:24:31,613-Speed 11182.65 samples/sec Loss 16.8278 LearningRate 0.0962 Epoch: 0 Global Step: 4350 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:24:32,534-Speed 11131.72 samples/sec 
Loss 16.6164 LearningRate 0.0962 Epoch: 0 Global Step: 4360 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:24:33,449-Speed 11199.94 samples/sec Loss 16.6367 LearningRate 0.0962 Epoch: 0 Global Step: 4370 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:24:34,354-Speed 11317.32 samples/sec Loss 16.5802 LearningRate 0.0962 Epoch: 0 Global Step: 4380 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:35,313-Speed 10684.43 samples/sec Loss 16.8239 LearningRate 0.0962 Epoch: 0 Global Step: 4390 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:36,278-Speed 10633.71 samples/sec Loss 16.4945 LearningRate 0.0962 Epoch: 0 Global Step: 4400 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:37,234-Speed 10718.98 samples/sec Loss 16.4433 LearningRate 0.0962 Epoch: 0 Global Step: 4410 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:38,206-Speed 10546.44 samples/sec Loss 16.5137 LearningRate 0.0962 Epoch: 0 Global Step: 4420 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:39,172-Speed 10606.87 samples/sec Loss 16.5912 LearningRate 0.0961 Epoch: 0 Global Step: 4430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:40,117-Speed 10839.52 samples/sec Loss 16.4570 LearningRate 0.0961 Epoch: 0 Global Step: 4440 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:41,077-Speed 10681.33 samples/sec Loss 16.5199 LearningRate 0.0961 Epoch: 0 Global Step: 4450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:42,044-Speed 10592.43 samples/sec Loss 16.3073 LearningRate 0.0961 Epoch: 0 Global Step: 4460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:42,960-Speed 11196.21 samples/sec Loss 16.3790 LearningRate 0.0961 Epoch: 0 Global Step: 4470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:43,899-Speed 10903.94 samples/sec Loss 16.3994 LearningRate 0.0961 Epoch: 0 
Global Step: 4480 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:24:44,823-Speed 11097.37 samples/sec Loss 16.3233 LearningRate 0.0961 Epoch: 0 Global Step: 4490 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:24:45,761-Speed 10925.19 samples/sec Loss 16.5249 LearningRate 0.0961 Epoch: 0 Global Step: 4500 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:24:46,665-Speed 11337.48 samples/sec Loss 16.4554 LearningRate 0.0961 Epoch: 0 Global Step: 4510 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:47,569-Speed 11324.11 samples/sec Loss 16.3873 LearningRate 0.0961 Epoch: 0 Global Step: 4520 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:48,496-Speed 11064.93 samples/sec Loss 16.3395 LearningRate 0.0961 Epoch: 0 Global Step: 4530 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:49,439-Speed 10862.64 samples/sec Loss 16.2474 LearningRate 0.0960 Epoch: 0 Global Step: 4540 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:50,410-Speed 10554.13 samples/sec Loss 16.3637 LearningRate 0.0960 Epoch: 0 Global Step: 4550 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:51,331-Speed 11118.76 samples/sec Loss 16.2620 LearningRate 0.0960 Epoch: 0 Global Step: 4560 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:52,225-Speed 11472.09 samples/sec Loss 16.1927 LearningRate 0.0960 Epoch: 0 Global Step: 4570 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:53,135-Speed 11257.29 samples/sec Loss 16.3993 LearningRate 0.0960 Epoch: 0 Global Step: 4580 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:54,114-Speed 10465.15 samples/sec Loss 16.2415 LearningRate 0.0960 Epoch: 0 Global Step: 4590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:24:55,070-Speed 10714.86 samples/sec Loss 16.1127 LearningRate 0.0960 Epoch: 0 Global Step: 4600 Fp16 Grad Scale: 131072 
Required: 8 hours Training: 2022-04-10 23:24:55,992-Speed 11119.39 samples/sec Loss 16.4363 LearningRate 0.0960 Epoch: 0 Global Step: 4610 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:24:56,900-Speed 11291.03 samples/sec Loss 16.1258 LearningRate 0.0960 Epoch: 0 Global Step: 4620 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:24:57,811-Speed 11249.23 samples/sec Loss 16.0925 LearningRate 0.0960 Epoch: 0 Global Step: 4630 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:24:58,768-Speed 10706.70 samples/sec Loss 16.0804 LearningRate 0.0960 Epoch: 0 Global Step: 4640 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:24:59,715-Speed 10814.90 samples/sec Loss 16.2075 LearningRate 0.0960 Epoch: 0 Global Step: 4650 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:25:00,665-Speed 10791.35 samples/sec Loss 16.2650 LearningRate 0.0959 Epoch: 0 Global Step: 4660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:25:01,618-Speed 10746.35 samples/sec Loss 16.2021 LearningRate 0.0959 Epoch: 0 Global Step: 4670 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:25:02,576-Speed 10699.35 samples/sec Loss 16.0649 LearningRate 0.0959 Epoch: 0 Global Step: 4680 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:25:03,549-Speed 10531.56 samples/sec Loss 16.0376 LearningRate 0.0959 Epoch: 0 Global Step: 4690 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:25:04,514-Speed 10624.37 samples/sec Loss 16.1128 LearningRate 0.0959 Epoch: 0 Global Step: 4700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:25:05,485-Speed 10558.39 samples/sec Loss 16.0217 LearningRate 0.0959 Epoch: 0 Global Step: 4710 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:25:06,414-Speed 11028.09 samples/sec Loss 16.0987 LearningRate 0.0959 Epoch: 0 Global Step: 4720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 
23:25:07,359-Speed 10840.43 samples/sec Loss 16.0492 LearningRate 0.0959 Epoch: 0 Global Step: 4730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:25:08,265-Speed 11315.02 samples/sec Loss 15.9830 LearningRate 0.0959 Epoch: 0 Global Step: 4740 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:25:09,194-Speed 11031.41 samples/sec Loss 15.9359 LearningRate 0.0959 Epoch: 0 Global Step: 4750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:25:10,113-Speed 11150.43 samples/sec Loss 15.9896 LearningRate 0.0959 Epoch: 0 Global Step: 4760 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:25:11,025-Speed 11234.64 samples/sec Loss 15.8852 LearningRate 0.0958 Epoch: 0 Global Step: 4770 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:25:11,933-Speed 11283.96 samples/sec Loss 16.2362 LearningRate 0.0958 Epoch: 0 Global Step: 4780 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:25:12,839-Speed 11313.20 samples/sec Loss 15.9017 LearningRate 0.0958 Epoch: 0 Global Step: 4790 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:25:13,765-Speed 11064.77 samples/sec Loss 15.9794 LearningRate 0.0958 Epoch: 0 Global Step: 4800 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:25:14,711-Speed 10832.79 samples/sec Loss 15.6994 LearningRate 0.0958 Epoch: 0 Global Step: 4810 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:25:15,605-Speed 11454.51 samples/sec Loss 15.9444 LearningRate 0.0958 Epoch: 0 Global Step: 4820 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:25:16,518-Speed 11226.21 samples/sec Loss 15.8634 LearningRate 0.0958 Epoch: 0 Global Step: 4830 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:25:17,475-Speed 10706.07 samples/sec Loss 15.5924 LearningRate 0.0958 Epoch: 0 Global Step: 4840 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:25:18,433-Speed 10697.46 samples/sec 
Loss 15.8476 LearningRate 0.0958 Epoch: 0 Global Step: 4850 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:25:19,334-Speed 11374.46 samples/sec Loss 15.8963 LearningRate 0.0958 Epoch: 0 Global Step: 4860 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:25:20,241-Speed 11322.24 samples/sec Loss 15.8096 LearningRate 0.0958 Epoch: 0 Global Step: 4870 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:25:21,167-Speed 11064.05 samples/sec Loss 15.7491 LearningRate 0.0958 Epoch: 0 Global Step: 4880 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:25:22,113-Speed 10834.38 samples/sec Loss 15.8859 LearningRate 0.0957 Epoch: 0 Global Step: 4890 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:25:23,016-Speed 11348.92 samples/sec Loss 15.7668 LearningRate 0.0957 Epoch: 0 Global Step: 4900 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:25:23,967-Speed 10769.72 samples/sec Loss 15.9604 LearningRate 0.0957 Epoch: 0 Global Step: 4910 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:25:24,902-Speed 10955.45 samples/sec Loss 15.8481 LearningRate 0.0957 Epoch: 0 Global Step: 4920 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:25:25,860-Speed 10702.12 samples/sec Loss 15.5667 LearningRate 0.0957 Epoch: 0 Global Step: 4930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:25:26,795-Speed 10963.22 samples/sec Loss 15.5508 LearningRate 0.0957 Epoch: 0 Global Step: 4940 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:25:27,727-Speed 10996.43 samples/sec Loss 15.7664 LearningRate 0.0957 Epoch: 0 Global Step: 4950 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:25:28,693-Speed 10610.72 samples/sec Loss 15.7286 LearningRate 0.0957 Epoch: 0 Global Step: 4960 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:25:29,623-Speed 11008.58 samples/sec Loss 15.7236 LearningRate 0.0957 Epoch: 0 
Global Step: 4970 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:25:30,573-Speed 10785.73 samples/sec Loss 15.6702 LearningRate 0.0957 Epoch: 0 Global Step: 4980 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:25:31,523-Speed 10788.98 samples/sec Loss 15.6433 LearningRate 0.0957 Epoch: 0 Global Step: 4990 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:25:32,460-Speed 10938.51 samples/sec Loss 15.5013 LearningRate 0.0957 Epoch: 0 Global Step: 5000 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:25:33,356-Speed 11443.65 samples/sec Loss 15.5931 LearningRate 0.0956 Epoch: 0 Global Step: 5010 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:25:34,279-Speed 11093.15 samples/sec Loss 15.6179 LearningRate 0.0956 Epoch: 0 Global Step: 5020 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:25:35,209-Speed 11017.84 samples/sec Loss 15.5987 LearningRate 0.0956 Epoch: 0 Global Step: 5030 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:25:36,139-Speed 11023.52 samples/sec Loss 15.5625 LearningRate 0.0956 Epoch: 0 Global Step: 5040 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:25:37,035-Speed 11438.57 samples/sec Loss 15.7519 LearningRate 0.0956 Epoch: 0 Global Step: 5050 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:25:37,947-Speed 11226.90 samples/sec Loss 15.5371 LearningRate 0.0956 Epoch: 0 Global Step: 5060 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:25:38,889-Speed 10878.95 samples/sec Loss 15.3814 LearningRate 0.0956 Epoch: 0 Global Step: 5070 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:25:39,820-Speed 11012.67 samples/sec Loss 15.5079 LearningRate 0.0956 Epoch: 0 Global Step: 5080 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:25:40,774-Speed 10740.74 samples/sec Loss 15.3430 LearningRate 0.0956 Epoch: 0 Global Step: 5090 Fp16 Grad Scale: 262144 
Required: 8 hours Training: 2022-04-10 23:25:41,673-Speed 11397.06 samples/sec Loss 15.3492 LearningRate 0.0956 Epoch: 0 Global Step: 5100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:25:42,653-Speed 10457.30 samples/sec Loss 15.4380 LearningRate 0.0956 Epoch: 0 Global Step: 5110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:25:43,590-Speed 10927.22 samples/sec Loss 15.5356 LearningRate 0.0955 Epoch: 0 Global Step: 5120 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:25:44,534-Speed 10853.63 samples/sec Loss 15.3799 LearningRate 0.0955 Epoch: 0 Global Step: 5130 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:25:45,485-Speed 10784.80 samples/sec Loss 15.4034 LearningRate 0.0955 Epoch: 0 Global Step: 5140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:25:46,419-Speed 10966.39 samples/sec Loss 15.5096 LearningRate 0.0955 Epoch: 0 Global Step: 5150 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:25:47,383-Speed 10631.94 samples/sec Loss 15.5191 LearningRate 0.0955 Epoch: 0 Global Step: 5160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:25:48,331-Speed 10801.64 samples/sec Loss 15.4547 LearningRate 0.0955 Epoch: 0 Global Step: 5170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:25:49,328-Speed 10288.46 samples/sec Loss 15.3081 LearningRate 0.0955 Epoch: 0 Global Step: 5180 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:25:50,273-Speed 10840.00 samples/sec Loss 15.4242 LearningRate 0.0955 Epoch: 0 Global Step: 5190 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:25:51,224-Speed 10781.98 samples/sec Loss 15.4128 LearningRate 0.0955 Epoch: 0 Global Step: 5200 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:25:52,145-Speed 11124.81 samples/sec Loss 15.5183 LearningRate 0.0955 Epoch: 0 Global Step: 5210 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 
23:25:53,077-Speed 10992.95 samples/sec Loss 15.3715 LearningRate 0.0955 Epoch: 0 Global Step: 5220 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:25:54,018-Speed 10887.15 samples/sec Loss 15.2955 LearningRate 0.0955 Epoch: 0 Global Step: 5230 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:25:54,964-Speed 10832.27 samples/sec Loss 15.3737 LearningRate 0.0954 Epoch: 0 Global Step: 5240 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:25:55,902-Speed 10926.61 samples/sec Loss 15.1517 LearningRate 0.0954 Epoch: 0 Global Step: 5250 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:25:56,815-Speed 11230.78 samples/sec Loss 15.1300 LearningRate 0.0954 Epoch: 0 Global Step: 5260 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:25:57,723-Speed 11279.99 samples/sec Loss 15.4228 LearningRate 0.0954 Epoch: 0 Global Step: 5270 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:25:58,660-Speed 10939.91 samples/sec Loss 15.2148 LearningRate 0.0954 Epoch: 0 Global Step: 5280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:25:59,583-Speed 11096.68 samples/sec Loss 15.4983 LearningRate 0.0954 Epoch: 0 Global Step: 5290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:26:00,527-Speed 10851.42 samples/sec Loss 15.3264 LearningRate 0.0954 Epoch: 0 Global Step: 5300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:26:01,460-Speed 10982.36 samples/sec Loss 15.3723 LearningRate 0.0954 Epoch: 0 Global Step: 5310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:26:02,420-Speed 10675.05 samples/sec Loss 15.5000 LearningRate 0.0954 Epoch: 0 Global Step: 5320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:26:03,350-Speed 11019.91 samples/sec Loss 15.3331 LearningRate 0.0954 Epoch: 0 Global Step: 5330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:26:04,273-Speed 11102.05 samples/sec 
Loss 15.3868 LearningRate 0.0954 Epoch: 0 Global Step: 5340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:26:05,214-Speed 10886.79 samples/sec Loss 15.2519 LearningRate 0.0954 Epoch: 0 Global Step: 5350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:26:06,134-Speed 11145.20 samples/sec Loss 15.2159 LearningRate 0.0953 Epoch: 0 Global Step: 5360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:26:07,056-Speed 11115.81 samples/sec Loss 15.2642 LearningRate 0.0953 Epoch: 0 Global Step: 5370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:26:07,966-Speed 11251.23 samples/sec Loss 15.1787 LearningRate 0.0953 Epoch: 0 Global Step: 5380 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:26:08,869-Speed 11355.02 samples/sec Loss 15.0730 LearningRate 0.0953 Epoch: 0 Global Step: 5390 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:26:09,780-Speed 11243.93 samples/sec Loss 15.2166 LearningRate 0.0953 Epoch: 0 Global Step: 5400 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:26:10,729-Speed 10801.30 samples/sec Loss 15.1032 LearningRate 0.0953 Epoch: 0 Global Step: 5410 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:26:11,677-Speed 10803.39 samples/sec Loss 15.2047 LearningRate 0.0953 Epoch: 0 Global Step: 5420 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:26:12,635-Speed 10696.32 samples/sec Loss 15.0916 LearningRate 0.0953 Epoch: 0 Global Step: 5430 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:26:13,561-Speed 11066.50 samples/sec Loss 15.1838 LearningRate 0.0953 Epoch: 0 Global Step: 5440 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:26:14,556-Speed 10297.72 samples/sec Loss 15.1612 LearningRate 0.0953 Epoch: 0 Global Step: 5450 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:26:15,514-Speed 10701.21 samples/sec Loss 15.0779 LearningRate 0.0953 Epoch: 0 
Global Step: 5460 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:26:16,489-Speed 10509.46 samples/sec Loss 14.9895 LearningRate 0.0952 Epoch: 0 Global Step: 5470 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:26:17,422-Speed 10989.94 samples/sec Loss 14.9779 LearningRate 0.0952 Epoch: 0 Global Step: 5480 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:26:18,386-Speed 10624.36 samples/sec Loss 15.0789 LearningRate 0.0952 Epoch: 0 Global Step: 5490 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:26:19,323-Speed 10939.28 samples/sec Loss 15.1045 LearningRate 0.0952 Epoch: 0 Global Step: 5500 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:26:20,240-Speed 11175.37 samples/sec Loss 15.0724 LearningRate 0.0952 Epoch: 0 Global Step: 5510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:26:21,131-Speed 11495.25 samples/sec Loss 14.9958 LearningRate 0.0952 Epoch: 0 Global Step: 5520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:26:22,056-Speed 11084.39 samples/sec Loss 15.1293 LearningRate 0.0952 Epoch: 0 Global Step: 5530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:26:23,008-Speed 10759.49 samples/sec Loss 15.0519 LearningRate 0.0952 Epoch: 0 Global Step: 5540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:26:23,955-Speed 10822.81 samples/sec Loss 15.0284 LearningRate 0.0952 Epoch: 0 Global Step: 5550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:26:24,892-Speed 10942.30 samples/sec Loss 14.9286 LearningRate 0.0952 Epoch: 0 Global Step: 5560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:26:25,819-Speed 11050.76 samples/sec Loss 15.0016 LearningRate 0.0952 Epoch: 0 Global Step: 5570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:26:26,754-Speed 10956.23 samples/sec Loss 14.9026 LearningRate 0.0952 Epoch: 0 Global Step: 5580 Fp16 Grad Scale: 131072 
Required: 7 hours Training: 2022-04-10 23:26:27,696-Speed 10885.38 samples/sec Loss 14.8514 LearningRate 0.0951 Epoch: 0 Global Step: 5590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:26:28,618-Speed 11105.24 samples/sec Loss 14.9780 LearningRate 0.0951 Epoch: 0 Global Step: 5600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:26:29,582-Speed 10632.92 samples/sec Loss 14.9424 LearningRate 0.0951 Epoch: 0 Global Step: 5610 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:26:30,526-Speed 10853.57 samples/sec Loss 14.8572 LearningRate 0.0951 Epoch: 0 Global Step: 5620 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:26:31,441-Speed 11193.94 samples/sec Loss 14.7065 LearningRate 0.0951 Epoch: 0 Global Step: 5630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:26:32,392-Speed 10778.03 samples/sec Loss 14.7132 LearningRate 0.0951 Epoch: 0 Global Step: 5640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:26:33,324-Speed 10992.28 samples/sec Loss 14.9013 LearningRate 0.0951 Epoch: 0 Global Step: 5650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:26:34,208-Speed 11600.49 samples/sec Loss 14.8041 LearningRate 0.0951 Epoch: 0 Global Step: 5660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:26:35,145-Speed 10929.61 samples/sec Loss 14.9409 LearningRate 0.0951 Epoch: 0 Global Step: 5670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:26:36,110-Speed 10621.02 samples/sec Loss 14.9415 LearningRate 0.0951 Epoch: 0 Global Step: 5680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:26:47,291-Speed 915.97 samples/sec Loss 14.3584 LearningRate 0.0951 Epoch: 1 Global Step: 5690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:26:48,264-Speed 10536.91 samples/sec Loss 13.9986 LearningRate 0.0951 Epoch: 1 Global Step: 5700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 
23:26:49,238-Speed 10520.15 samples/sec Loss 13.8747 LearningRate 0.0950 Epoch: 1 Global Step: 5710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:26:50,243-Speed 10194.68 samples/sec Loss 13.8591 LearningRate 0.0950 Epoch: 1 Global Step: 5720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:26:51,212-Speed 10586.49 samples/sec Loss 13.7993 LearningRate 0.0950 Epoch: 1 Global Step: 5730 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:26:52,283-Speed 9568.17 samples/sec Loss 14.1020 LearningRate 0.0950 Epoch: 1 Global Step: 5740 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:26:53,232-Speed 10795.07 samples/sec Loss 13.9671 LearningRate 0.0950 Epoch: 1 Global Step: 5750 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:26:54,200-Speed 10581.41 samples/sec Loss 13.8933 LearningRate 0.0950 Epoch: 1 Global Step: 5760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:26:55,114-Speed 11222.52 samples/sec Loss 13.9345 LearningRate 0.0950 Epoch: 1 Global Step: 5770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:26:56,086-Speed 10553.05 samples/sec Loss 14.0192 LearningRate 0.0950 Epoch: 1 Global Step: 5780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:26:57,019-Speed 10973.58 samples/sec Loss 13.9225 LearningRate 0.0950 Epoch: 1 Global Step: 5790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:26:57,936-Speed 11173.35 samples/sec Loss 14.0261 LearningRate 0.0950 Epoch: 1 Global Step: 5800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:26:58,892-Speed 10728.03 samples/sec Loss 13.9363 LearningRate 0.0950 Epoch: 1 Global Step: 5810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:26:59,849-Speed 10706.41 samples/sec Loss 13.8714 LearningRate 0.0949 Epoch: 1 Global Step: 5820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:27:00,778-Speed 11028.04 samples/sec Loss 
13.9636 LearningRate 0.0949 Epoch: 1 Global Step: 5830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:27:01,730-Speed 10763.94 samples/sec Loss 14.0265 LearningRate 0.0949 Epoch: 1 Global Step: 5840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:27:02,663-Speed 10979.60 samples/sec Loss 13.9759 LearningRate 0.0949 Epoch: 1 Global Step: 5850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:27:03,654-Speed 10342.59 samples/sec Loss 13.8447 LearningRate 0.0949 Epoch: 1 Global Step: 5860 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:27:04,606-Speed 10768.99 samples/sec Loss 14.0578 LearningRate 0.0949 Epoch: 1 Global Step: 5870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:27:05,572-Speed 10611.63 samples/sec Loss 13.9037 LearningRate 0.0949 Epoch: 1 Global Step: 5880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:27:06,535-Speed 10637.72 samples/sec Loss 14.0183 LearningRate 0.0949 Epoch: 1 Global Step: 5890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:27:07,460-Speed 11080.92 samples/sec Loss 14.0747 LearningRate 0.0949 Epoch: 1 Global Step: 5900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:27:08,348-Speed 11538.70 samples/sec Loss 13.8933 LearningRate 0.0949 Epoch: 1 Global Step: 5910 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:27:09,262-Speed 11217.43 samples/sec Loss 13.8801 LearningRate 0.0949 Epoch: 1 Global Step: 5920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:27:10,162-Speed 11384.77 samples/sec Loss 14.0747 LearningRate 0.0949 Epoch: 1 Global Step: 5930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:27:11,088-Speed 11056.03 samples/sec Loss 13.9941 LearningRate 0.0948 Epoch: 1 Global Step: 5940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:27:12,008-Speed 11138.32 samples/sec Loss 14.1501 LearningRate 0.0948 Epoch: 1 Global 
Step: 5950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:27:12,915-Speed 11305.32 samples/sec Loss 14.1711 LearningRate 0.0948 Epoch: 1 Global Step: 5960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:27:13,870-Speed 10728.30 samples/sec Loss 13.9965 LearningRate 0.0948 Epoch: 1 Global Step: 5970 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:27:14,805-Speed 10954.05 samples/sec Loss 14.0170 LearningRate 0.0948 Epoch: 1 Global Step: 5980 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:27:15,769-Speed 10635.99 samples/sec Loss 13.9524 LearningRate 0.0948 Epoch: 1 Global Step: 5990 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:27:16,718-Speed 10797.84 samples/sec Loss 14.0709 LearningRate 0.0948 Epoch: 1 Global Step: 6000 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:27:38,317-[lfw][6000]XNorm: 17.105306 Training: 2022-04-10 23:27:38,317-[lfw][6000]Accuracy-Flip: 0.98867+-0.00476 Training: 2022-04-10 23:27:38,318-[lfw][6000]Accuracy-Highest: 0.98867 Training: 2022-04-10 23:28:03,204-[cfp_fp][6000]XNorm: 14.320000 Training: 2022-04-10 23:28:03,205-[cfp_fp][6000]Accuracy-Flip: 0.86057+-0.01888 Training: 2022-04-10 23:28:03,205-[cfp_fp][6000]Accuracy-Highest: 0.86057 Training: 2022-04-10 23:28:24,453-[agedb_30][6000]XNorm: 16.558923 Training: 2022-04-10 23:28:24,453-[agedb_30][6000]Accuracy-Flip: 0.91083+-0.02022 Training: 2022-04-10 23:28:24,454-[agedb_30][6000]Accuracy-Highest: 0.91083 Training: 2022-04-10 23:28:25,481-Speed 148.92 samples/sec Loss 13.7400 LearningRate 0.0948 Epoch: 1 Global Step: 6010 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:28:26,478-Speed 10277.73 samples/sec Loss 14.0694 LearningRate 0.0948 Epoch: 1 Global Step: 6020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:28:27,402-Speed 11088.18 samples/sec Loss 14.0370 LearningRate 0.0948 Epoch: 1 Global Step: 6030 Fp16 Grad Scale: 131072 
Required: 8 hours Training: 2022-04-10 23:28:28,354-Speed 10771.36 samples/sec Loss 14.2657 LearningRate 0.0948 Epoch: 1 Global Step: 6040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:28:29,362-Speed 10159.61 samples/sec Loss 13.9505 LearningRate 0.0948 Epoch: 1 Global Step: 6050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:28:30,291-Speed 11034.49 samples/sec Loss 13.8445 LearningRate 0.0947 Epoch: 1 Global Step: 6060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:28:31,196-Speed 11315.26 samples/sec Loss 13.9397 LearningRate 0.0947 Epoch: 1 Global Step: 6070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:28:32,179-Speed 10432.04 samples/sec Loss 13.9740 LearningRate 0.0947 Epoch: 1 Global Step: 6080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:28:33,093-Speed 11207.44 samples/sec Loss 13.9598 LearningRate 0.0947 Epoch: 1 Global Step: 6090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:28:33,987-Speed 11456.63 samples/sec Loss 13.9049 LearningRate 0.0947 Epoch: 1 Global Step: 6100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:28:34,882-Speed 11451.43 samples/sec Loss 13.8793 LearningRate 0.0947 Epoch: 1 Global Step: 6110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:28:35,818-Speed 10950.42 samples/sec Loss 14.0157 LearningRate 0.0947 Epoch: 1 Global Step: 6120 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:28:36,742-Speed 11092.96 samples/sec Loss 14.1571 LearningRate 0.0947 Epoch: 1 Global Step: 6130 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:28:37,669-Speed 11044.93 samples/sec Loss 14.0617 LearningRate 0.0947 Epoch: 1 Global Step: 6140 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:28:38,624-Speed 10727.99 samples/sec Loss 14.0239 LearningRate 0.0947 Epoch: 1 Global Step: 6150 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 
23:28:39,584-Speed 10675.63 samples/sec Loss 13.9387 LearningRate 0.0947 Epoch: 1 Global Step: 6160 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:28:40,561-Speed 10491.83 samples/sec Loss 14.0752 LearningRate 0.0946 Epoch: 1 Global Step: 6170 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:28:41,516-Speed 10733.76 samples/sec Loss 14.0511 LearningRate 0.0946 Epoch: 1 Global Step: 6180 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:28:42,468-Speed 10764.73 samples/sec Loss 13.8456 LearningRate 0.0946 Epoch: 1 Global Step: 6190 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:28:43,429-Speed 10655.04 samples/sec Loss 13.9590 LearningRate 0.0946 Epoch: 1 Global Step: 6200 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:28:44,368-Speed 10918.80 samples/sec Loss 13.8273 LearningRate 0.0946 Epoch: 1 Global Step: 6210 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:28:45,280-Speed 11233.45 samples/sec Loss 13.8956 LearningRate 0.0946 Epoch: 1 Global Step: 6220 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:28:46,227-Speed 10827.68 samples/sec Loss 14.0398 LearningRate 0.0946 Epoch: 1 Global Step: 6230 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:28:47,123-Speed 11433.47 samples/sec Loss 13.7478 LearningRate 0.0946 Epoch: 1 Global Step: 6240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:28:48,064-Speed 10894.74 samples/sec Loss 13.9695 LearningRate 0.0946 Epoch: 1 Global Step: 6250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:28:48,986-Speed 11106.91 samples/sec Loss 13.8157 LearningRate 0.0946 Epoch: 1 Global Step: 6260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:28:49,895-Speed 11273.83 samples/sec Loss 14.0526 LearningRate 0.0946 Epoch: 1 Global Step: 6270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:28:50,806-Speed 11256.11 samples/sec 
Loss 13.9465 LearningRate 0.0946 Epoch: 1 Global Step: 6280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:28:51,722-Speed 11178.24 samples/sec Loss 13.7889 LearningRate 0.0945 Epoch: 1 Global Step: 6290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:28:52,633-Speed 11253.52 samples/sec Loss 13.9188 LearningRate 0.0945 Epoch: 1 Global Step: 6300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:28:53,556-Speed 11104.38 samples/sec Loss 13.8838 LearningRate 0.0945 Epoch: 1 Global Step: 6310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:28:54,493-Speed 10929.64 samples/sec Loss 13.8578 LearningRate 0.0945 Epoch: 1 Global Step: 6320 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:28:55,431-Speed 10932.17 samples/sec Loss 13.8190 LearningRate 0.0945 Epoch: 1 Global Step: 6330 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:28:56,353-Speed 11106.77 samples/sec Loss 13.8855 LearningRate 0.0945 Epoch: 1 Global Step: 6340 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:28:57,253-Speed 11395.06 samples/sec Loss 13.8378 LearningRate 0.0945 Epoch: 1 Global Step: 6350 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:28:58,179-Speed 11057.15 samples/sec Loss 14.0218 LearningRate 0.0945 Epoch: 1 Global Step: 6360 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:28:59,104-Speed 11078.45 samples/sec Loss 13.8449 LearningRate 0.0945 Epoch: 1 Global Step: 6370 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:29:00,015-Speed 11256.34 samples/sec Loss 13.9060 LearningRate 0.0945 Epoch: 1 Global Step: 6380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:29:00,950-Speed 10949.13 samples/sec Loss 13.8958 LearningRate 0.0945 Epoch: 1 Global Step: 6390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:29:01,893-Speed 10869.51 samples/sec Loss 13.8693 LearningRate 0.0945 Epoch: 1 Global 
Step: 6400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:29:02,851-Speed 10698.82 samples/sec Loss 13.9451 LearningRate 0.0944 Epoch: 1 Global Step: 6410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:29:03,778-Speed 11058.21 samples/sec Loss 13.7383 LearningRate 0.0944 Epoch: 1 Global Step: 6420 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:04,729-Speed 10778.26 samples/sec Loss 13.7716 LearningRate 0.0944 Epoch: 1 Global Step: 6430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:05,656-Speed 11051.72 samples/sec Loss 13.7972 LearningRate 0.0944 Epoch: 1 Global Step: 6440 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:06,568-Speed 11240.15 samples/sec Loss 14.0717 LearningRate 0.0944 Epoch: 1 Global Step: 6450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:07,500-Speed 10985.34 samples/sec Loss 13.9589 LearningRate 0.0944 Epoch: 1 Global Step: 6460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:08,433-Speed 10988.87 samples/sec Loss 13.7999 LearningRate 0.0944 Epoch: 1 Global Step: 6470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:09,325-Speed 11480.85 samples/sec Loss 13.8030 LearningRate 0.0944 Epoch: 1 Global Step: 6480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:10,253-Speed 11050.08 samples/sec Loss 13.7645 LearningRate 0.0944 Epoch: 1 Global Step: 6490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:11,197-Speed 10853.20 samples/sec Loss 13.9522 LearningRate 0.0944 Epoch: 1 Global Step: 6500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:12,102-Speed 11326.41 samples/sec Loss 13.8246 LearningRate 0.0944 Epoch: 1 Global Step: 6510 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:13,026-Speed 11082.48 samples/sec Loss 13.8052 LearningRate 0.0943 Epoch: 1 Global Step: 6520 Fp16 Grad Scale: 262144 Required: 
8 hours Training: 2022-04-10 23:29:13,917-Speed 11503.36 samples/sec Loss 13.7248 LearningRate 0.0943 Epoch: 1 Global Step: 6530 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:14,858-Speed 10888.31 samples/sec Loss 13.9147 LearningRate 0.0943 Epoch: 1 Global Step: 6540 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:15,780-Speed 11117.40 samples/sec Loss 13.7308 LearningRate 0.0943 Epoch: 1 Global Step: 6550 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:16,680-Speed 11377.01 samples/sec Loss 13.8150 LearningRate 0.0943 Epoch: 1 Global Step: 6560 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:17,579-Speed 11408.17 samples/sec Loss 13.8389 LearningRate 0.0943 Epoch: 1 Global Step: 6570 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:18,524-Speed 10838.09 samples/sec Loss 13.7662 LearningRate 0.0943 Epoch: 1 Global Step: 6580 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:19,431-Speed 11300.05 samples/sec Loss 13.9447 LearningRate 0.0943 Epoch: 1 Global Step: 6590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:20,380-Speed 10801.68 samples/sec Loss 13.9798 LearningRate 0.0943 Epoch: 1 Global Step: 6600 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:21,276-Speed 11432.87 samples/sec Loss 13.8709 LearningRate 0.0943 Epoch: 1 Global Step: 6610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:22,174-Speed 11413.48 samples/sec Loss 13.7509 LearningRate 0.0943 Epoch: 1 Global Step: 6620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:23,087-Speed 11216.77 samples/sec Loss 13.9221 LearningRate 0.0943 Epoch: 1 Global Step: 6630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:24,026-Speed 10914.11 samples/sec Loss 13.6083 LearningRate 0.0942 Epoch: 1 Global Step: 6640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 
23:29:24,945-Speed 11146.32 samples/sec Loss 13.7618 LearningRate 0.0942 Epoch: 1 Global Step: 6650 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:25,888-Speed 10872.67 samples/sec Loss 13.7847 LearningRate 0.0942 Epoch: 1 Global Step: 6660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:26,866-Speed 10477.71 samples/sec Loss 13.5788 LearningRate 0.0942 Epoch: 1 Global Step: 6670 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:27,847-Speed 10450.30 samples/sec Loss 13.7918 LearningRate 0.0942 Epoch: 1 Global Step: 6680 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:28,792-Speed 10841.25 samples/sec Loss 13.7919 LearningRate 0.0942 Epoch: 1 Global Step: 6690 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:29,680-Speed 11548.82 samples/sec Loss 13.9134 LearningRate 0.0942 Epoch: 1 Global Step: 6700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:30,626-Speed 10827.94 samples/sec Loss 13.6243 LearningRate 0.0942 Epoch: 1 Global Step: 6710 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:31,542-Speed 11197.43 samples/sec Loss 13.4569 LearningRate 0.0942 Epoch: 1 Global Step: 6720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:32,433-Speed 11500.10 samples/sec Loss 13.7398 LearningRate 0.0942 Epoch: 1 Global Step: 6730 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:29:33,374-Speed 10880.32 samples/sec Loss 13.6826 LearningRate 0.0942 Epoch: 1 Global Step: 6740 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:34,309-Speed 10967.18 samples/sec Loss 13.6192 LearningRate 0.0942 Epoch: 1 Global Step: 6750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:35,211-Speed 11354.60 samples/sec Loss 13.8191 LearningRate 0.0941 Epoch: 1 Global Step: 6760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:36,112-Speed 11373.54 samples/sec 
Loss 13.6839 LearningRate 0.0941 Epoch: 1 Global Step: 6770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:37,057-Speed 10845.03 samples/sec Loss 13.5895 LearningRate 0.0941 Epoch: 1 Global Step: 6780 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:37,956-Speed 11397.57 samples/sec Loss 13.5569 LearningRate 0.0941 Epoch: 1 Global Step: 6790 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:38,872-Speed 11178.75 samples/sec Loss 13.7262 LearningRate 0.0941 Epoch: 1 Global Step: 6800 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:39,794-Speed 11112.94 samples/sec Loss 13.7042 LearningRate 0.0941 Epoch: 1 Global Step: 6810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:40,733-Speed 10913.04 samples/sec Loss 13.6253 LearningRate 0.0941 Epoch: 1 Global Step: 6820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:41,654-Speed 11132.06 samples/sec Loss 13.6135 LearningRate 0.0941 Epoch: 1 Global Step: 6830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:42,574-Speed 11137.63 samples/sec Loss 13.6278 LearningRate 0.0941 Epoch: 1 Global Step: 6840 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:29:43,484-Speed 11261.43 samples/sec Loss 13.8059 LearningRate 0.0941 Epoch: 1 Global Step: 6850 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:29:44,420-Speed 10948.92 samples/sec Loss 13.6822 LearningRate 0.0941 Epoch: 1 Global Step: 6860 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:29:45,330-Speed 11258.52 samples/sec Loss 13.7506 LearningRate 0.0941 Epoch: 1 Global Step: 6870 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:29:46,227-Speed 11431.48 samples/sec Loss 13.8721 LearningRate 0.0940 Epoch: 1 Global Step: 6880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:47,152-Speed 11069.97 samples/sec Loss 13.6086 LearningRate 0.0940 Epoch: 1 
Global Step: 6890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:48,104-Speed 10765.16 samples/sec Loss 13.6665 LearningRate 0.0940 Epoch: 1 Global Step: 6900 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:49,034-Speed 11027.87 samples/sec Loss 13.5797 LearningRate 0.0940 Epoch: 1 Global Step: 6910 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:49,953-Speed 11150.90 samples/sec Loss 13.6182 LearningRate 0.0940 Epoch: 1 Global Step: 6920 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:50,910-Speed 10708.28 samples/sec Loss 13.6407 LearningRate 0.0940 Epoch: 1 Global Step: 6930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:51,847-Speed 10936.20 samples/sec Loss 13.5260 LearningRate 0.0940 Epoch: 1 Global Step: 6940 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:52,817-Speed 10553.06 samples/sec Loss 13.6793 LearningRate 0.0940 Epoch: 1 Global Step: 6950 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:53,743-Speed 11075.13 samples/sec Loss 13.6507 LearningRate 0.0940 Epoch: 1 Global Step: 6960 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:54,672-Speed 11029.57 samples/sec Loss 13.6357 LearningRate 0.0940 Epoch: 1 Global Step: 6970 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:29:55,554-Speed 11618.83 samples/sec Loss 13.5873 LearningRate 0.0940 Epoch: 1 Global Step: 6980 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:29:56,476-Speed 11103.89 samples/sec Loss 13.5825 LearningRate 0.0939 Epoch: 1 Global Step: 6990 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:29:57,362-Speed 11574.97 samples/sec Loss 13.4751 LearningRate 0.0939 Epoch: 1 Global Step: 7000 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:29:58,248-Speed 11557.56 samples/sec Loss 13.9507 LearningRate 0.0939 Epoch: 1 Global Step: 7010 Fp16 Grad Scale: 65536 
Required: 8 hours Training: 2022-04-10 23:29:59,159-Speed 11254.74 samples/sec Loss 13.6782 LearningRate 0.0939 Epoch: 1 Global Step: 7020 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:30:00,088-Speed 11029.31 samples/sec Loss 13.4383 LearningRate 0.0939 Epoch: 1 Global Step: 7030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:30:01,036-Speed 10805.66 samples/sec Loss 13.7012 LearningRate 0.0939 Epoch: 1 Global Step: 7040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:30:01,943-Speed 11301.64 samples/sec Loss 13.4194 LearningRate 0.0939 Epoch: 1 Global Step: 7050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:30:02,870-Speed 11063.15 samples/sec Loss 13.4873 LearningRate 0.0939 Epoch: 1 Global Step: 7060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:30:03,803-Speed 10972.91 samples/sec Loss 13.6778 LearningRate 0.0939 Epoch: 1 Global Step: 7070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:30:04,698-Speed 11455.37 samples/sec Loss 13.6636 LearningRate 0.0939 Epoch: 1 Global Step: 7080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:30:05,600-Speed 11361.99 samples/sec Loss 13.6254 LearningRate 0.0939 Epoch: 1 Global Step: 7090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:30:06,513-Speed 11229.45 samples/sec Loss 13.3979 LearningRate 0.0939 Epoch: 1 Global Step: 7100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:30:07,436-Speed 11099.75 samples/sec Loss 13.5181 LearningRate 0.0938 Epoch: 1 Global Step: 7110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:30:08,348-Speed 11240.75 samples/sec Loss 13.5722 LearningRate 0.0938 Epoch: 1 Global Step: 7120 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:30:09,267-Speed 11145.21 samples/sec Loss 13.4053 LearningRate 0.0938 Epoch: 1 Global Step: 7130 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 
23:30:10,156-Speed 11529.97 samples/sec Loss 13.6310 LearningRate 0.0938 Epoch: 1 Global Step: 7140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:30:11,066-Speed 11266.77 samples/sec Loss 13.5705 LearningRate 0.0938 Epoch: 1 Global Step: 7150 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:30:12,025-Speed 10676.67 samples/sec Loss 13.4422 LearningRate 0.0938 Epoch: 1 Global Step: 7160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:30:12,935-Speed 11259.14 samples/sec Loss 13.7448 LearningRate 0.0938 Epoch: 1 Global Step: 7170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:30:13,879-Speed 10862.82 samples/sec Loss 13.5483 LearningRate 0.0938 Epoch: 1 Global Step: 7180 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:30:14,823-Speed 10854.43 samples/sec Loss 13.5772 LearningRate 0.0938 Epoch: 1 Global Step: 7190 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:30:15,737-Speed 11203.40 samples/sec Loss 13.6816 LearningRate 0.0938 Epoch: 1 Global Step: 7200 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:30:16,666-Speed 11036.28 samples/sec Loss 13.4905 LearningRate 0.0938 Epoch: 1 Global Step: 7210 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:30:17,578-Speed 11229.47 samples/sec Loss 13.3587 LearningRate 0.0938 Epoch: 1 Global Step: 7220 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:30:18,477-Speed 11406.66 samples/sec Loss 13.5498 LearningRate 0.0937 Epoch: 1 Global Step: 7230 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:30:19,394-Speed 11171.19 samples/sec Loss 13.3600 LearningRate 0.0937 Epoch: 1 Global Step: 7240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:30:20,307-Speed 11220.93 samples/sec Loss 13.3667 LearningRate 0.0937 Epoch: 1 Global Step: 7250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:30:21,197-Speed 11512.86 samples/sec 
Loss 13.3809 LearningRate 0.0937 Epoch: 1 Global Step: 7260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:30:22,094-Speed 11425.28 samples/sec Loss 13.4487 LearningRate 0.0937 Epoch: 1 Global Step: 7270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:30:23,004-Speed 11267.15 samples/sec Loss 13.4670 LearningRate 0.0937 Epoch: 1 Global Step: 7280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:30:23,890-Speed 11559.18 samples/sec Loss 13.2944 LearningRate 0.0937 Epoch: 1 Global Step: 7290 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:30:24,837-Speed 10826.48 samples/sec Loss 13.3206 LearningRate 0.0937 Epoch: 1 Global Step: 7300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:30:25,729-Speed 11487.43 samples/sec Loss 13.3126 LearningRate 0.0937 Epoch: 1 Global Step: 7310 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:30:26,648-Speed 11153.50 samples/sec Loss 13.3295 LearningRate 0.0937 Epoch: 1 Global Step: 7320 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:30:27,573-Speed 11077.91 samples/sec Loss 13.3701 LearningRate 0.0937 Epoch: 1 Global Step: 7330 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:30:28,503-Speed 11016.89 samples/sec Loss 13.1949 LearningRate 0.0936 Epoch: 1 Global Step: 7340 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:30:29,425-Speed 11109.21 samples/sec Loss 13.4270 LearningRate 0.0936 Epoch: 1 Global Step: 7350 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:30:30,343-Speed 11168.67 samples/sec Loss 13.2751 LearningRate 0.0936 Epoch: 1 Global Step: 7360 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:30:31,241-Speed 11419.32 samples/sec Loss 13.3564 LearningRate 0.0936 Epoch: 1 Global Step: 7370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:30:32,176-Speed 10955.95 samples/sec Loss 13.3275 LearningRate 0.0936 Epoch: 1 Global 
Step: 7380 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:30:33,078-Speed 11363.02 samples/sec Loss 13.5620 LearningRate 0.0936 Epoch: 1 Global Step: 7390 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:30:33,971-Speed 11481.82 samples/sec Loss 13.2634 LearningRate 0.0936 Epoch: 1 Global Step: 7400 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:30:34,876-Speed 11321.26 samples/sec Loss 13.1956 LearningRate 0.0936 Epoch: 1 Global Step: 7410 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:30:35,809-Speed 10984.05 samples/sec Loss 13.4093 LearningRate 0.0936 Epoch: 1 Global Step: 7420 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:30:36,717-Speed 11282.39 samples/sec Loss 13.3938 LearningRate 0.0936 Epoch: 1 Global Step: 7430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:30:37,617-Speed 11380.35 samples/sec Loss 13.2959 LearningRate 0.0936 Epoch: 1 Global Step: 7440 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:30:38,547-Speed 11031.52 samples/sec Loss 13.3383 LearningRate 0.0936 Epoch: 1 Global Step: 7450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:30:39,506-Speed 10679.87 samples/sec Loss 13.2289 LearningRate 0.0935 Epoch: 1 Global Step: 7460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:30:40,413-Speed 11296.80 samples/sec Loss 13.3742 LearningRate 0.0935 Epoch: 1 Global Step: 7470 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:30:41,321-Speed 11286.50 samples/sec Loss 13.4522 LearningRate 0.0935 Epoch: 1 Global Step: 7480 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:30:42,237-Speed 11187.49 samples/sec Loss 13.2912 LearningRate 0.0935 Epoch: 1 Global Step: 7490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:30:43,152-Speed 11201.62 samples/sec Loss 13.3301 LearningRate 0.0935 Epoch: 1 Global Step: 7500 Fp16 Grad Scale: 131072 
Required: 8 hours Training: 2022-04-10 23:30:44,109-Speed 10712.04 samples/sec Loss 12.9647 LearningRate 0.0935 Epoch: 1 Global Step: 7510 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:30:45,046-Speed 10938.94 samples/sec Loss 13.2110 LearningRate 0.0935 Epoch: 1 Global Step: 7520 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:30:45,948-Speed 11357.84 samples/sec Loss 13.1642 LearningRate 0.0935 Epoch: 1 Global Step: 7530 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:30:46,889-Speed 10883.73 samples/sec Loss 13.3694 LearningRate 0.0935 Epoch: 1 Global Step: 7540 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:30:47,819-Speed 11017.99 samples/sec Loss 13.2077 LearningRate 0.0935 Epoch: 1 Global Step: 7550 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:30:48,777-Speed 10698.28 samples/sec Loss 13.2238 LearningRate 0.0935 Epoch: 1 Global Step: 7560 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:30:49,715-Speed 10922.59 samples/sec Loss 13.2887 LearningRate 0.0935 Epoch: 1 Global Step: 7570 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:30:50,633-Speed 11157.76 samples/sec Loss 13.2894 LearningRate 0.0934 Epoch: 1 Global Step: 7580 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:30:51,547-Speed 11219.38 samples/sec Loss 13.3280 LearningRate 0.0934 Epoch: 1 Global Step: 7590 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:30:52,493-Speed 10835.78 samples/sec Loss 13.3777 LearningRate 0.0934 Epoch: 1 Global Step: 7600 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:30:53,402-Speed 11278.55 samples/sec Loss 13.2055 LearningRate 0.0934 Epoch: 1 Global Step: 7610 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:30:54,309-Speed 11295.22 samples/sec Loss 13.2778 LearningRate 0.0934 Epoch: 1 Global Step: 7620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 
23:30:55,206-Speed 11427.31 samples/sec Loss 13.2458 LearningRate 0.0934 Epoch: 1 Global Step: 7630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:30:56,155-Speed 10793.28 samples/sec Loss 13.1737 LearningRate 0.0934 Epoch: 1 Global Step: 7640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:30:57,084-Speed 11036.20 samples/sec Loss 13.1410 LearningRate 0.0934 Epoch: 1 Global Step: 7650 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:30:58,014-Speed 11010.97 samples/sec Loss 13.1892 LearningRate 0.0934 Epoch: 1 Global Step: 7660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:30:59,004-Speed 10355.43 samples/sec Loss 13.1313 LearningRate 0.0934 Epoch: 1 Global Step: 7670 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:30:59,914-Speed 11261.27 samples/sec Loss 13.2017 LearningRate 0.0934 Epoch: 1 Global Step: 7680 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:31:00,861-Speed 10816.61 samples/sec Loss 13.1409 LearningRate 0.0934 Epoch: 1 Global Step: 7690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:31:01,790-Speed 11031.48 samples/sec Loss 13.4198 LearningRate 0.0933 Epoch: 1 Global Step: 7700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:31:02,734-Speed 10862.98 samples/sec Loss 13.2870 LearningRate 0.0933 Epoch: 1 Global Step: 7710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:31:03,666-Speed 10990.92 samples/sec Loss 13.3393 LearningRate 0.0933 Epoch: 1 Global Step: 7720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:31:04,578-Speed 11234.16 samples/sec Loss 13.2796 LearningRate 0.0933 Epoch: 1 Global Step: 7730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:31:05,520-Speed 10876.12 samples/sec Loss 13.2239 LearningRate 0.0933 Epoch: 1 Global Step: 7740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:31:06,445-Speed 11077.45 samples/sec Loss 
13.2122 LearningRate 0.0933 Epoch: 1 Global Step: 7750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:31:07,349-Speed 11340.54 samples/sec Loss 13.3412 LearningRate 0.0933 Epoch: 1 Global Step: 7760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:31:08,298-Speed 10800.40 samples/sec Loss 13.1961 LearningRate 0.0933 Epoch: 1 Global Step: 7770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:31:09,205-Speed 11298.92 samples/sec Loss 13.1555 LearningRate 0.0933 Epoch: 1 Global Step: 7780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:31:10,110-Speed 11315.42 samples/sec Loss 13.2387 LearningRate 0.0933 Epoch: 1 Global Step: 7790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:31:11,011-Speed 11378.07 samples/sec Loss 13.1522 LearningRate 0.0933 Epoch: 1 Global Step: 7800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:31:11,934-Speed 11098.99 samples/sec Loss 13.0106 LearningRate 0.0933 Epoch: 1 Global Step: 7810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:31:12,832-Speed 11416.59 samples/sec Loss 13.1782 LearningRate 0.0932 Epoch: 1 Global Step: 7820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:31:13,730-Speed 11407.71 samples/sec Loss 13.2246 LearningRate 0.0932 Epoch: 1 Global Step: 7830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:31:14,655-Speed 11074.17 samples/sec Loss 13.1670 LearningRate 0.0932 Epoch: 1 Global Step: 7840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:31:15,613-Speed 10700.84 samples/sec Loss 13.1349 LearningRate 0.0932 Epoch: 1 Global Step: 7850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:31:16,539-Speed 11077.19 samples/sec Loss 13.0120 LearningRate 0.0932 Epoch: 1 Global Step: 7860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:31:17,480-Speed 10886.98 samples/sec Loss 12.9237 LearningRate 0.0932 Epoch: 1 Global Step: 
7870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:31:18,434-Speed 10746.09 samples/sec Loss 13.3429 LearningRate 0.0932 Epoch: 1 Global Step: 7880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:31:19,373-Speed 10907.87 samples/sec Loss 13.1811 LearningRate 0.0932 Epoch: 1 Global Step: 7890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:31:20,331-Speed 10697.25 samples/sec Loss 13.0631 LearningRate 0.0932 Epoch: 1 Global Step: 7900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:31:21,243-Speed 11238.31 samples/sec Loss 13.0904 LearningRate 0.0932 Epoch: 1 Global Step: 7910 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:31:22,218-Speed 10510.76 samples/sec Loss 13.0354 LearningRate 0.0932 Epoch: 1 Global Step: 7920 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:31:23,117-Speed 11395.94 samples/sec Loss 13.2651 LearningRate 0.0931 Epoch: 1 Global Step: 7930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:31:24,068-Speed 10770.25 samples/sec Loss 13.1537 LearningRate 0.0931 Epoch: 1 Global Step: 7940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:31:25,006-Speed 10927.18 samples/sec Loss 12.9284 LearningRate 0.0931 Epoch: 1 Global Step: 7950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:31:25,906-Speed 11392.87 samples/sec Loss 13.0558 LearningRate 0.0931 Epoch: 1 Global Step: 7960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:31:26,858-Speed 10757.41 samples/sec Loss 13.0947 LearningRate 0.0931 Epoch: 1 Global Step: 7970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:31:27,792-Speed 10972.95 samples/sec Loss 13.0732 LearningRate 0.0931 Epoch: 1 Global Step: 7980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:31:28,738-Speed 10838.99 samples/sec Loss 13.0905 LearningRate 0.0931 Epoch: 1 Global Step: 7990 Fp16 Grad Scale: 131072 Required: 7 
hours Training: 2022-04-10 23:31:29,687-Speed 10796.12 samples/sec Loss 13.1004 LearningRate 0.0931 Epoch: 1 Global Step: 8000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:31:51,090-[lfw][8000]XNorm: 16.119431 Training: 2022-04-10 23:31:51,091-[lfw][8000]Accuracy-Flip: 0.98950+-0.00435 Training: 2022-04-10 23:31:51,091-[lfw][8000]Accuracy-Highest: 0.98950 Training: 2022-04-10 23:32:15,715-[cfp_fp][8000]XNorm: 13.481331 Training: 2022-04-10 23:32:15,716-[cfp_fp][8000]Accuracy-Flip: 0.87929+-0.01871 Training: 2022-04-10 23:32:15,716-[cfp_fp][8000]Accuracy-Highest: 0.87929 Training: 2022-04-10 23:32:37,084-[agedb_30][8000]XNorm: 15.743294 Training: 2022-04-10 23:32:37,085-[agedb_30][8000]Accuracy-Flip: 0.91400+-0.02047 Training: 2022-04-10 23:32:37,085-[agedb_30][8000]Accuracy-Highest: 0.91400 Training: 2022-04-10 23:32:38,015-Speed 149.87 samples/sec Loss 13.1074 LearningRate 0.0931 Epoch: 1 Global Step: 8010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:32:38,953-Speed 10944.65 samples/sec Loss 12.9188 LearningRate 0.0931 Epoch: 1 Global Step: 8020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:32:39,909-Speed 10721.65 samples/sec Loss 12.9679 LearningRate 0.0931 Epoch: 1 Global Step: 8030 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:32:40,843-Speed 10970.62 samples/sec Loss 13.2643 LearningRate 0.0931 Epoch: 1 Global Step: 8040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:32:41,800-Speed 10705.07 samples/sec Loss 12.9670 LearningRate 0.0930 Epoch: 1 Global Step: 8050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:32:42,720-Speed 11138.16 samples/sec Loss 13.2647 LearningRate 0.0930 Epoch: 1 Global Step: 8060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:32:43,637-Speed 11171.20 samples/sec Loss 13.1373 LearningRate 0.0930 Epoch: 1 Global Step: 8070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 
23:32:44,528-Speed 11510.12 samples/sec Loss 13.1197 LearningRate 0.0930 Epoch: 1 Global Step: 8080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:32:45,438-Speed 11259.22 samples/sec Loss 12.8927 LearningRate 0.0930 Epoch: 1 Global Step: 8090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:32:46,370-Speed 10989.62 samples/sec Loss 12.9737 LearningRate 0.0930 Epoch: 1 Global Step: 8100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:32:47,287-Speed 11174.22 samples/sec Loss 13.0500 LearningRate 0.0930 Epoch: 1 Global Step: 8110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:32:48,218-Speed 11012.23 samples/sec Loss 13.1303 LearningRate 0.0930 Epoch: 1 Global Step: 8120 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:32:49,144-Speed 11054.12 samples/sec Loss 12.9923 LearningRate 0.0930 Epoch: 1 Global Step: 8130 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:32:50,122-Speed 10486.37 samples/sec Loss 12.9987 LearningRate 0.0930 Epoch: 1 Global Step: 8140 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:32:51,064-Speed 10880.06 samples/sec Loss 12.8729 LearningRate 0.0930 Epoch: 1 Global Step: 8150 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:32:51,957-Speed 11474.32 samples/sec Loss 12.9335 LearningRate 0.0930 Epoch: 1 Global Step: 8160 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:32:52,890-Speed 10988.12 samples/sec Loss 13.0160 LearningRate 0.0929 Epoch: 1 Global Step: 8170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:32:53,823-Speed 10979.26 samples/sec Loss 13.1339 LearningRate 0.0929 Epoch: 1 Global Step: 8180 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:32:54,764-Speed 10894.14 samples/sec Loss 12.9464 LearningRate 0.0929 Epoch: 1 Global Step: 8190 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:32:55,723-Speed 10685.49 samples/sec 
Loss 12.9112 LearningRate 0.0929 Epoch: 1 Global Step: 8200 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:32:56,654-Speed 11006.26 samples/sec Loss 12.9428 LearningRate 0.0929 Epoch: 1 Global Step: 8210 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:32:57,554-Speed 11384.92 samples/sec Loss 12.9428 LearningRate 0.0929 Epoch: 1 Global Step: 8220 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:32:58,494-Speed 10897.59 samples/sec Loss 12.8725 LearningRate 0.0929 Epoch: 1 Global Step: 8230 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:32:59,424-Speed 11024.53 samples/sec Loss 12.9609 LearningRate 0.0929 Epoch: 1 Global Step: 8240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:33:00,324-Speed 11381.20 samples/sec Loss 12.8297 LearningRate 0.0929 Epoch: 1 Global Step: 8250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:33:01,247-Speed 11106.99 samples/sec Loss 12.9496 LearningRate 0.0929 Epoch: 1 Global Step: 8260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:33:02,203-Speed 10718.21 samples/sec Loss 12.8301 LearningRate 0.0929 Epoch: 1 Global Step: 8270 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:33:03,134-Speed 11008.92 samples/sec Loss 12.9031 LearningRate 0.0929 Epoch: 1 Global Step: 8280 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:33:04,074-Speed 10903.70 samples/sec Loss 12.9214 LearningRate 0.0928 Epoch: 1 Global Step: 8290 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:33:05,016-Speed 10875.44 samples/sec Loss 12.8883 LearningRate 0.0928 Epoch: 1 Global Step: 8300 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:33:05,935-Speed 11148.20 samples/sec Loss 12.6966 LearningRate 0.0928 Epoch: 1 Global Step: 8310 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:33:06,867-Speed 10994.07 samples/sec Loss 12.9053 LearningRate 0.0928 Epoch: 1 
Global Step: 8320 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:33:07,783-Speed 11193.46 samples/sec Loss 12.8091 LearningRate 0.0928 Epoch: 1 Global Step: 8330 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:33:08,661-Speed 11677.79 samples/sec Loss 12.8994 LearningRate 0.0928 Epoch: 1 Global Step: 8340 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:33:09,551-Speed 11512.73 samples/sec Loss 12.8911 LearningRate 0.0928 Epoch: 1 Global Step: 8350 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:33:10,438-Speed 11547.18 samples/sec Loss 12.9172 LearningRate 0.0928 Epoch: 1 Global Step: 8360 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:33:11,327-Speed 11525.48 samples/sec Loss 12.9139 LearningRate 0.0928 Epoch: 1 Global Step: 8370 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:33:12,224-Speed 11429.05 samples/sec Loss 12.8514 LearningRate 0.0928 Epoch: 1 Global Step: 8380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:33:13,194-Speed 10555.00 samples/sec Loss 12.8773 LearningRate 0.0928 Epoch: 1 Global Step: 8390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:33:14,115-Speed 11134.43 samples/sec Loss 12.8284 LearningRate 0.0927 Epoch: 1 Global Step: 8400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:33:15,050-Speed 10962.10 samples/sec Loss 13.0164 LearningRate 0.0927 Epoch: 1 Global Step: 8410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:33:15,987-Speed 10929.15 samples/sec Loss 12.9819 LearningRate 0.0927 Epoch: 1 Global Step: 8420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:33:16,905-Speed 11172.62 samples/sec Loss 13.0217 LearningRate 0.0927 Epoch: 1 Global Step: 8430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:33:17,841-Speed 10945.19 samples/sec Loss 12.7883 LearningRate 0.0927 Epoch: 1 Global Step: 8440 Fp16 Grad Scale: 131072 Required: 8 
hours Training: 2022-04-10 23:33:18,766-Speed 11073.57 samples/sec Loss 12.7349 LearningRate 0.0927 Epoch: 1 Global Step: 8450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:33:19,682-Speed 11186.63 samples/sec Loss 12.8766 LearningRate 0.0927 Epoch: 1 Global Step: 8460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:33:20,583-Speed 11375.30 samples/sec Loss 12.9262 LearningRate 0.0927 Epoch: 1 Global Step: 8470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:33:21,463-Speed 11651.35 samples/sec Loss 12.8760 LearningRate 0.0927 Epoch: 1 Global Step: 8480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:33:22,369-Speed 11310.89 samples/sec Loss 12.8318 LearningRate 0.0927 Epoch: 1 Global Step: 8490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:33:23,291-Speed 11107.36 samples/sec Loss 12.7985 LearningRate 0.0927 Epoch: 1 Global Step: 8500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:33:24,240-Speed 10804.75 samples/sec Loss 12.8875 LearningRate 0.0927 Epoch: 1 Global Step: 8510 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:33:25,170-Speed 11015.16 samples/sec Loss 12.7207 LearningRate 0.0926 Epoch: 1 Global Step: 8520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:33:26,095-Speed 11079.16 samples/sec Loss 12.6806 LearningRate 0.0926 Epoch: 1 Global Step: 8530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:33:27,045-Speed 10779.66 samples/sec Loss 12.8200 LearningRate 0.0926 Epoch: 1 Global Step: 8540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:33:27,997-Speed 10761.65 samples/sec Loss 12.6248 LearningRate 0.0926 Epoch: 1 Global Step: 8550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:33:28,927-Speed 11023.54 samples/sec Loss 12.6655 LearningRate 0.0926 Epoch: 1 Global Step: 8560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:33:29,859-Speed 
10989.90 samples/sec Loss 12.7621 LearningRate 0.0926 Epoch: 1 Global Step: 8570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:33:30,818-Speed 10691.92 samples/sec Loss 12.8632 LearningRate 0.0926 Epoch: 1 Global Step: 8580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:33:31,735-Speed 11172.89 samples/sec Loss 12.5057 LearningRate 0.0926 Epoch: 1 Global Step: 8590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:33:32,648-Speed 11221.43 samples/sec Loss 12.8407 LearningRate 0.0926 Epoch: 1 Global Step: 8600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:33:33,527-Speed 11663.23 samples/sec Loss 13.0516 LearningRate 0.0926 Epoch: 1 Global Step: 8610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:33:34,420-Speed 11466.81 samples/sec Loss 12.6848 LearningRate 0.0926 Epoch: 1 Global Step: 8620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:33:35,331-Speed 11252.78 samples/sec Loss 12.6524 LearningRate 0.0926 Epoch: 1 Global Step: 8630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:33:36,262-Speed 11009.93 samples/sec Loss 12.9446 LearningRate 0.0925 Epoch: 1 Global Step: 8640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:33:37,189-Speed 11050.14 samples/sec Loss 12.7610 LearningRate 0.0925 Epoch: 1 Global Step: 8650 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:33:38,105-Speed 11180.47 samples/sec Loss 12.5824 LearningRate 0.0925 Epoch: 1 Global Step: 8660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:33:39,065-Speed 10677.83 samples/sec Loss 12.6494 LearningRate 0.0925 Epoch: 1 Global Step: 8670 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:33:40,011-Speed 10828.47 samples/sec Loss 12.6596 LearningRate 0.0925 Epoch: 1 Global Step: 8680 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:33:40,951-Speed 10909.63 samples/sec Loss 12.5764 LearningRate 
0.0925 Epoch: 1 Global Step: 8690 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:33:41,899-Speed 10811.06 samples/sec Loss 12.6299 LearningRate 0.0925 Epoch: 1 Global Step: 8700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:33:42,844-Speed 10839.34 samples/sec Loss 12.5524 LearningRate 0.0925 Epoch: 1 Global Step: 8710 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:33:43,835-Speed 10352.72 samples/sec Loss 12.7608 LearningRate 0.0925 Epoch: 1 Global Step: 8720 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:33:44,722-Speed 11550.09 samples/sec Loss 12.7333 LearningRate 0.0925 Epoch: 1 Global Step: 8730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:33:45,618-Speed 11442.27 samples/sec Loss 12.7458 LearningRate 0.0925 Epoch: 1 Global Step: 8740 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:33:46,501-Speed 11593.89 samples/sec Loss 12.6562 LearningRate 0.0925 Epoch: 1 Global Step: 8750 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:33:47,419-Speed 11168.74 samples/sec Loss 12.6181 LearningRate 0.0924 Epoch: 1 Global Step: 8760 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:33:48,351-Speed 10988.22 samples/sec Loss 12.7041 LearningRate 0.0924 Epoch: 1 Global Step: 8770 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:33:49,294-Speed 10866.64 samples/sec Loss 12.7577 LearningRate 0.0924 Epoch: 1 Global Step: 8780 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:33:50,242-Speed 10815.91 samples/sec Loss 12.6448 LearningRate 0.0924 Epoch: 1 Global Step: 8790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:33:51,165-Speed 11101.79 samples/sec Loss 12.6022 LearningRate 0.0924 Epoch: 1 Global Step: 8800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:33:52,090-Speed 11072.48 samples/sec Loss 12.4364 LearningRate 0.0924 Epoch: 1 Global Step: 8810 Fp16 Grad 
Scale: 65536 Required: 8 hours Training: 2022-04-10 23:33:53,020-Speed 11019.53 samples/sec Loss 12.6750 LearningRate 0.0924 Epoch: 1 Global Step: 8820 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:33:53,960-Speed 10895.99 samples/sec Loss 12.6369 LearningRate 0.0924 Epoch: 1 Global Step: 8830 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:33:54,912-Speed 10767.89 samples/sec Loss 12.5227 LearningRate 0.0924 Epoch: 1 Global Step: 8840 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:33:55,803-Speed 11520.72 samples/sec Loss 12.5815 LearningRate 0.0924 Epoch: 1 Global Step: 8850 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:33:56,715-Speed 11225.33 samples/sec Loss 12.4432 LearningRate 0.0924 Epoch: 1 Global Step: 8860 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:33:57,620-Speed 11327.62 samples/sec Loss 12.5233 LearningRate 0.0924 Epoch: 1 Global Step: 8870 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:33:58,574-Speed 10738.64 samples/sec Loss 12.5990 LearningRate 0.0923 Epoch: 1 Global Step: 8880 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:33:59,502-Speed 11045.32 samples/sec Loss 12.4348 LearningRate 0.0923 Epoch: 1 Global Step: 8890 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:34:00,430-Speed 11040.25 samples/sec Loss 12.3859 LearningRate 0.0923 Epoch: 1 Global Step: 8900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:34:01,343-Speed 11225.55 samples/sec Loss 12.5056 LearningRate 0.0923 Epoch: 1 Global Step: 8910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:34:02,311-Speed 10582.39 samples/sec Loss 12.4793 LearningRate 0.0923 Epoch: 1 Global Step: 8920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:34:03,278-Speed 10604.71 samples/sec Loss 12.4524 LearningRate 0.0923 Epoch: 1 Global Step: 8930 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 
23:34:04,191-Speed 11227.24 samples/sec Loss 12.6160 LearningRate 0.0923 Epoch: 1 Global Step: 8940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:34:05,137-Speed 10841.91 samples/sec Loss 12.5209 LearningRate 0.0923 Epoch: 1 Global Step: 8950 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:34:06,047-Speed 11255.15 samples/sec Loss 12.4745 LearningRate 0.0923 Epoch: 1 Global Step: 8960 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:34:06,969-Speed 11124.49 samples/sec Loss 12.6861 LearningRate 0.0923 Epoch: 1 Global Step: 8970 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:34:07,932-Speed 10640.83 samples/sec Loss 12.5310 LearningRate 0.0923 Epoch: 1 Global Step: 8980 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:34:08,867-Speed 10961.32 samples/sec Loss 12.3140 LearningRate 0.0923 Epoch: 1 Global Step: 8990 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:34:09,783-Speed 11184.70 samples/sec Loss 12.4975 LearningRate 0.0922 Epoch: 1 Global Step: 9000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:34:10,673-Speed 11509.12 samples/sec Loss 12.4446 LearningRate 0.0922 Epoch: 1 Global Step: 9010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:34:11,568-Speed 11452.93 samples/sec Loss 12.5538 LearningRate 0.0922 Epoch: 1 Global Step: 9020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:34:12,484-Speed 11189.72 samples/sec Loss 12.5682 LearningRate 0.0922 Epoch: 1 Global Step: 9030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:34:13,408-Speed 11096.88 samples/sec Loss 12.5700 LearningRate 0.0922 Epoch: 1 Global Step: 9040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:34:14,399-Speed 10337.48 samples/sec Loss 12.5962 LearningRate 0.0922 Epoch: 1 Global Step: 9050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:34:15,321-Speed 11116.62 samples/sec Loss 
12.4249 LearningRate 0.0922 Epoch: 1 Global Step: 9060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:34:16,252-Speed 11014.97 samples/sec Loss 12.6486 LearningRate 0.0922 Epoch: 1 Global Step: 9070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:34:17,150-Speed 11403.22 samples/sec Loss 12.5832 LearningRate 0.0922 Epoch: 1 Global Step: 9080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:34:18,103-Speed 10757.97 samples/sec Loss 12.6387 LearningRate 0.0922 Epoch: 1 Global Step: 9090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:34:19,031-Speed 11036.83 samples/sec Loss 12.6294 LearningRate 0.0922 Epoch: 1 Global Step: 9100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:34:19,953-Speed 11117.20 samples/sec Loss 12.5296 LearningRate 0.0921 Epoch: 1 Global Step: 9110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:34:20,856-Speed 11342.56 samples/sec Loss 12.5274 LearningRate 0.0921 Epoch: 1 Global Step: 9120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:34:21,760-Speed 11345.53 samples/sec Loss 12.4787 LearningRate 0.0921 Epoch: 1 Global Step: 9130 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:34:22,660-Speed 11381.24 samples/sec Loss 12.5929 LearningRate 0.0921 Epoch: 1 Global Step: 9140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:34:23,563-Speed 11348.15 samples/sec Loss 12.6409 LearningRate 0.0921 Epoch: 1 Global Step: 9150 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:34:24,483-Speed 11131.10 samples/sec Loss 12.5645 LearningRate 0.0921 Epoch: 1 Global Step: 9160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:34:25,400-Speed 11179.55 samples/sec Loss 12.6547 LearningRate 0.0921 Epoch: 1 Global Step: 9170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:34:26,327-Speed 11061.64 samples/sec Loss 12.3175 LearningRate 0.0921 Epoch: 1 Global Step: 
9180 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:34:27,252-Speed 11076.10 samples/sec Loss 12.4331 LearningRate 0.0921 Epoch: 1 Global Step: 9190 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:34:28,185-Speed 10991.17 samples/sec Loss 12.4629 LearningRate 0.0921 Epoch: 1 Global Step: 9200 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:34:29,082-Speed 11416.25 samples/sec Loss 12.4512 LearningRate 0.0921 Epoch: 1 Global Step: 9210 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:34:29,990-Speed 11285.85 samples/sec Loss 12.5506 LearningRate 0.0921 Epoch: 1 Global Step: 9220 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:34:30,877-Speed 11558.29 samples/sec Loss 12.4318 LearningRate 0.0920 Epoch: 1 Global Step: 9230 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:34:31,820-Speed 10864.99 samples/sec Loss 12.2938 LearningRate 0.0920 Epoch: 1 Global Step: 9240 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:34:32,726-Speed 11316.46 samples/sec Loss 12.4828 LearningRate 0.0920 Epoch: 1 Global Step: 9250 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:34:33,634-Speed 11285.66 samples/sec Loss 12.3823 LearningRate 0.0920 Epoch: 1 Global Step: 9260 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:34:34,549-Speed 11196.55 samples/sec Loss 12.5226 LearningRate 0.0920 Epoch: 1 Global Step: 9270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:34:35,452-Speed 11339.35 samples/sec Loss 12.6345 LearningRate 0.0920 Epoch: 1 Global Step: 9280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:34:36,391-Speed 10918.28 samples/sec Loss 12.4938 LearningRate 0.0920 Epoch: 1 Global Step: 9290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:34:37,308-Speed 11179.07 samples/sec Loss 12.3889 LearningRate 0.0920 Epoch: 1 Global Step: 9300 Fp16 Grad Scale: 131072 Required: 8 
hours Training: 2022-04-10 23:34:38,293-Speed 10402.98 samples/sec Loss 12.5097 LearningRate 0.0920 Epoch: 1 Global Step: 9310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:34:39,225-Speed 10989.07 samples/sec Loss 12.6305 LearningRate 0.0920 Epoch: 1 Global Step: 9320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:34:40,138-Speed 11229.79 samples/sec Loss 12.5523 LearningRate 0.0920 Epoch: 1 Global Step: 9330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:34:41,047-Speed 11265.89 samples/sec Loss 12.4206 LearningRate 0.0920 Epoch: 1 Global Step: 9340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:34:41,979-Speed 11004.68 samples/sec Loss 12.5290 LearningRate 0.0919 Epoch: 1 Global Step: 9350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:34:42,903-Speed 11083.15 samples/sec Loss 12.3857 LearningRate 0.0919 Epoch: 1 Global Step: 9360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:34:43,821-Speed 11169.20 samples/sec Loss 12.3903 LearningRate 0.0919 Epoch: 1 Global Step: 9370 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:34:44,700-Speed 11656.32 samples/sec Loss 12.3984 LearningRate 0.0919 Epoch: 1 Global Step: 9380 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:34:45,598-Speed 11414.17 samples/sec Loss 12.3906 LearningRate 0.0919 Epoch: 1 Global Step: 9390 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:34:46,517-Speed 11148.45 samples/sec Loss 12.3692 LearningRate 0.0919 Epoch: 1 Global Step: 9400 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:34:47,451-Speed 10968.11 samples/sec Loss 12.5148 LearningRate 0.0919 Epoch: 1 Global Step: 9410 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:34:48,383-Speed 10996.01 samples/sec Loss 12.5336 LearningRate 0.0919 Epoch: 1 Global Step: 9420 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 
23:34:49,295-Speed 11240.22 samples/sec Loss 12.3203 LearningRate 0.0919 Epoch: 1 Global Step: 9430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:34:50,196-Speed 11375.44 samples/sec Loss 12.4548 LearningRate 0.0919 Epoch: 1 Global Step: 9440 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:34:51,103-Speed 11299.94 samples/sec Loss 12.3187 LearningRate 0.0919 Epoch: 1 Global Step: 9450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:34:52,028-Speed 11075.45 samples/sec Loss 12.3997 LearningRate 0.0919 Epoch: 1 Global Step: 9460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:34:52,947-Speed 11143.94 samples/sec Loss 12.4233 LearningRate 0.0918 Epoch: 1 Global Step: 9470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:34:53,860-Speed 11230.20 samples/sec Loss 12.3576 LearningRate 0.0918 Epoch: 1 Global Step: 9480 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:34:54,775-Speed 11195.46 samples/sec Loss 12.4233 LearningRate 0.0918 Epoch: 1 Global Step: 9490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:34:55,713-Speed 10920.66 samples/sec Loss 12.3580 LearningRate 0.0918 Epoch: 1 Global Step: 9500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:34:56,633-Speed 11141.52 samples/sec Loss 12.3020 LearningRate 0.0918 Epoch: 1 Global Step: 9510 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:34:57,552-Speed 11147.42 samples/sec Loss 12.3859 LearningRate 0.0918 Epoch: 1 Global Step: 9520 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:34:58,498-Speed 10829.21 samples/sec Loss 12.3328 LearningRate 0.0918 Epoch: 1 Global Step: 9530 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:34:59,444-Speed 10840.56 samples/sec Loss 12.4510 LearningRate 0.0918 Epoch: 1 Global Step: 9540 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:35:00,381-Speed 10936.46 samples/sec 
Loss 12.4663 LearningRate 0.0918 Epoch: 1 Global Step: 9550 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:35:01,327-Speed 10828.35 samples/sec Loss 12.3398 LearningRate 0.0918 Epoch: 1 Global Step: 9560 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:35:02,258-Speed 11010.24 samples/sec Loss 12.3634 LearningRate 0.0918 Epoch: 1 Global Step: 9570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:35:03,182-Speed 11088.41 samples/sec Loss 12.4907 LearningRate 0.0918 Epoch: 1 Global Step: 9580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:35:04,084-Speed 11371.75 samples/sec Loss 12.2955 LearningRate 0.0917 Epoch: 1 Global Step: 9590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:35:05,019-Speed 10958.78 samples/sec Loss 12.3793 LearningRate 0.0917 Epoch: 1 Global Step: 9600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:35:05,947-Speed 11034.04 samples/sec Loss 12.3629 LearningRate 0.0917 Epoch: 1 Global Step: 9610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:35:06,868-Speed 11124.69 samples/sec Loss 12.3019 LearningRate 0.0917 Epoch: 1 Global Step: 9620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:35:07,784-Speed 11193.49 samples/sec Loss 12.4048 LearningRate 0.0917 Epoch: 1 Global Step: 9630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:35:08,706-Speed 11108.52 samples/sec Loss 12.4683 LearningRate 0.0917 Epoch: 1 Global Step: 9640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:35:09,603-Speed 11429.12 samples/sec Loss 12.3117 LearningRate 0.0917 Epoch: 1 Global Step: 9650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:35:10,501-Speed 11414.23 samples/sec Loss 12.4102 LearningRate 0.0917 Epoch: 1 Global Step: 9660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:35:11,405-Speed 11327.57 samples/sec Loss 12.2991 LearningRate 0.0917 Epoch: 1 Global 
Step: 9670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:35:12,310-Speed 11327.84 samples/sec Loss 12.1683 LearningRate 0.0917 Epoch: 1 Global Step: 9680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:35:13,217-Speed 11295.05 samples/sec Loss 12.4854 LearningRate 0.0917 Epoch: 1 Global Step: 9690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:35:14,178-Speed 10664.70 samples/sec Loss 12.3067 LearningRate 0.0917 Epoch: 1 Global Step: 9700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:35:15,123-Speed 10849.14 samples/sec Loss 12.2978 LearningRate 0.0916 Epoch: 1 Global Step: 9710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:35:16,014-Speed 11503.51 samples/sec Loss 12.2384 LearningRate 0.0916 Epoch: 1 Global Step: 9720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:35:16,930-Speed 11182.58 samples/sec Loss 12.3181 LearningRate 0.0916 Epoch: 1 Global Step: 9730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:35:17,873-Speed 10864.42 samples/sec Loss 12.1978 LearningRate 0.0916 Epoch: 1 Global Step: 9740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:35:18,806-Speed 10989.91 samples/sec Loss 12.2777 LearningRate 0.0916 Epoch: 1 Global Step: 9750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:35:19,720-Speed 11205.99 samples/sec Loss 12.2197 LearningRate 0.0916 Epoch: 1 Global Step: 9760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:35:20,634-Speed 11212.74 samples/sec Loss 12.3058 LearningRate 0.0916 Epoch: 1 Global Step: 9770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:35:21,570-Speed 10951.57 samples/sec Loss 12.3721 LearningRate 0.0916 Epoch: 1 Global Step: 9780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:35:22,476-Speed 11304.21 samples/sec Loss 12.1448 LearningRate 0.0916 Epoch: 1 Global Step: 9790 Fp16 Grad Scale: 131072 Required: 7 hours 
Training: 2022-04-10 23:35:23,416-Speed 10906.54 samples/sec Loss 12.2263 LearningRate 0.0916 Epoch: 1 Global Step: 9800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:35:24,373-Speed 10702.22 samples/sec Loss 12.0627 LearningRate 0.0916 Epoch: 1 Global Step: 9810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:35:25,326-Speed 10749.65 samples/sec Loss 12.1460 LearningRate 0.0916 Epoch: 1 Global Step: 9820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:35:26,246-Speed 11145.65 samples/sec Loss 12.2654 LearningRate 0.0915 Epoch: 1 Global Step: 9830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:35:27,191-Speed 10844.68 samples/sec Loss 12.3480 LearningRate 0.0915 Epoch: 1 Global Step: 9840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:35:28,145-Speed 10747.88 samples/sec Loss 12.2338 LearningRate 0.0915 Epoch: 1 Global Step: 9850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:35:29,081-Speed 10946.84 samples/sec Loss 12.3032 LearningRate 0.0915 Epoch: 1 Global Step: 9860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:35:30,006-Speed 11070.11 samples/sec Loss 12.0355 LearningRate 0.0915 Epoch: 1 Global Step: 9870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:35:30,931-Speed 11076.02 samples/sec Loss 11.9951 LearningRate 0.0915 Epoch: 1 Global Step: 9880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:35:31,855-Speed 11090.24 samples/sec Loss 12.1190 LearningRate 0.0915 Epoch: 1 Global Step: 9890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:35:32,784-Speed 11031.89 samples/sec Loss 12.2705 LearningRate 0.0915 Epoch: 1 Global Step: 9900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:35:33,672-Speed 11540.86 samples/sec Loss 12.2532 LearningRate 0.0915 Epoch: 1 Global Step: 9910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:35:34,571-Speed 11400.62 
samples/sec Loss 12.2095 LearningRate 0.0915 Epoch: 1 Global Step: 9920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:35:35,484-Speed 11229.71 samples/sec Loss 12.0443 LearningRate 0.0915 Epoch: 1 Global Step: 9930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:35:36,382-Speed 11408.04 samples/sec Loss 12.1317 LearningRate 0.0915 Epoch: 1 Global Step: 9940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:35:37,287-Speed 11324.92 samples/sec Loss 12.3713 LearningRate 0.0914 Epoch: 1 Global Step: 9950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:35:38,205-Speed 11159.66 samples/sec Loss 12.3319 LearningRate 0.0914 Epoch: 1 Global Step: 9960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:35:39,146-Speed 10889.41 samples/sec Loss 12.2860 LearningRate 0.0914 Epoch: 1 Global Step: 9970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:35:40,071-Speed 11092.08 samples/sec Loss 12.1270 LearningRate 0.0914 Epoch: 1 Global Step: 9980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:35:41,035-Speed 10634.65 samples/sec Loss 12.3096 LearningRate 0.0914 Epoch: 1 Global Step: 9990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:35:41,972-Speed 10926.64 samples/sec Loss 12.2606 LearningRate 0.0914 Epoch: 1 Global Step: 10000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:36:03,115-[lfw][10000]XNorm: 15.775360 Training: 2022-04-10 23:36:03,115-[lfw][10000]Accuracy-Flip: 0.99167+-0.00441 Training: 2022-04-10 23:36:03,116-[lfw][10000]Accuracy-Highest: 0.99167 Training: 2022-04-10 23:36:27,721-[cfp_fp][10000]XNorm: 13.152668 Training: 2022-04-10 23:36:27,721-[cfp_fp][10000]Accuracy-Flip: 0.88514+-0.01868 Training: 2022-04-10 23:36:27,722-[cfp_fp][10000]Accuracy-Highest: 0.88514 Training: 2022-04-10 23:36:48,937-[agedb_30][10000]XNorm: 15.292808 Training: 2022-04-10 23:36:48,938-[agedb_30][10000]Accuracy-Flip: 
0.92917+-0.01948 Training: 2022-04-10 23:36:48,938-[agedb_30][10000]Accuracy-Highest: 0.92917 Training: 2022-04-10 23:36:49,862-Speed 150.84 samples/sec Loss 12.0922 LearningRate 0.0914 Epoch: 1 Global Step: 10010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:36:50,781-Speed 11161.35 samples/sec Loss 12.0292 LearningRate 0.0914 Epoch: 1 Global Step: 10020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:36:51,704-Speed 11095.41 samples/sec Loss 12.1691 LearningRate 0.0914 Epoch: 1 Global Step: 10030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:36:52,646-Speed 10876.74 samples/sec Loss 12.1139 LearningRate 0.0914 Epoch: 1 Global Step: 10040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:36:53,581-Speed 10967.64 samples/sec Loss 12.2122 LearningRate 0.0914 Epoch: 1 Global Step: 10050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:36:54,527-Speed 10828.64 samples/sec Loss 12.0546 LearningRate 0.0913 Epoch: 1 Global Step: 10060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:36:55,444-Speed 11178.38 samples/sec Loss 12.1487 LearningRate 0.0913 Epoch: 1 Global Step: 10070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:36:56,369-Speed 11077.37 samples/sec Loss 12.3022 LearningRate 0.0913 Epoch: 1 Global Step: 10080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:36:57,283-Speed 11214.10 samples/sec Loss 12.2562 LearningRate 0.0913 Epoch: 1 Global Step: 10090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:36:58,202-Speed 11142.72 samples/sec Loss 12.3492 LearningRate 0.0913 Epoch: 1 Global Step: 10100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:36:59,148-Speed 10835.76 samples/sec Loss 12.1757 LearningRate 0.0913 Epoch: 1 Global Step: 10110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:00,065-Speed 11172.70 samples/sec Loss 12.1626 LearningRate 0.0913 Epoch: 1 
Global Step: 10120 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:00,969-Speed 11332.38 samples/sec Loss 12.0899 LearningRate 0.0913 Epoch: 1 Global Step: 10130 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:01,921-Speed 10765.34 samples/sec Loss 12.2637 LearningRate 0.0913 Epoch: 1 Global Step: 10140 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:37:02,845-Speed 11096.02 samples/sec Loss 12.2686 LearningRate 0.0913 Epoch: 1 Global Step: 10150 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:37:03,780-Speed 10959.59 samples/sec Loss 12.3093 LearningRate 0.0913 Epoch: 1 Global Step: 10160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:04,703-Speed 11091.55 samples/sec Loss 12.1114 LearningRate 0.0913 Epoch: 1 Global Step: 10170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:05,648-Speed 10990.64 samples/sec Loss 12.0964 LearningRate 0.0912 Epoch: 1 Global Step: 10180 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:06,558-Speed 11267.99 samples/sec Loss 12.1082 LearningRate 0.0912 Epoch: 1 Global Step: 10190 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:07,486-Speed 11045.30 samples/sec Loss 12.1464 LearningRate 0.0912 Epoch: 1 Global Step: 10200 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:08,446-Speed 10668.84 samples/sec Loss 12.0510 LearningRate 0.0912 Epoch: 1 Global Step: 10210 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:09,364-Speed 11165.24 samples/sec Loss 12.0562 LearningRate 0.0912 Epoch: 1 Global Step: 10220 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:10,275-Speed 11252.90 samples/sec Loss 12.0930 LearningRate 0.0912 Epoch: 1 Global Step: 10230 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:11,186-Speed 11243.09 samples/sec Loss 12.1053 LearningRate 0.0912 Epoch: 1 Global Step: 10240 Fp16 Grad 
Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:12,100-Speed 11220.34 samples/sec Loss 12.0227 LearningRate 0.0912 Epoch: 1 Global Step: 10250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:12,999-Speed 11390.30 samples/sec Loss 12.0449 LearningRate 0.0912 Epoch: 1 Global Step: 10260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:13,894-Speed 11449.21 samples/sec Loss 11.9300 LearningRate 0.0912 Epoch: 1 Global Step: 10270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:14,824-Speed 11014.98 samples/sec Loss 12.0911 LearningRate 0.0912 Epoch: 1 Global Step: 10280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:15,736-Speed 11242.52 samples/sec Loss 12.1439 LearningRate 0.0912 Epoch: 1 Global Step: 10290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:16,672-Speed 10947.97 samples/sec Loss 12.2058 LearningRate 0.0911 Epoch: 1 Global Step: 10300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:17,621-Speed 10794.73 samples/sec Loss 12.0967 LearningRate 0.0911 Epoch: 1 Global Step: 10310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:18,540-Speed 11147.85 samples/sec Loss 12.3077 LearningRate 0.0911 Epoch: 1 Global Step: 10320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:19,444-Speed 11332.92 samples/sec Loss 12.0497 LearningRate 0.0911 Epoch: 1 Global Step: 10330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:20,387-Speed 10876.56 samples/sec Loss 11.8865 LearningRate 0.0911 Epoch: 1 Global Step: 10340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:21,268-Speed 11627.12 samples/sec Loss 12.1905 LearningRate 0.0911 Epoch: 1 Global Step: 10350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:22,176-Speed 11282.00 samples/sec Loss 12.1653 LearningRate 0.0911 Epoch: 1 Global Step: 10360 Fp16 Grad Scale: 262144 Required: 8 hours 
Training: 2022-04-10 23:37:23,077-Speed 11370.29 samples/sec Loss 12.2833 LearningRate 0.0911 Epoch: 1 Global Step: 10370 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:37:23,993-Speed 11188.89 samples/sec Loss 12.2367 LearningRate 0.0911 Epoch: 1 Global Step: 10380 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:24,929-Speed 10948.03 samples/sec Loss 12.2703 LearningRate 0.0911 Epoch: 1 Global Step: 10390 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:25,869-Speed 10904.59 samples/sec Loss 12.1353 LearningRate 0.0911 Epoch: 1 Global Step: 10400 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:26,804-Speed 10960.67 samples/sec Loss 11.8882 LearningRate 0.0911 Epoch: 1 Global Step: 10410 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:27,732-Speed 11043.93 samples/sec Loss 12.0514 LearningRate 0.0910 Epoch: 1 Global Step: 10420 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:28,661-Speed 11029.89 samples/sec Loss 12.1183 LearningRate 0.0910 Epoch: 1 Global Step: 10430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:29,621-Speed 10676.19 samples/sec Loss 11.9983 LearningRate 0.0910 Epoch: 1 Global Step: 10440 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:30,534-Speed 11229.39 samples/sec Loss 12.0834 LearningRate 0.0910 Epoch: 1 Global Step: 10450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:31,469-Speed 10951.81 samples/sec Loss 12.2708 LearningRate 0.0910 Epoch: 1 Global Step: 10460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:32,378-Speed 11283.31 samples/sec Loss 12.1780 LearningRate 0.0910 Epoch: 1 Global Step: 10470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:33,273-Speed 11455.91 samples/sec Loss 11.9329 LearningRate 0.0910 Epoch: 1 Global Step: 10480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 
23:37:34,180-Speed 11317.76 samples/sec Loss 12.0478 LearningRate 0.0910 Epoch: 1 Global Step: 10490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:35,072-Speed 11488.36 samples/sec Loss 12.0604 LearningRate 0.0910 Epoch: 1 Global Step: 10500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:36,003-Speed 11006.66 samples/sec Loss 11.9588 LearningRate 0.0910 Epoch: 1 Global Step: 10510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:37:36,913-Speed 11253.09 samples/sec Loss 11.9046 LearningRate 0.0910 Epoch: 1 Global Step: 10520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:37:37,838-Speed 11080.79 samples/sec Loss 11.9132 LearningRate 0.0910 Epoch: 1 Global Step: 10530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:37:38,780-Speed 10883.74 samples/sec Loss 12.0560 LearningRate 0.0909 Epoch: 1 Global Step: 10540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:37:39,693-Speed 11218.97 samples/sec Loss 12.1156 LearningRate 0.0909 Epoch: 1 Global Step: 10550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:37:40,603-Speed 11260.01 samples/sec Loss 11.8968 LearningRate 0.0909 Epoch: 1 Global Step: 10560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:37:41,499-Speed 11431.87 samples/sec Loss 11.8773 LearningRate 0.0909 Epoch: 1 Global Step: 10570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:37:42,422-Speed 11100.94 samples/sec Loss 11.8321 LearningRate 0.0909 Epoch: 1 Global Step: 10580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:37:43,383-Speed 10665.68 samples/sec Loss 11.9672 LearningRate 0.0909 Epoch: 1 Global Step: 10590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:37:44,310-Speed 11051.40 samples/sec Loss 11.8183 LearningRate 0.0909 Epoch: 1 Global Step: 10600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:37:45,211-Speed 11381.22 samples/sec 
Loss 11.9242 LearningRate 0.0909 Epoch: 1 Global Step: 10610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:46,141-Speed 11014.06 samples/sec Loss 12.0089 LearningRate 0.0909 Epoch: 1 Global Step: 10620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:47,098-Speed 10713.57 samples/sec Loss 11.7624 LearningRate 0.0909 Epoch: 1 Global Step: 10630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:48,015-Speed 11172.29 samples/sec Loss 11.9485 LearningRate 0.0909 Epoch: 1 Global Step: 10640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:48,967-Speed 10754.93 samples/sec Loss 11.9755 LearningRate 0.0909 Epoch: 1 Global Step: 10650 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:49,865-Speed 11420.25 samples/sec Loss 12.0488 LearningRate 0.0908 Epoch: 1 Global Step: 10660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:50,825-Speed 10668.71 samples/sec Loss 12.0075 LearningRate 0.0908 Epoch: 1 Global Step: 10670 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:51,742-Speed 11173.60 samples/sec Loss 12.0350 LearningRate 0.0908 Epoch: 1 Global Step: 10680 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:52,714-Speed 10544.71 samples/sec Loss 11.8434 LearningRate 0.0908 Epoch: 1 Global Step: 10690 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:53,661-Speed 10819.71 samples/sec Loss 11.9160 LearningRate 0.0908 Epoch: 1 Global Step: 10700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:37:54,567-Speed 11313.59 samples/sec Loss 11.7900 LearningRate 0.0908 Epoch: 1 Global Step: 10710 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:37:55,469-Speed 11366.37 samples/sec Loss 11.7549 LearningRate 0.0908 Epoch: 1 Global Step: 10720 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:37:56,377-Speed 11284.19 samples/sec Loss 11.9109 LearningRate 0.0908 
Epoch: 1 Global Step: 10730 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:37:57,265-Speed 11542.01 samples/sec Loss 12.0219 LearningRate 0.0908 Epoch: 1 Global Step: 10740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:37:58,184-Speed 11146.34 samples/sec Loss 11.9574 LearningRate 0.0908 Epoch: 1 Global Step: 10750 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:37:59,116-Speed 10998.85 samples/sec Loss 12.0209 LearningRate 0.0908 Epoch: 1 Global Step: 10760 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:38:00,002-Speed 11566.78 samples/sec Loss 11.8101 LearningRate 0.0908 Epoch: 1 Global Step: 10770 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:38:00,917-Speed 11212.25 samples/sec Loss 11.9281 LearningRate 0.0907 Epoch: 1 Global Step: 10780 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:38:01,834-Speed 11165.27 samples/sec Loss 11.7638 LearningRate 0.0907 Epoch: 1 Global Step: 10790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:38:02,743-Speed 11277.28 samples/sec Loss 11.8773 LearningRate 0.0907 Epoch: 1 Global Step: 10800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:38:03,646-Speed 11344.06 samples/sec Loss 11.7539 LearningRate 0.0907 Epoch: 1 Global Step: 10810 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:38:04,578-Speed 10997.10 samples/sec Loss 11.8392 LearningRate 0.0907 Epoch: 1 Global Step: 10820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:38:05,487-Speed 11276.70 samples/sec Loss 11.6946 LearningRate 0.0907 Epoch: 1 Global Step: 10830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:38:06,436-Speed 10798.61 samples/sec Loss 11.8716 LearningRate 0.0907 Epoch: 1 Global Step: 10840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:38:07,388-Speed 10762.10 samples/sec Loss 12.0512 LearningRate 0.0907 Epoch: 1 Global Step: 10850 Fp16 Grad 
Scale: 131072 Required: 8 hours Training: 2022-04-10 23:38:08,289-Speed 11373.62 samples/sec Loss 12.0617 LearningRate 0.0907 Epoch: 1 Global Step: 10860 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:38:09,220-Speed 10999.19 samples/sec Loss 11.8483 LearningRate 0.0907 Epoch: 1 Global Step: 10870 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:38:10,108-Speed 11548.69 samples/sec Loss 11.9854 LearningRate 0.0907 Epoch: 1 Global Step: 10880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:38:11,009-Speed 11369.53 samples/sec Loss 11.8656 LearningRate 0.0907 Epoch: 1 Global Step: 10890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:38:11,911-Speed 11355.56 samples/sec Loss 11.8640 LearningRate 0.0906 Epoch: 1 Global Step: 10900 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:38:12,839-Speed 11044.60 samples/sec Loss 11.9165 LearningRate 0.0906 Epoch: 1 Global Step: 10910 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:38:13,753-Speed 11213.80 samples/sec Loss 11.7346 LearningRate 0.0906 Epoch: 1 Global Step: 10920 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:38:14,695-Speed 10884.75 samples/sec Loss 11.8917 LearningRate 0.0906 Epoch: 1 Global Step: 10930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:38:15,625-Speed 11018.98 samples/sec Loss 11.8529 LearningRate 0.0906 Epoch: 1 Global Step: 10940 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:38:16,590-Speed 10617.98 samples/sec Loss 12.0301 LearningRate 0.0906 Epoch: 1 Global Step: 10950 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:38:17,540-Speed 10785.84 samples/sec Loss 11.8374 LearningRate 0.0906 Epoch: 1 Global Step: 10960 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:38:18,472-Speed 10989.84 samples/sec Loss 11.9167 LearningRate 0.0906 Epoch: 1 Global Step: 10970 Fp16 Grad Scale: 65536 Required: 8 hours 
Training: 2022-04-10 23:38:19,412-Speed 10911.31 samples/sec Loss 11.8511 LearningRate 0.0906 Epoch: 1 Global Step: 10980 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:38:20,345-Speed 10977.72 samples/sec Loss 11.8579 LearningRate 0.0906 Epoch: 1 Global Step: 10990 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:38:21,261-Speed 11181.50 samples/sec Loss 12.0309 LearningRate 0.0906 Epoch: 1 Global Step: 11000 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:38:22,184-Speed 11105.30 samples/sec Loss 11.8928 LearningRate 0.0906 Epoch: 1 Global Step: 11010 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:38:23,092-Speed 11289.84 samples/sec Loss 11.9698 LearningRate 0.0905 Epoch: 1 Global Step: 11020 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:38:23,997-Speed 11327.98 samples/sec Loss 11.7285 LearningRate 0.0905 Epoch: 1 Global Step: 11030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:38:24,905-Speed 11276.18 samples/sec Loss 11.8446 LearningRate 0.0905 Epoch: 1 Global Step: 11040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:38:25,800-Speed 11457.37 samples/sec Loss 11.9016 LearningRate 0.0905 Epoch: 1 Global Step: 11050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:38:26,725-Speed 11078.12 samples/sec Loss 11.9080 LearningRate 0.0905 Epoch: 1 Global Step: 11060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:38:27,643-Speed 11156.63 samples/sec Loss 11.8482 LearningRate 0.0905 Epoch: 1 Global Step: 11070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:38:28,555-Speed 11241.01 samples/sec Loss 11.7384 LearningRate 0.0905 Epoch: 1 Global Step: 11080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:38:29,454-Speed 11396.25 samples/sec Loss 11.8649 LearningRate 0.0905 Epoch: 1 Global Step: 11090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:38:30,384-Speed 
11012.85 samples/sec Loss 11.7465 LearningRate 0.0905 Epoch: 1 Global Step: 11100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:38:31,316-Speed 10995.10 samples/sec Loss 11.8090 LearningRate 0.0905 Epoch: 1 Global Step: 11110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:38:32,298-Speed 10440.80 samples/sec Loss 11.7488 LearningRate 0.0905 Epoch: 1 Global Step: 11120 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:38:33,240-Speed 10885.16 samples/sec Loss 11.9100 LearningRate 0.0905 Epoch: 1 Global Step: 11130 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:38:34,129-Speed 11515.52 samples/sec Loss 11.8339 LearningRate 0.0904 Epoch: 1 Global Step: 11140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:38:35,069-Speed 10909.99 samples/sec Loss 11.8755 LearningRate 0.0904 Epoch: 1 Global Step: 11150 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:38:36,001-Speed 10992.59 samples/sec Loss 11.8408 LearningRate 0.0904 Epoch: 1 Global Step: 11160 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:38:36,930-Speed 11029.50 samples/sec Loss 12.0316 LearningRate 0.0904 Epoch: 1 Global Step: 11170 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:38:37,866-Speed 10944.50 samples/sec Loss 11.9935 LearningRate 0.0904 Epoch: 1 Global Step: 11180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:38:38,771-Speed 11339.69 samples/sec Loss 11.9290 LearningRate 0.0904 Epoch: 1 Global Step: 11190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:38:39,723-Speed 10767.92 samples/sec Loss 11.9661 LearningRate 0.0904 Epoch: 1 Global Step: 11200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:38:40,636-Speed 11219.60 samples/sec Loss 11.8011 LearningRate 0.0904 Epoch: 1 Global Step: 11210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:38:41,575-Speed 10915.58 samples/sec Loss 11.7575 
LearningRate 0.0904 Epoch: 1 Global Step: 11220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:38:42,513-Speed 10927.11 samples/sec Loss 11.6779 LearningRate 0.0904 Epoch: 1 Global Step: 11230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:38:43,429-Speed 11179.30 samples/sec Loss 11.7485 LearningRate 0.0904 Epoch: 1 Global Step: 11240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:38:44,332-Speed 11351.74 samples/sec Loss 11.8353 LearningRate 0.0904 Epoch: 1 Global Step: 11250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:38:45,233-Speed 11375.18 samples/sec Loss 11.6576 LearningRate 0.0903 Epoch: 1 Global Step: 11260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:38:46,135-Speed 11361.17 samples/sec Loss 11.8852 LearningRate 0.0903 Epoch: 1 Global Step: 11270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:38:47,041-Speed 11303.31 samples/sec Loss 11.8974 LearningRate 0.0903 Epoch: 1 Global Step: 11280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:38:47,943-Speed 11362.73 samples/sec Loss 11.7706 LearningRate 0.0903 Epoch: 1 Global Step: 11290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:38:48,837-Speed 11465.40 samples/sec Loss 11.7764 LearningRate 0.0903 Epoch: 1 Global Step: 11300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:38:49,783-Speed 10836.25 samples/sec Loss 11.8277 LearningRate 0.0903 Epoch: 1 Global Step: 11310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:38:50,714-Speed 11003.12 samples/sec Loss 11.7710 LearningRate 0.0903 Epoch: 1 Global Step: 11320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:38:51,626-Speed 11232.84 samples/sec Loss 11.8100 LearningRate 0.0903 Epoch: 1 Global Step: 11330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:38:52,533-Speed 11295.80 samples/sec Loss 11.6784 LearningRate 0.0903 Epoch: 1 Global 
Step: 11340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:38:53,439-Speed 11316.52 samples/sec Loss 11.7962 LearningRate 0.0903 Epoch: 1 Global Step: 11350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:38:54,366-Speed 11058.19 samples/sec Loss 11.9229 LearningRate 0.0903 Epoch: 1 Global Step: 11360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:38:55,316-Speed 10782.38 samples/sec Loss 11.7152 LearningRate 0.0903 Epoch: 1 Global Step: 11370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:39:05,868-Speed 970.54 samples/sec Loss 11.1989 LearningRate 0.0902 Epoch: 2 Global Step: 11380 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:39:06,845-Speed 10499.99 samples/sec Loss 11.0492 LearningRate 0.0902 Epoch: 2 Global Step: 11390 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:39:07,781-Speed 10945.72 samples/sec Loss 11.0929 LearningRate 0.0902 Epoch: 2 Global Step: 11400 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:39:08,708-Speed 11053.13 samples/sec Loss 10.8850 LearningRate 0.0902 Epoch: 2 Global Step: 11410 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:39:09,618-Speed 11265.11 samples/sec Loss 10.9267 LearningRate 0.0902 Epoch: 2 Global Step: 11420 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:39:10,728-Speed 9227.54 samples/sec Loss 10.9442 LearningRate 0.0902 Epoch: 2 Global Step: 11430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:39:11,640-Speed 11236.23 samples/sec Loss 11.0585 LearningRate 0.0902 Epoch: 2 Global Step: 11440 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:39:12,576-Speed 10950.50 samples/sec Loss 10.8798 LearningRate 0.0902 Epoch: 2 Global Step: 11450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:39:13,507-Speed 11003.72 samples/sec Loss 10.8950 LearningRate 0.0902 Epoch: 2 Global Step: 11460 Fp16 Grad Scale: 
131072 Required: 8 hours Training: 2022-04-10 23:39:14,388-Speed 11629.56 samples/sec Loss 11.0556 LearningRate 0.0902 Epoch: 2 Global Step: 11470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:39:15,282-Speed 11470.48 samples/sec Loss 11.0706 LearningRate 0.0902 Epoch: 2 Global Step: 11480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:39:16,179-Speed 11417.98 samples/sec Loss 11.2699 LearningRate 0.0902 Epoch: 2 Global Step: 11490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:39:17,084-Speed 11321.37 samples/sec Loss 10.9889 LearningRate 0.0901 Epoch: 2 Global Step: 11500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:39:17,994-Speed 11263.93 samples/sec Loss 11.0868 LearningRate 0.0901 Epoch: 2 Global Step: 11510 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:39:18,899-Speed 11322.53 samples/sec Loss 11.0705 LearningRate 0.0901 Epoch: 2 Global Step: 11520 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:39:19,794-Speed 11455.16 samples/sec Loss 10.9327 LearningRate 0.0901 Epoch: 2 Global Step: 11530 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:39:20,692-Speed 11407.92 samples/sec Loss 11.0956 LearningRate 0.0901 Epoch: 2 Global Step: 11540 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:39:21,601-Speed 11273.02 samples/sec Loss 11.1025 LearningRate 0.0901 Epoch: 2 Global Step: 11550 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:39:22,519-Speed 11157.19 samples/sec Loss 11.2772 LearningRate 0.0901 Epoch: 2 Global Step: 11560 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:39:23,413-Speed 11460.16 samples/sec Loss 11.1568 LearningRate 0.0901 Epoch: 2 Global Step: 11570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:39:24,331-Speed 11163.57 samples/sec Loss 11.0309 LearningRate 0.0901 Epoch: 2 Global Step: 11580 Fp16 Grad Scale: 131072 Required: 7 hours 
Training: 2022-04-10 23:39:25,229-Speed 11420.09 samples/sec Loss 11.1965 LearningRate 0.0901 Epoch: 2 Global Step: 11590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:39:26,125-Speed 11428.79 samples/sec Loss 11.2547 LearningRate 0.0901 Epoch: 2 Global Step: 11600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:39:27,062-Speed 10944.70 samples/sec Loss 11.2440 LearningRate 0.0901 Epoch: 2 Global Step: 11610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:39:27,976-Speed 11208.69 samples/sec Loss 10.9129 LearningRate 0.0900 Epoch: 2 Global Step: 11620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:39:28,900-Speed 11094.92 samples/sec Loss 11.0736 LearningRate 0.0900 Epoch: 2 Global Step: 11630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:39:29,817-Speed 11169.43 samples/sec Loss 11.0937 LearningRate 0.0900 Epoch: 2 Global Step: 11640 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:39:30,730-Speed 11227.07 samples/sec Loss 11.1745 LearningRate 0.0900 Epoch: 2 Global Step: 11650 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:39:31,687-Speed 10711.19 samples/sec Loss 11.1712 LearningRate 0.0900 Epoch: 2 Global Step: 11660 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:39:32,599-Speed 11236.49 samples/sec Loss 11.0879 LearningRate 0.0900 Epoch: 2 Global Step: 11670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:39:33,496-Speed 11416.91 samples/sec Loss 11.0902 LearningRate 0.0900 Epoch: 2 Global Step: 11680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:39:34,378-Speed 11622.30 samples/sec Loss 11.0730 LearningRate 0.0900 Epoch: 2 Global Step: 11690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:39:35,268-Speed 11509.43 samples/sec Loss 10.9919 LearningRate 0.0900 Epoch: 2 Global Step: 11700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 
23:39:36,162-Speed 11470.22 samples/sec Loss 11.1817 LearningRate 0.0900 Epoch: 2 Global Step: 11710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:39:37,063-Speed 11370.63 samples/sec Loss 11.1002 LearningRate 0.0900 Epoch: 2 Global Step: 11720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:39:38,010-Speed 10816.81 samples/sec Loss 11.2039 LearningRate 0.0900 Epoch: 2 Global Step: 11730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:39:38,942-Speed 10996.50 samples/sec Loss 11.4704 LearningRate 0.0899 Epoch: 2 Global Step: 11740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:39:39,891-Speed 10796.48 samples/sec Loss 11.4060 LearningRate 0.0899 Epoch: 2 Global Step: 11750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:39:40,818-Speed 11050.51 samples/sec Loss 11.3059 LearningRate 0.0899 Epoch: 2 Global Step: 11760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:39:41,719-Speed 11378.95 samples/sec Loss 11.2572 LearningRate 0.0899 Epoch: 2 Global Step: 11770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:39:42,654-Speed 10956.15 samples/sec Loss 11.4111 LearningRate 0.0899 Epoch: 2 Global Step: 11780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:39:43,569-Speed 11200.71 samples/sec Loss 11.2873 LearningRate 0.0899 Epoch: 2 Global Step: 11790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:39:44,492-Speed 11103.47 samples/sec Loss 11.1782 LearningRate 0.0899 Epoch: 2 Global Step: 11800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:39:45,429-Speed 10934.45 samples/sec Loss 11.1275 LearningRate 0.0899 Epoch: 2 Global Step: 11810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:39:46,321-Speed 11485.34 samples/sec Loss 11.3380 LearningRate 0.0899 Epoch: 2 Global Step: 11820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:39:47,215-Speed 11468.32 samples/sec 
Loss 11.0933 LearningRate 0.0899 Epoch: 2 Global Step: 11830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:39:48,171-Speed 10715.55 samples/sec Loss 11.2241 LearningRate 0.0899 Epoch: 2 Global Step: 11840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:39:49,085-Speed 11213.47 samples/sec Loss 11.2045 LearningRate 0.0899 Epoch: 2 Global Step: 11850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:39:50,028-Speed 10867.03 samples/sec Loss 11.3397 LearningRate 0.0898 Epoch: 2 Global Step: 11860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:39:50,957-Speed 11026.55 samples/sec Loss 11.3094 LearningRate 0.0898 Epoch: 2 Global Step: 11870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:39:51,885-Speed 11042.78 samples/sec Loss 11.2014 LearningRate 0.0898 Epoch: 2 Global Step: 11880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:39:52,833-Speed 10814.94 samples/sec Loss 11.0097 LearningRate 0.0898 Epoch: 2 Global Step: 11890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:39:53,734-Speed 11367.12 samples/sec Loss 11.2876 LearningRate 0.0898 Epoch: 2 Global Step: 11900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:39:54,700-Speed 10614.63 samples/sec Loss 11.2404 LearningRate 0.0898 Epoch: 2 Global Step: 11910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:39:55,624-Speed 11083.89 samples/sec Loss 11.1718 LearningRate 0.0898 Epoch: 2 Global Step: 11920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:39:56,554-Speed 11016.35 samples/sec Loss 11.2595 LearningRate 0.0898 Epoch: 2 Global Step: 11930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:39:57,451-Speed 11425.61 samples/sec Loss 11.3587 LearningRate 0.0898 Epoch: 2 Global Step: 11940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:39:58,340-Speed 11526.45 samples/sec Loss 11.3529 LearningRate 0.0898 
Epoch: 2 Global Step: 11950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:39:59,223-Speed 11604.83 samples/sec Loss 11.3971 LearningRate 0.0898 Epoch: 2 Global Step: 11960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:40:00,150-Speed 11057.55 samples/sec Loss 11.3916 LearningRate 0.0898 Epoch: 2 Global Step: 11970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:40:01,044-Speed 11457.74 samples/sec Loss 11.2342 LearningRate 0.0897 Epoch: 2 Global Step: 11980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:40:01,985-Speed 10892.21 samples/sec Loss 11.0811 LearningRate 0.0897 Epoch: 2 Global Step: 11990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:40:02,892-Speed 11294.46 samples/sec Loss 11.1236 LearningRate 0.0897 Epoch: 2 Global Step: 12000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:40:23,998-[lfw][12000]XNorm: 15.242508 Training: 2022-04-10 23:40:23,998-[lfw][12000]Accuracy-Flip: 0.99300+-0.00488 Training: 2022-04-10 23:40:23,998-[lfw][12000]Accuracy-Highest: 0.99300 Training: 2022-04-10 23:40:48,473-[cfp_fp][12000]XNorm: 12.808779 Training: 2022-04-10 23:40:48,474-[cfp_fp][12000]Accuracy-Flip: 0.89757+-0.01619 Training: 2022-04-10 23:40:48,474-[cfp_fp][12000]Accuracy-Highest: 0.89757 Training: 2022-04-10 23:41:09,627-[agedb_30][12000]XNorm: 14.941259 Training: 2022-04-10 23:41:09,628-[agedb_30][12000]Accuracy-Flip: 0.93617+-0.01670 Training: 2022-04-10 23:41:09,628-[agedb_30][12000]Accuracy-Highest: 0.93617 Training: 2022-04-10 23:41:10,542-Speed 151.37 samples/sec Loss 11.3598 LearningRate 0.0897 Epoch: 2 Global Step: 12010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:41:11,447-Speed 11325.06 samples/sec Loss 11.1682 LearningRate 0.0897 Epoch: 2 Global Step: 12020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:41:12,373-Speed 11072.89 samples/sec Loss 11.3399 LearningRate 0.0897 Epoch: 2 Global Step: 12030 
Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:41:13,308-Speed 10952.18 samples/sec Loss 11.2477 LearningRate 0.0897 Epoch: 2 Global Step: 12040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:41:14,231-Speed 11106.18 samples/sec Loss 11.1735 LearningRate 0.0897 Epoch: 2 Global Step: 12050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:41:15,165-Speed 10973.60 samples/sec Loss 11.1900 LearningRate 0.0897 Epoch: 2 Global Step: 12060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:41:16,182-Speed 10070.91 samples/sec Loss 11.2318 LearningRate 0.0897 Epoch: 2 Global Step: 12070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:41:17,091-Speed 11279.98 samples/sec Loss 11.2580 LearningRate 0.0897 Epoch: 2 Global Step: 12080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:41:17,996-Speed 11321.24 samples/sec Loss 11.3271 LearningRate 0.0897 Epoch: 2 Global Step: 12090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:41:18,908-Speed 11247.67 samples/sec Loss 11.2134 LearningRate 0.0896 Epoch: 2 Global Step: 12100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:41:19,844-Speed 10941.64 samples/sec Loss 11.3310 LearningRate 0.0896 Epoch: 2 Global Step: 12110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:41:20,766-Speed 11117.76 samples/sec Loss 11.3942 LearningRate 0.0896 Epoch: 2 Global Step: 12120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:41:21,866-Speed 9315.55 samples/sec Loss 11.2594 LearningRate 0.0896 Epoch: 2 Global Step: 12130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:41:22,773-Speed 11299.54 samples/sec Loss 11.3061 LearningRate 0.0896 Epoch: 2 Global Step: 12140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:41:23,676-Speed 11338.26 samples/sec Loss 11.3005 LearningRate 0.0896 Epoch: 2 Global Step: 12150 Fp16 Grad Scale: 65536 Required: 8 
hours Training: 2022-04-10 23:41:24,620-Speed 10865.00 samples/sec Loss 11.4370 LearningRate 0.0896 Epoch: 2 Global Step: 12160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:41:25,549-Speed 11020.43 samples/sec Loss 11.3594 LearningRate 0.0896 Epoch: 2 Global Step: 12170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:41:26,487-Speed 10937.14 samples/sec Loss 11.3918 LearningRate 0.0896 Epoch: 2 Global Step: 12180 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:41:27,401-Speed 11207.30 samples/sec Loss 11.2684 LearningRate 0.0896 Epoch: 2 Global Step: 12190 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:41:28,352-Speed 10769.68 samples/sec Loss 11.2709 LearningRate 0.0896 Epoch: 2 Global Step: 12200 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:41:29,262-Speed 11260.59 samples/sec Loss 11.3073 LearningRate 0.0896 Epoch: 2 Global Step: 12210 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:41:30,163-Speed 11382.06 samples/sec Loss 11.2514 LearningRate 0.0895 Epoch: 2 Global Step: 12220 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:41:31,130-Speed 10594.52 samples/sec Loss 11.4156 LearningRate 0.0895 Epoch: 2 Global Step: 12230 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:41:32,063-Speed 10975.43 samples/sec Loss 11.3251 LearningRate 0.0895 Epoch: 2 Global Step: 12240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:41:32,968-Speed 11337.28 samples/sec Loss 11.4878 LearningRate 0.0895 Epoch: 2 Global Step: 12250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:41:33,906-Speed 10918.54 samples/sec Loss 11.2513 LearningRate 0.0895 Epoch: 2 Global Step: 12260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:41:34,787-Speed 11633.34 samples/sec Loss 11.3463 LearningRate 0.0895 Epoch: 2 Global Step: 12270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 
23:41:35,702-Speed 11199.75 samples/sec Loss 11.1572 LearningRate 0.0895 Epoch: 2 Global Step: 12280 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:41:36,627-Speed 11071.82 samples/sec Loss 11.3798 LearningRate 0.0895 Epoch: 2 Global Step: 12290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:41:37,512-Speed 11581.64 samples/sec Loss 11.2318 LearningRate 0.0895 Epoch: 2 Global Step: 12300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:41:38,397-Speed 11584.09 samples/sec Loss 10.9910 LearningRate 0.0895 Epoch: 2 Global Step: 12310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:41:39,291-Speed 11456.83 samples/sec Loss 11.3562 LearningRate 0.0895 Epoch: 2 Global Step: 12320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:41:40,224-Speed 10985.46 samples/sec Loss 11.2759 LearningRate 0.0895 Epoch: 2 Global Step: 12330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:41:41,139-Speed 11200.27 samples/sec Loss 11.4041 LearningRate 0.0894 Epoch: 2 Global Step: 12340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:41:42,058-Speed 11153.28 samples/sec Loss 11.2055 LearningRate 0.0894 Epoch: 2 Global Step: 12350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:41:42,965-Speed 11295.77 samples/sec Loss 11.4231 LearningRate 0.0894 Epoch: 2 Global Step: 12360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:41:43,870-Speed 11322.93 samples/sec Loss 11.3691 LearningRate 0.0894 Epoch: 2 Global Step: 12370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:41:44,753-Speed 11597.60 samples/sec Loss 11.3636 LearningRate 0.0894 Epoch: 2 Global Step: 12380 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:41:45,647-Speed 11482.61 samples/sec Loss 11.3124 LearningRate 0.0894 Epoch: 2 Global Step: 12390 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:41:46,552-Speed 11325.10 
samples/sec Loss 11.4520 LearningRate 0.0894 Epoch: 2 Global Step: 12400 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:41:47,459-Speed 11289.98 samples/sec Loss 11.3773 LearningRate 0.0894 Epoch: 2 Global Step: 12410 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:41:48,394-Speed 10960.53 samples/sec Loss 11.2961 LearningRate 0.0894 Epoch: 2 Global Step: 12420 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:41:49,322-Speed 11047.42 samples/sec Loss 11.0926 LearningRate 0.0894 Epoch: 2 Global Step: 12430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:41:50,241-Speed 11152.22 samples/sec Loss 11.3814 LearningRate 0.0894 Epoch: 2 Global Step: 12440 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:41:51,196-Speed 10731.47 samples/sec Loss 11.3605 LearningRate 0.0894 Epoch: 2 Global Step: 12450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:41:52,123-Speed 11058.97 samples/sec Loss 11.2741 LearningRate 0.0893 Epoch: 2 Global Step: 12460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:41:53,019-Speed 11425.80 samples/sec Loss 11.4200 LearningRate 0.0893 Epoch: 2 Global Step: 12470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:41:53,955-Speed 10953.91 samples/sec Loss 11.4201 LearningRate 0.0893 Epoch: 2 Global Step: 12480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:41:54,889-Speed 10965.38 samples/sec Loss 11.1428 LearningRate 0.0893 Epoch: 2 Global Step: 12490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:41:55,768-Speed 11656.09 samples/sec Loss 11.3189 LearningRate 0.0893 Epoch: 2 Global Step: 12500 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:41:56,681-Speed 11231.64 samples/sec Loss 11.2286 LearningRate 0.0893 Epoch: 2 Global Step: 12510 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:41:57,595-Speed 11206.21 samples/sec Loss 11.2993 
LearningRate 0.0893 Epoch: 2 Global Step: 12520 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:41:58,491-Speed 11434.50 samples/sec Loss 11.2545 LearningRate 0.0893 Epoch: 2 Global Step: 12530 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:41:59,434-Speed 10864.43 samples/sec Loss 11.3622 LearningRate 0.0893 Epoch: 2 Global Step: 12540 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:42:00,332-Speed 11416.08 samples/sec Loss 11.1863 LearningRate 0.0893 Epoch: 2 Global Step: 12550 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:42:01,254-Speed 11111.61 samples/sec Loss 11.2608 LearningRate 0.0893 Epoch: 2 Global Step: 12560 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:42:02,172-Speed 11165.66 samples/sec Loss 11.1064 LearningRate 0.0893 Epoch: 2 Global Step: 12570 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:42:03,074-Speed 11364.89 samples/sec Loss 11.2283 LearningRate 0.0892 Epoch: 2 Global Step: 12580 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:42:04,016-Speed 10875.58 samples/sec Loss 11.4766 LearningRate 0.0892 Epoch: 2 Global Step: 12590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:42:04,930-Speed 11206.00 samples/sec Loss 11.3163 LearningRate 0.0892 Epoch: 2 Global Step: 12600 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:42:05,832-Speed 11361.19 samples/sec Loss 11.2928 LearningRate 0.0892 Epoch: 2 Global Step: 12610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:42:06,783-Speed 10775.45 samples/sec Loss 11.2933 LearningRate 0.0892 Epoch: 2 Global Step: 12620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:42:07,718-Speed 10956.12 samples/sec Loss 11.4362 LearningRate 0.0892 Epoch: 2 Global Step: 12630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:42:08,655-Speed 10943.29 samples/sec Loss 11.3094 LearningRate 0.0892 Epoch: 2 
Global Step: 12640 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:42:09,566-Speed 11243.06 samples/sec Loss 11.0958 LearningRate 0.0892 Epoch: 2 Global Step: 12650 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:42:10,524-Speed 10691.82 samples/sec Loss 11.4652 LearningRate 0.0892 Epoch: 2 Global Step: 12660 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:42:11,436-Speed 11246.60 samples/sec Loss 11.3092 LearningRate 0.0892 Epoch: 2 Global Step: 12670 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:42:12,350-Speed 11208.55 samples/sec Loss 11.3279 LearningRate 0.0892 Epoch: 2 Global Step: 12680 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:42:13,274-Speed 11093.23 samples/sec Loss 11.1882 LearningRate 0.0892 Epoch: 2 Global Step: 12690 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:42:14,180-Speed 11315.18 samples/sec Loss 11.3041 LearningRate 0.0891 Epoch: 2 Global Step: 12700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:42:15,132-Speed 10765.01 samples/sec Loss 11.2053 LearningRate 0.0891 Epoch: 2 Global Step: 12710 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:42:16,065-Speed 10979.60 samples/sec Loss 11.2532 LearningRate 0.0891 Epoch: 2 Global Step: 12720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:42:17,003-Speed 10926.17 samples/sec Loss 11.1926 LearningRate 0.0891 Epoch: 2 Global Step: 12730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:42:17,916-Speed 11228.55 samples/sec Loss 11.3561 LearningRate 0.0891 Epoch: 2 Global Step: 12740 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:42:18,824-Speed 11273.68 samples/sec Loss 11.2345 LearningRate 0.0891 Epoch: 2 Global Step: 12750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:42:19,763-Speed 10918.97 samples/sec Loss 11.2186 LearningRate 0.0891 Epoch: 2 Global Step: 12760 Fp16 Grad 
Scale: 131072 Required: 8 hours Training: 2022-04-10 23:42:20,651-Speed 11544.70 samples/sec Loss 11.3430 LearningRate 0.0891 Epoch: 2 Global Step: 12770 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:42:21,534-Speed 11616.15 samples/sec Loss 11.1422 LearningRate 0.0891 Epoch: 2 Global Step: 12780 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:42:22,470-Speed 10941.38 samples/sec Loss 11.2935 LearningRate 0.0891 Epoch: 2 Global Step: 12790 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:42:23,394-Speed 11083.60 samples/sec Loss 11.3571 LearningRate 0.0891 Epoch: 2 Global Step: 12800 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:42:24,322-Speed 11043.39 samples/sec Loss 11.0763 LearningRate 0.0891 Epoch: 2 Global Step: 12810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:42:25,233-Speed 11254.93 samples/sec Loss 11.2280 LearningRate 0.0890 Epoch: 2 Global Step: 12820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:42:26,134-Speed 11373.93 samples/sec Loss 11.3628 LearningRate 0.0890 Epoch: 2 Global Step: 12830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:42:27,040-Speed 11316.46 samples/sec Loss 11.2456 LearningRate 0.0890 Epoch: 2 Global Step: 12840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:42:27,953-Speed 11220.36 samples/sec Loss 11.3420 LearningRate 0.0890 Epoch: 2 Global Step: 12850 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:42:28,892-Speed 10909.09 samples/sec Loss 11.2405 LearningRate 0.0890 Epoch: 2 Global Step: 12860 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:42:29,788-Speed 11440.80 samples/sec Loss 11.4527 LearningRate 0.0890 Epoch: 2 Global Step: 12870 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:42:30,674-Speed 11569.27 samples/sec Loss 11.1377 LearningRate 0.0890 Epoch: 2 Global Step: 12880 Fp16 Grad Scale: 131072 Required: 8 hours 
Training: 2022-04-10 23:42:31,618-Speed 10854.93 samples/sec Loss 11.2651 LearningRate 0.0890 Epoch: 2 Global Step: 12890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:42:32,554-Speed 10950.26 samples/sec Loss 11.3799 LearningRate 0.0890 Epoch: 2 Global Step: 12900 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:42:33,478-Speed 11079.87 samples/sec Loss 11.3240 LearningRate 0.0890 Epoch: 2 Global Step: 12910 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:42:34,446-Speed 10592.32 samples/sec Loss 11.2368 LearningRate 0.0890 Epoch: 2 Global Step: 12920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:42:35,387-Speed 10890.20 samples/sec Loss 11.2914 LearningRate 0.0890 Epoch: 2 Global Step: 12930 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:42:36,322-Speed 10952.37 samples/sec Loss 11.4268 LearningRate 0.0889 Epoch: 2 Global Step: 12940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:42:37,271-Speed 10797.24 samples/sec Loss 11.3107 LearningRate 0.0889 Epoch: 2 Global Step: 12950 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:42:38,182-Speed 11249.20 samples/sec Loss 11.2025 LearningRate 0.0889 Epoch: 2 Global Step: 12960 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:42:39,123-Speed 10894.91 samples/sec Loss 11.2237 LearningRate 0.0889 Epoch: 2 Global Step: 12970 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:42:40,055-Speed 10995.24 samples/sec Loss 11.1481 LearningRate 0.0889 Epoch: 2 Global Step: 12980 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:42:41,009-Speed 10747.72 samples/sec Loss 11.4364 LearningRate 0.0889 Epoch: 2 Global Step: 12990 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:42:41,940-Speed 10995.70 samples/sec Loss 11.2892 LearningRate 0.0889 Epoch: 2 Global Step: 13000 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:42:42,894-Speed 
10744.28 samples/sec Loss 11.3655 LearningRate 0.0889 Epoch: 2 Global Step: 13010 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:42:43,819-Speed 11076.06 samples/sec Loss 11.3554 LearningRate 0.0889 Epoch: 2 Global Step: 13020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:42:44,723-Speed 11348.06 samples/sec Loss 11.3784 LearningRate 0.0889 Epoch: 2 Global Step: 13030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:42:45,622-Speed 11404.23 samples/sec Loss 11.1278 LearningRate 0.0889 Epoch: 2 Global Step: 13040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:42:46,495-Speed 11738.83 samples/sec Loss 11.3585 LearningRate 0.0889 Epoch: 2 Global Step: 13050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:42:47,408-Speed 11215.76 samples/sec Loss 11.1715 LearningRate 0.0888 Epoch: 2 Global Step: 13060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:42:48,338-Speed 11024.59 samples/sec Loss 11.2455 LearningRate 0.0888 Epoch: 2 Global Step: 13070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:42:49,270-Speed 10990.73 samples/sec Loss 11.2972 LearningRate 0.0888 Epoch: 2 Global Step: 13080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:42:50,156-Speed 11572.91 samples/sec Loss 11.2368 LearningRate 0.0888 Epoch: 2 Global Step: 13090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:42:51,037-Speed 11630.79 samples/sec Loss 11.3999 LearningRate 0.0888 Epoch: 2 Global Step: 13100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:42:51,952-Speed 11199.33 samples/sec Loss 11.2931 LearningRate 0.0888 Epoch: 2 Global Step: 13110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:42:52,885-Speed 10983.69 samples/sec Loss 11.3515 LearningRate 0.0888 Epoch: 2 Global Step: 13120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:42:53,800-Speed 11193.02 samples/sec Loss 11.1710 
LearningRate 0.0888 Epoch: 2 Global Step: 13130 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:42:54,688-Speed 11539.90 samples/sec Loss 11.2444 LearningRate 0.0888 Epoch: 2 Global Step: 13140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:42:55,628-Speed 10906.45 samples/sec Loss 11.1603 LearningRate 0.0888 Epoch: 2 Global Step: 13150 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:42:56,536-Speed 11292.93 samples/sec Loss 11.2539 LearningRate 0.0888 Epoch: 2 Global Step: 13160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:42:57,428-Speed 11480.35 samples/sec Loss 11.3492 LearningRate 0.0888 Epoch: 2 Global Step: 13170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:42:58,331-Speed 11354.69 samples/sec Loss 11.2265 LearningRate 0.0887 Epoch: 2 Global Step: 13180 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:42:59,224-Speed 11469.85 samples/sec Loss 11.4084 LearningRate 0.0887 Epoch: 2 Global Step: 13190 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:43:00,129-Speed 11314.83 samples/sec Loss 11.0750 LearningRate 0.0887 Epoch: 2 Global Step: 13200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:43:01,095-Speed 10612.56 samples/sec Loss 11.1755 LearningRate 0.0887 Epoch: 2 Global Step: 13210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:43:02,041-Speed 10829.02 samples/sec Loss 11.2428 LearningRate 0.0887 Epoch: 2 Global Step: 13220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:43:02,951-Speed 11266.16 samples/sec Loss 11.2660 LearningRate 0.0887 Epoch: 2 Global Step: 13230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:43:03,875-Speed 11095.50 samples/sec Loss 10.9527 LearningRate 0.0887 Epoch: 2 Global Step: 13240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:43:04,817-Speed 10871.73 samples/sec Loss 11.2799 LearningRate 0.0887 Epoch: 2 Global 
Step: 13250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:43:05,718-Speed 11370.22 samples/sec Loss 11.1826 LearningRate 0.0887 Epoch: 2 Global Step: 13260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:43:06,659-Speed 10903.15 samples/sec Loss 11.3055 LearningRate 0.0887 Epoch: 2 Global Step: 13270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:43:07,586-Speed 11059.11 samples/sec Loss 11.2210 LearningRate 0.0887 Epoch: 2 Global Step: 13280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:43:08,492-Speed 11304.64 samples/sec Loss 11.0614 LearningRate 0.0887 Epoch: 2 Global Step: 13290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:43:09,403-Speed 11257.44 samples/sec Loss 11.0500 LearningRate 0.0886 Epoch: 2 Global Step: 13300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:43:10,291-Speed 11544.01 samples/sec Loss 11.2474 LearningRate 0.0886 Epoch: 2 Global Step: 13310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:43:11,197-Speed 11319.77 samples/sec Loss 11.0705 LearningRate 0.0886 Epoch: 2 Global Step: 13320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:43:12,105-Speed 11281.19 samples/sec Loss 11.1974 LearningRate 0.0886 Epoch: 2 Global Step: 13330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:43:13,010-Speed 11322.75 samples/sec Loss 11.1264 LearningRate 0.0886 Epoch: 2 Global Step: 13340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:43:13,922-Speed 11229.59 samples/sec Loss 11.2270 LearningRate 0.0886 Epoch: 2 Global Step: 13350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:43:14,812-Speed 11515.16 samples/sec Loss 11.1946 LearningRate 0.0886 Epoch: 2 Global Step: 13360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:43:15,695-Speed 11618.59 samples/sec Loss 11.1824 LearningRate 0.0886 Epoch: 2 Global Step: 13370 Fp16 Grad Scale: 131072 
Required: 7 hours Training: 2022-04-10 23:43:16,573-Speed 11666.48 samples/sec Loss 11.1961 LearningRate 0.0886 Epoch: 2 Global Step: 13380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:43:17,496-Speed 11106.62 samples/sec Loss 11.0777 LearningRate 0.0886 Epoch: 2 Global Step: 13390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:43:18,443-Speed 10816.15 samples/sec Loss 11.2157 LearningRate 0.0886 Epoch: 2 Global Step: 13400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:43:19,365-Speed 11116.25 samples/sec Loss 11.1244 LearningRate 0.0886 Epoch: 2 Global Step: 13410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:43:20,294-Speed 11023.48 samples/sec Loss 11.1499 LearningRate 0.0885 Epoch: 2 Global Step: 13420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:43:21,213-Speed 11149.10 samples/sec Loss 11.0716 LearningRate 0.0885 Epoch: 2 Global Step: 13430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:43:22,157-Speed 10855.32 samples/sec Loss 10.9416 LearningRate 0.0885 Epoch: 2 Global Step: 13440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:43:23,075-Speed 11160.19 samples/sec Loss 11.3126 LearningRate 0.0885 Epoch: 2 Global Step: 13450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:43:23,992-Speed 11181.04 samples/sec Loss 11.1553 LearningRate 0.0885 Epoch: 2 Global Step: 13460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:43:24,940-Speed 10803.03 samples/sec Loss 11.0678 LearningRate 0.0885 Epoch: 2 Global Step: 13470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:43:25,831-Speed 11500.51 samples/sec Loss 11.3202 LearningRate 0.0885 Epoch: 2 Global Step: 13480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:43:26,741-Speed 11266.78 samples/sec Loss 11.0953 LearningRate 0.0885 Epoch: 2 Global Step: 13490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 
23:43:27,691-Speed 10783.59 samples/sec Loss 11.0874 LearningRate 0.0885 Epoch: 2 Global Step: 13500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:43:28,596-Speed 11331.81 samples/sec Loss 11.2064 LearningRate 0.0885 Epoch: 2 Global Step: 13510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:43:29,512-Speed 11188.07 samples/sec Loss 11.2815 LearningRate 0.0885 Epoch: 2 Global Step: 13520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:43:30,432-Speed 11138.57 samples/sec Loss 11.1610 LearningRate 0.0885 Epoch: 2 Global Step: 13530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:43:31,361-Speed 11029.60 samples/sec Loss 11.4281 LearningRate 0.0884 Epoch: 2 Global Step: 13540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:43:32,278-Speed 11172.39 samples/sec Loss 11.0345 LearningRate 0.0884 Epoch: 2 Global Step: 13550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:43:33,183-Speed 11324.03 samples/sec Loss 11.1866 LearningRate 0.0884 Epoch: 2 Global Step: 13560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:43:34,074-Speed 11496.28 samples/sec Loss 11.1530 LearningRate 0.0884 Epoch: 2 Global Step: 13570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:43:34,964-Speed 11523.72 samples/sec Loss 11.2276 LearningRate 0.0884 Epoch: 2 Global Step: 13580 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:43:35,874-Speed 11261.15 samples/sec Loss 11.2304 LearningRate 0.0884 Epoch: 2 Global Step: 13590 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:43:36,790-Speed 11177.35 samples/sec Loss 11.0901 LearningRate 0.0884 Epoch: 2 Global Step: 13600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:43:37,706-Speed 11186.88 samples/sec Loss 11.0964 LearningRate 0.0884 Epoch: 2 Global Step: 13610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:43:38,621-Speed 11201.42 
samples/sec Loss 11.1788 LearningRate 0.0884 Epoch: 2 Global Step: 13620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:43:39,558-Speed 10933.07 samples/sec Loss 11.1060 LearningRate 0.0884 Epoch: 2 Global Step: 13630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:43:40,479-Speed 11133.33 samples/sec Loss 11.0251 LearningRate 0.0884 Epoch: 2 Global Step: 13640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:43:41,422-Speed 10862.95 samples/sec Loss 11.1989 LearningRate 0.0884 Epoch: 2 Global Step: 13650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:43:42,357-Speed 10963.24 samples/sec Loss 11.1611 LearningRate 0.0883 Epoch: 2 Global Step: 13660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:43:43,265-Speed 11282.63 samples/sec Loss 11.1177 LearningRate 0.0883 Epoch: 2 Global Step: 13670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:43:44,238-Speed 10524.60 samples/sec Loss 11.1639 LearningRate 0.0883 Epoch: 2 Global Step: 13680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:43:45,131-Speed 11473.86 samples/sec Loss 11.0904 LearningRate 0.0883 Epoch: 2 Global Step: 13690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:43:46,018-Speed 11563.10 samples/sec Loss 10.8472 LearningRate 0.0883 Epoch: 2 Global Step: 13700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:43:46,920-Speed 11351.34 samples/sec Loss 11.1328 LearningRate 0.0883 Epoch: 2 Global Step: 13710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:43:47,864-Speed 10862.43 samples/sec Loss 11.0809 LearningRate 0.0883 Epoch: 2 Global Step: 13720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:43:48,757-Speed 11470.03 samples/sec Loss 11.0906 LearningRate 0.0883 Epoch: 2 Global Step: 13730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:43:49,680-Speed 11122.79 samples/sec Loss 11.1780 LearningRate 
0.0883 Epoch: 2 Global Step: 13740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:43:50,636-Speed 10718.03 samples/sec Loss 11.0461 LearningRate 0.0883 Epoch: 2 Global Step: 13750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:43:51,577-Speed 10888.70 samples/sec Loss 11.1683 LearningRate 0.0883 Epoch: 2 Global Step: 13760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:43:52,504-Speed 11049.66 samples/sec Loss 11.0558 LearningRate 0.0883 Epoch: 2 Global Step: 13770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:43:53,441-Speed 10941.98 samples/sec Loss 11.0253 LearningRate 0.0882 Epoch: 2 Global Step: 13780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:43:54,370-Speed 11032.80 samples/sec Loss 11.1990 LearningRate 0.0882 Epoch: 2 Global Step: 13790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:43:55,299-Speed 11025.53 samples/sec Loss 11.1213 LearningRate 0.0882 Epoch: 2 Global Step: 13800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:43:56,191-Speed 11488.02 samples/sec Loss 11.0515 LearningRate 0.0882 Epoch: 2 Global Step: 13810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:43:57,066-Speed 11711.11 samples/sec Loss 11.1830 LearningRate 0.0882 Epoch: 2 Global Step: 13820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:43:57,957-Speed 11502.09 samples/sec Loss 10.9431 LearningRate 0.0882 Epoch: 2 Global Step: 13830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:43:58,855-Speed 11410.38 samples/sec Loss 11.0180 LearningRate 0.0882 Epoch: 2 Global Step: 13840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:43:59,765-Speed 11261.13 samples/sec Loss 11.1274 LearningRate 0.0882 Epoch: 2 Global Step: 13850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:44:00,677-Speed 11234.14 samples/sec Loss 11.2638 LearningRate 0.0882 Epoch: 2 Global Step: 
13860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:44:01,591-Speed 11217.98 samples/sec Loss 11.0485 LearningRate 0.0882 Epoch: 2 Global Step: 13870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:44:02,557-Speed 10623.73 samples/sec Loss 11.1087 LearningRate 0.0882 Epoch: 2 Global Step: 13880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:44:03,483-Speed 11062.60 samples/sec Loss 10.9969 LearningRate 0.0882 Epoch: 2 Global Step: 13890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:44:04,413-Speed 11013.82 samples/sec Loss 11.0726 LearningRate 0.0882 Epoch: 2 Global Step: 13900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:44:05,345-Speed 10995.99 samples/sec Loss 11.1082 LearningRate 0.0881 Epoch: 2 Global Step: 13910 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:44:06,266-Speed 11125.56 samples/sec Loss 11.1116 LearningRate 0.0881 Epoch: 2 Global Step: 13920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:44:07,172-Speed 11310.86 samples/sec Loss 11.1723 LearningRate 0.0881 Epoch: 2 Global Step: 13930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:44:08,096-Speed 11088.99 samples/sec Loss 11.1809 LearningRate 0.0881 Epoch: 2 Global Step: 13940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:44:09,004-Speed 11287.10 samples/sec Loss 10.9866 LearningRate 0.0881 Epoch: 2 Global Step: 13950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:44:09,883-Speed 11656.34 samples/sec Loss 11.1255 LearningRate 0.0881 Epoch: 2 Global Step: 13960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:44:10,783-Speed 11378.50 samples/sec Loss 11.0664 LearningRate 0.0881 Epoch: 2 Global Step: 13970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:44:11,713-Speed 11027.67 samples/sec Loss 11.0818 LearningRate 0.0881 Epoch: 2 Global Step: 13980 Fp16 Grad Scale: 65536 
Required: 7 hours Training: 2022-04-10 23:44:12,656-Speed 10865.44 samples/sec Loss 11.1744 LearningRate 0.0881 Epoch: 2 Global Step: 13990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:44:13,547-Speed 11502.75 samples/sec Loss 11.1837 LearningRate 0.0881 Epoch: 2 Global Step: 14000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:44:34,800-[lfw][14000]XNorm: 15.389441 Training: 2022-04-10 23:44:34,800-[lfw][14000]Accuracy-Flip: 0.99167+-0.00459 Training: 2022-04-10 23:44:34,801-[lfw][14000]Accuracy-Highest: 0.99300 Training: 2022-04-10 23:44:59,323-[cfp_fp][14000]XNorm: 12.899957 Training: 2022-04-10 23:44:59,324-[cfp_fp][14000]Accuracy-Flip: 0.90471+-0.01777 Training: 2022-04-10 23:44:59,324-[cfp_fp][14000]Accuracy-Highest: 0.90471 Training: 2022-04-10 23:45:20,519-[agedb_30][14000]XNorm: 15.062461 Training: 2022-04-10 23:45:20,520-[agedb_30][14000]Accuracy-Flip: 0.93717+-0.01280 Training: 2022-04-10 23:45:20,520-[agedb_30][14000]Accuracy-Highest: 0.93717 Training: 2022-04-10 23:45:21,428-Speed 150.85 samples/sec Loss 11.1799 LearningRate 0.0881 Epoch: 2 Global Step: 14010 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:45:22,329-Speed 11381.08 samples/sec Loss 11.1441 LearningRate 0.0881 Epoch: 2 Global Step: 14020 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:45:23,253-Speed 11082.29 samples/sec Loss 11.0432 LearningRate 0.0880 Epoch: 2 Global Step: 14030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:45:24,159-Speed 11319.03 samples/sec Loss 10.9248 LearningRate 0.0880 Epoch: 2 Global Step: 14040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:45:25,083-Speed 11095.11 samples/sec Loss 11.1260 LearningRate 0.0880 Epoch: 2 Global Step: 14050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:45:25,978-Speed 11444.05 samples/sec Loss 11.1001 LearningRate 0.0880 Epoch: 2 Global Step: 14060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 
2022-04-10 23:45:26,905-Speed 11063.88 samples/sec Loss 11.1937 LearningRate 0.0880 Epoch: 2 Global Step: 14070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:45:27,846-Speed 10903.62 samples/sec Loss 10.9876 LearningRate 0.0880 Epoch: 2 Global Step: 14080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:45:28,773-Speed 11053.64 samples/sec Loss 10.9745 LearningRate 0.0880 Epoch: 2 Global Step: 14090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:45:29,697-Speed 11096.22 samples/sec Loss 11.1326 LearningRate 0.0880 Epoch: 2 Global Step: 14100 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:45:30,599-Speed 11357.19 samples/sec Loss 10.9371 LearningRate 0.0880 Epoch: 2 Global Step: 14110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:45:31,508-Speed 11268.84 samples/sec Loss 11.0142 LearningRate 0.0880 Epoch: 2 Global Step: 14120 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:45:32,428-Speed 11139.90 samples/sec Loss 11.0625 LearningRate 0.0880 Epoch: 2 Global Step: 14130 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:45:33,334-Speed 11315.89 samples/sec Loss 11.1596 LearningRate 0.0880 Epoch: 2 Global Step: 14140 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:45:34,255-Speed 11122.10 samples/sec Loss 11.0736 LearningRate 0.0879 Epoch: 2 Global Step: 14150 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:45:35,153-Speed 11415.91 samples/sec Loss 11.1540 LearningRate 0.0879 Epoch: 2 Global Step: 14160 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:45:36,039-Speed 11566.47 samples/sec Loss 11.0028 LearningRate 0.0879 Epoch: 2 Global Step: 14170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:45:36,944-Speed 11315.62 samples/sec Loss 11.0716 LearningRate 0.0879 Epoch: 2 Global Step: 14180 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:45:37,838-Speed 
11472.58 samples/sec Loss 11.0028 LearningRate 0.0879 Epoch: 2 Global Step: 14190 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:45:38,754-Speed 11176.63 samples/sec Loss 11.0121 LearningRate 0.0879 Epoch: 2 Global Step: 14200 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:45:39,657-Speed 11356.67 samples/sec Loss 11.1466 LearningRate 0.0879 Epoch: 2 Global Step: 14210 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:45:40,590-Speed 10974.68 samples/sec Loss 11.0061 LearningRate 0.0879 Epoch: 2 Global Step: 14220 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:45:41,506-Speed 11193.11 samples/sec Loss 10.9966 LearningRate 0.0879 Epoch: 2 Global Step: 14230 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:45:42,448-Speed 10877.95 samples/sec Loss 11.0102 LearningRate 0.0879 Epoch: 2 Global Step: 14240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:45:43,387-Speed 10908.77 samples/sec Loss 11.0109 LearningRate 0.0879 Epoch: 2 Global Step: 14250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:45:44,298-Speed 11253.88 samples/sec Loss 10.9646 LearningRate 0.0879 Epoch: 2 Global Step: 14260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:45:45,199-Speed 11377.05 samples/sec Loss 11.0582 LearningRate 0.0878 Epoch: 2 Global Step: 14270 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:45:46,139-Speed 10898.07 samples/sec Loss 10.9839 LearningRate 0.0878 Epoch: 2 Global Step: 14280 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:45:47,106-Speed 10604.81 samples/sec Loss 10.8442 LearningRate 0.0878 Epoch: 2 Global Step: 14290 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:45:48,035-Speed 11027.88 samples/sec Loss 11.0628 LearningRate 0.0878 Epoch: 2 Global Step: 14300 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:45:49,003-Speed 10585.50 samples/sec Loss 
10.9617 LearningRate 0.0878 Epoch: 2 Global Step: 14310 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:45:49,967-Speed 10624.57 samples/sec Loss 11.0978 LearningRate 0.0878 Epoch: 2 Global Step: 14320 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:45:50,897-Speed 11019.22 samples/sec Loss 11.0533 LearningRate 0.0878 Epoch: 2 Global Step: 14330 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:45:51,841-Speed 10854.36 samples/sec Loss 11.1143 LearningRate 0.0878 Epoch: 2 Global Step: 14340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:45:52,759-Speed 11173.05 samples/sec Loss 10.9212 LearningRate 0.0878 Epoch: 2 Global Step: 14350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:45:53,662-Speed 11345.61 samples/sec Loss 11.1063 LearningRate 0.0878 Epoch: 2 Global Step: 14360 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:45:54,587-Speed 11077.04 samples/sec Loss 11.0822 LearningRate 0.0878 Epoch: 2 Global Step: 14370 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:45:55,491-Speed 11330.25 samples/sec Loss 11.0087 LearningRate 0.0878 Epoch: 2 Global Step: 14380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:45:56,431-Speed 10902.67 samples/sec Loss 10.9230 LearningRate 0.0877 Epoch: 2 Global Step: 14390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:45:57,359-Speed 11036.52 samples/sec Loss 10.9372 LearningRate 0.0877 Epoch: 2 Global Step: 14400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:45:58,269-Speed 11265.14 samples/sec Loss 11.0056 LearningRate 0.0877 Epoch: 2 Global Step: 14410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:45:59,156-Speed 11550.69 samples/sec Loss 11.0790 LearningRate 0.0877 Epoch: 2 Global Step: 14420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:46:00,050-Speed 11467.43 samples/sec Loss 10.8579 LearningRate 0.0877 Epoch: 2 
Global Step: 14430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:46:00,974-Speed 11092.66 samples/sec Loss 11.0808 LearningRate 0.0877 Epoch: 2 Global Step: 14440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:46:01,886-Speed 11234.87 samples/sec Loss 11.0254 LearningRate 0.0877 Epoch: 2 Global Step: 14450 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:46:02,786-Speed 11385.92 samples/sec Loss 10.9953 LearningRate 0.0877 Epoch: 2 Global Step: 14460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:46:03,702-Speed 11188.09 samples/sec Loss 10.9492 LearningRate 0.0877 Epoch: 2 Global Step: 14470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:46:04,604-Speed 11357.57 samples/sec Loss 10.9839 LearningRate 0.0877 Epoch: 2 Global Step: 14480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:46:05,514-Speed 11265.38 samples/sec Loss 10.9326 LearningRate 0.0877 Epoch: 2 Global Step: 14490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:46:06,408-Speed 11463.77 samples/sec Loss 10.9246 LearningRate 0.0877 Epoch: 2 Global Step: 14500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:46:07,317-Speed 11262.42 samples/sec Loss 10.9183 LearningRate 0.0876 Epoch: 2 Global Step: 14510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:46:08,241-Speed 11100.30 samples/sec Loss 11.0787 LearningRate 0.0876 Epoch: 2 Global Step: 14520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:46:09,130-Speed 11518.62 samples/sec Loss 11.1286 LearningRate 0.0876 Epoch: 2 Global Step: 14530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:46:10,028-Speed 11419.71 samples/sec Loss 10.9936 LearningRate 0.0876 Epoch: 2 Global Step: 14540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:46:10,926-Speed 11408.29 samples/sec Loss 10.8886 LearningRate 0.0876 Epoch: 2 Global Step: 14550 Fp16 Grad Scale: 65536 
Required: 8 hours Training: 2022-04-10 23:46:11,837-Speed 11251.72 samples/sec Loss 11.1023 LearningRate 0.0876 Epoch: 2 Global Step: 14560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:46:12,747-Speed 11248.63 samples/sec Loss 11.0930 LearningRate 0.0876 Epoch: 2 Global Step: 14570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:46:13,710-Speed 10647.63 samples/sec Loss 11.0367 LearningRate 0.0876 Epoch: 2 Global Step: 14580 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:46:14,593-Speed 11595.83 samples/sec Loss 10.9568 LearningRate 0.0876 Epoch: 2 Global Step: 14590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:46:15,523-Speed 11017.56 samples/sec Loss 10.9357 LearningRate 0.0876 Epoch: 2 Global Step: 14600 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:46:16,433-Speed 11266.81 samples/sec Loss 10.8543 LearningRate 0.0876 Epoch: 2 Global Step: 14610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:46:17,341-Speed 11282.75 samples/sec Loss 11.1308 LearningRate 0.0876 Epoch: 2 Global Step: 14620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:46:18,276-Speed 10964.03 samples/sec Loss 10.9363 LearningRate 0.0875 Epoch: 2 Global Step: 14630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:46:19,218-Speed 10873.47 samples/sec Loss 10.9778 LearningRate 0.0875 Epoch: 2 Global Step: 14640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:46:20,139-Speed 11123.37 samples/sec Loss 11.1261 LearningRate 0.0875 Epoch: 2 Global Step: 14650 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:46:21,057-Speed 11170.90 samples/sec Loss 10.8558 LearningRate 0.0875 Epoch: 2 Global Step: 14660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:46:21,946-Speed 11526.08 samples/sec Loss 10.9647 LearningRate 0.0875 Epoch: 2 Global Step: 14670 Fp16 Grad Scale: 131072 Required: 8 hours Training: 
2022-04-10 23:46:22,837-Speed 11526.60 samples/sec Loss 10.7777 LearningRate 0.0875 Epoch: 2 Global Step: 14680 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:46:23,738-Speed 11362.06 samples/sec Loss 10.9944 LearningRate 0.0875 Epoch: 2 Global Step: 14690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:46:24,654-Speed 11194.93 samples/sec Loss 10.9203 LearningRate 0.0875 Epoch: 2 Global Step: 14700 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:46:25,539-Speed 11577.17 samples/sec Loss 10.9491 LearningRate 0.0875 Epoch: 2 Global Step: 14710 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:46:26,425-Speed 11565.47 samples/sec Loss 10.9330 LearningRate 0.0875 Epoch: 2 Global Step: 14720 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:46:27,325-Speed 11378.19 samples/sec Loss 10.8693 LearningRate 0.0875 Epoch: 2 Global Step: 14730 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:46:28,248-Speed 11104.60 samples/sec Loss 10.9492 LearningRate 0.0875 Epoch: 2 Global Step: 14740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:46:29,192-Speed 10859.05 samples/sec Loss 11.1258 LearningRate 0.0875 Epoch: 2 Global Step: 14750 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:46:30,108-Speed 11177.20 samples/sec Loss 10.8472 LearningRate 0.0874 Epoch: 2 Global Step: 14760 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:46:30,992-Speed 11598.20 samples/sec Loss 10.8971 LearningRate 0.0874 Epoch: 2 Global Step: 14770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:46:31,917-Speed 11077.72 samples/sec Loss 11.1447 LearningRate 0.0874 Epoch: 2 Global Step: 14780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:46:32,843-Speed 11059.51 samples/sec Loss 10.9270 LearningRate 0.0874 Epoch: 2 Global Step: 14790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:46:33,762-Speed 11158.83 
samples/sec Loss 10.9262 LearningRate 0.0874 Epoch: 2 Global Step: 14800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:46:34,659-Speed 11425.11 samples/sec Loss 10.9750 LearningRate 0.0874 Epoch: 2 Global Step: 14810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:46:35,575-Speed 11180.15 samples/sec Loss 10.9152 LearningRate 0.0874 Epoch: 2 Global Step: 14820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:46:36,491-Speed 11183.26 samples/sec Loss 10.8154 LearningRate 0.0874 Epoch: 2 Global Step: 14830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:46:37,399-Speed 11286.26 samples/sec Loss 10.9061 LearningRate 0.0874 Epoch: 2 Global Step: 14840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:46:38,304-Speed 11333.74 samples/sec Loss 10.9102 LearningRate 0.0874 Epoch: 2 Global Step: 14850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:46:39,210-Speed 11310.92 samples/sec Loss 10.8566 LearningRate 0.0874 Epoch: 2 Global Step: 14860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:46:40,158-Speed 10810.59 samples/sec Loss 10.9385 LearningRate 0.0874 Epoch: 2 Global Step: 14870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:46:41,065-Speed 11287.27 samples/sec Loss 11.0704 LearningRate 0.0873 Epoch: 2 Global Step: 14880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:46:42,004-Speed 10921.61 samples/sec Loss 10.8982 LearningRate 0.0873 Epoch: 2 Global Step: 14890 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:46:42,912-Speed 11275.85 samples/sec Loss 10.8557 LearningRate 0.0873 Epoch: 2 Global Step: 14900 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:46:43,851-Speed 10916.06 samples/sec Loss 10.6701 LearningRate 0.0873 Epoch: 2 Global Step: 14910 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:46:44,783-Speed 10995.30 samples/sec Loss 10.9705 
LearningRate 0.0873 Epoch: 2 Global Step: 14920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:46:45,690-Speed 11295.05 samples/sec Loss 11.0877 LearningRate 0.0873 Epoch: 2 Global Step: 14930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:46:46,635-Speed 10844.16 samples/sec Loss 10.9905 LearningRate 0.0873 Epoch: 2 Global Step: 14940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:46:47,577-Speed 10884.49 samples/sec Loss 10.9818 LearningRate 0.0873 Epoch: 2 Global Step: 14950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:46:48,478-Speed 11374.04 samples/sec Loss 10.7702 LearningRate 0.0873 Epoch: 2 Global Step: 14960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:46:49,360-Speed 11610.40 samples/sec Loss 11.1006 LearningRate 0.0873 Epoch: 2 Global Step: 14970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:46:50,285-Speed 11083.33 samples/sec Loss 10.8809 LearningRate 0.0873 Epoch: 2 Global Step: 14980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:46:51,216-Speed 11010.08 samples/sec Loss 10.8659 LearningRate 0.0873 Epoch: 2 Global Step: 14990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:46:52,138-Speed 11112.83 samples/sec Loss 10.8125 LearningRate 0.0872 Epoch: 2 Global Step: 15000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:46:53,074-Speed 10948.23 samples/sec Loss 10.9346 LearningRate 0.0872 Epoch: 2 Global Step: 15010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:46:53,986-Speed 11233.08 samples/sec Loss 11.0456 LearningRate 0.0872 Epoch: 2 Global Step: 15020 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:46:54,913-Speed 11057.66 samples/sec Loss 10.8584 LearningRate 0.0872 Epoch: 2 Global Step: 15030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:46:55,828-Speed 11196.87 samples/sec Loss 10.7743 LearningRate 0.0872 Epoch: 2 
Global Step: 15040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:46:56,698-Speed 11793.76 samples/sec Loss 10.9887 LearningRate 0.0872 Epoch: 2 Global Step: 15050 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-10 23:46:57,598-Speed 11389.19 samples/sec Loss 10.9236 LearningRate 0.0872 Epoch: 2 Global Step: 15060 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-10 23:46:58,515-Speed 11170.48 samples/sec Loss 10.9176 LearningRate 0.0872 Epoch: 2 Global Step: 15070 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-10 23:46:59,446-Speed 11006.89 samples/sec Loss 10.9243 LearningRate 0.0872 Epoch: 2 Global Step: 15080 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-10 23:47:00,407-Speed 10660.06 samples/sec Loss 10.9216 LearningRate 0.0872 Epoch: 2 Global Step: 15090 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-10 23:47:01,346-Speed 10923.08 samples/sec Loss 10.6452 LearningRate 0.0872 Epoch: 2 Global Step: 15100 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-10 23:47:02,248-Speed 11357.95 samples/sec Loss 10.9121 LearningRate 0.0872 Epoch: 2 Global Step: 15110 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-10 23:47:03,162-Speed 11215.71 samples/sec Loss 10.9047 LearningRate 0.0871 Epoch: 2 Global Step: 15120 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-10 23:47:04,093-Speed 10997.10 samples/sec Loss 11.0144 LearningRate 0.0871 Epoch: 2 Global Step: 15130 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-10 23:47:05,035-Speed 10885.26 samples/sec Loss 10.8265 LearningRate 0.0871 Epoch: 2 Global Step: 15140 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-10 23:47:05,939-Speed 11327.47 samples/sec Loss 10.9841 LearningRate 0.0871 Epoch: 2 Global Step: 15150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:47:06,907-Speed 10591.52 samples/sec Loss 10.9488 LearningRate 0.0871 Epoch: 2 Global Step: 15160 Fp16 Grad Scale: 65536 
Required: 7 hours Training: 2022-04-10 23:47:07,820-Speed 11217.79 samples/sec Loss 10.8968 LearningRate 0.0871 Epoch: 2 Global Step: 15170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:47:08,748-Speed 11049.64 samples/sec Loss 10.8317 LearningRate 0.0871 Epoch: 2 Global Step: 15180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:47:09,643-Speed 11448.20 samples/sec Loss 11.0942 LearningRate 0.0871 Epoch: 2 Global Step: 15190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:47:10,566-Speed 11101.97 samples/sec Loss 11.0071 LearningRate 0.0871 Epoch: 2 Global Step: 15200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:47:11,471-Speed 11329.81 samples/sec Loss 10.9296 LearningRate 0.0871 Epoch: 2 Global Step: 15210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:47:12,367-Speed 11424.91 samples/sec Loss 10.8426 LearningRate 0.0871 Epoch: 2 Global Step: 15220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:47:13,247-Speed 11646.88 samples/sec Loss 10.7548 LearningRate 0.0871 Epoch: 2 Global Step: 15230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:47:14,186-Speed 10909.59 samples/sec Loss 10.8482 LearningRate 0.0870 Epoch: 2 Global Step: 15240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:47:15,080-Speed 11471.77 samples/sec Loss 10.9542 LearningRate 0.0870 Epoch: 2 Global Step: 15250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:47:15,993-Speed 11215.62 samples/sec Loss 10.8420 LearningRate 0.0870 Epoch: 2 Global Step: 15260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:47:16,893-Speed 11385.00 samples/sec Loss 10.7001 LearningRate 0.0870 Epoch: 2 Global Step: 15270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:47:17,850-Speed 10708.65 samples/sec Loss 10.9275 LearningRate 0.0870 Epoch: 2 Global Step: 15280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 
23:47:18,791-Speed 10892.43 samples/sec Loss 10.9568 LearningRate 0.0870 Epoch: 2 Global Step: 15290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:47:19,703-Speed 11230.97 samples/sec Loss 10.7895 LearningRate 0.0870 Epoch: 2 Global Step: 15300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:47:20,602-Speed 11400.73 samples/sec Loss 10.8281 LearningRate 0.0870 Epoch: 2 Global Step: 15310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:47:21,480-Speed 11667.24 samples/sec Loss 10.9725 LearningRate 0.0870 Epoch: 2 Global Step: 15320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:47:22,407-Speed 11057.14 samples/sec Loss 10.8603 LearningRate 0.0870 Epoch: 2 Global Step: 15330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:47:23,329-Speed 11109.20 samples/sec Loss 10.9521 LearningRate 0.0870 Epoch: 2 Global Step: 15340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:47:24,266-Speed 10941.45 samples/sec Loss 10.8260 LearningRate 0.0870 Epoch: 2 Global Step: 15350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:47:25,221-Speed 10723.98 samples/sec Loss 10.7729 LearningRate 0.0869 Epoch: 2 Global Step: 15360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:47:26,147-Speed 11069.97 samples/sec Loss 10.9965 LearningRate 0.0869 Epoch: 2 Global Step: 15370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:47:27,043-Speed 11431.08 samples/sec Loss 10.8576 LearningRate 0.0869 Epoch: 2 Global Step: 15380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:47:27,960-Speed 11174.13 samples/sec Loss 10.7881 LearningRate 0.0869 Epoch: 2 Global Step: 15390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:47:28,895-Speed 10960.24 samples/sec Loss 10.9067 LearningRate 0.0869 Epoch: 2 Global Step: 15400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:47:29,821-Speed 11079.24 
samples/sec Loss 10.9480 LearningRate 0.0869 Epoch: 2 Global Step: 15410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:47:30,711-Speed 11507.54 samples/sec Loss 10.8809 LearningRate 0.0869 Epoch: 2 Global Step: 15420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:47:31,654-Speed 10875.53 samples/sec Loss 10.7025 LearningRate 0.0869 Epoch: 2 Global Step: 15430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:47:32,584-Speed 11016.91 samples/sec Loss 10.8417 LearningRate 0.0869 Epoch: 2 Global Step: 15440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:47:33,476-Speed 11483.23 samples/sec Loss 10.7644 LearningRate 0.0869 Epoch: 2 Global Step: 15450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:47:34,392-Speed 11195.19 samples/sec Loss 10.9009 LearningRate 0.0869 Epoch: 2 Global Step: 15460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:47:35,280-Speed 11541.12 samples/sec Loss 11.0022 LearningRate 0.0869 Epoch: 2 Global Step: 15470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:47:36,229-Speed 10796.73 samples/sec Loss 10.7224 LearningRate 0.0869 Epoch: 2 Global Step: 15480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:47:37,141-Speed 11228.27 samples/sec Loss 10.8448 LearningRate 0.0868 Epoch: 2 Global Step: 15490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:47:38,063-Speed 11115.91 samples/sec Loss 11.0971 LearningRate 0.0868 Epoch: 2 Global Step: 15500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:47:38,974-Speed 11250.35 samples/sec Loss 10.8809 LearningRate 0.0868 Epoch: 2 Global Step: 15510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:47:39,876-Speed 11358.38 samples/sec Loss 10.9313 LearningRate 0.0868 Epoch: 2 Global Step: 15520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:47:40,782-Speed 11312.48 samples/sec Loss 10.7802 LearningRate 
0.0868 Epoch: 2 Global Step: 15530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:47:41,692-Speed 11267.17 samples/sec Loss 10.6844 LearningRate 0.0868 Epoch: 2 Global Step: 15540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:47:42,630-Speed 10919.39 samples/sec Loss 10.9524 LearningRate 0.0868 Epoch: 2 Global Step: 15550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:47:43,558-Speed 11039.51 samples/sec Loss 10.7979 LearningRate 0.0868 Epoch: 2 Global Step: 15560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:47:44,468-Speed 11266.56 samples/sec Loss 10.8354 LearningRate 0.0868 Epoch: 2 Global Step: 15570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:47:45,363-Speed 11445.79 samples/sec Loss 10.8526 LearningRate 0.0868 Epoch: 2 Global Step: 15580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:47:46,282-Speed 11147.87 samples/sec Loss 10.8405 LearningRate 0.0868 Epoch: 2 Global Step: 15590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:47:47,222-Speed 10904.21 samples/sec Loss 10.8528 LearningRate 0.0868 Epoch: 2 Global Step: 15600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:47:48,123-Speed 11379.57 samples/sec Loss 10.8618 LearningRate 0.0867 Epoch: 2 Global Step: 15610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:47:49,095-Speed 10537.43 samples/sec Loss 10.7039 LearningRate 0.0867 Epoch: 2 Global Step: 15620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:47:50,011-Speed 11183.34 samples/sec Loss 10.7998 LearningRate 0.0867 Epoch: 2 Global Step: 15630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:47:50,913-Speed 11368.89 samples/sec Loss 10.6264 LearningRate 0.0867 Epoch: 2 Global Step: 15640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:47:51,896-Speed 10426.42 samples/sec Loss 10.5961 LearningRate 0.0867 Epoch: 2 Global Step: 15650 
Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:47:52,817-Speed 11125.41 samples/sec Loss 10.7177 LearningRate 0.0867 Epoch: 2 Global Step: 15660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:47:53,750-Speed 10978.50 samples/sec Loss 10.7575 LearningRate 0.0867 Epoch: 2 Global Step: 15670 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:47:54,649-Speed 11406.17 samples/sec Loss 10.8074 LearningRate 0.0867 Epoch: 2 Global Step: 15680 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:47:55,535-Speed 11565.64 samples/sec Loss 10.8517 LearningRate 0.0867 Epoch: 2 Global Step: 15690 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:47:56,450-Speed 11202.63 samples/sec Loss 10.7773 LearningRate 0.0867 Epoch: 2 Global Step: 15700 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:47:57,337-Speed 11543.12 samples/sec Loss 10.8023 LearningRate 0.0867 Epoch: 2 Global Step: 15710 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:47:58,250-Speed 11223.88 samples/sec Loss 10.7950 LearningRate 0.0867 Epoch: 2 Global Step: 15720 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:47:59,123-Speed 11750.47 samples/sec Loss 10.7031 LearningRate 0.0866 Epoch: 2 Global Step: 15730 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:48:00,041-Speed 11166.00 samples/sec Loss 10.6864 LearningRate 0.0866 Epoch: 2 Global Step: 15740 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:48:00,963-Speed 11111.49 samples/sec Loss 10.8846 LearningRate 0.0866 Epoch: 2 Global Step: 15750 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:48:01,871-Speed 11283.81 samples/sec Loss 10.7596 LearningRate 0.0866 Epoch: 2 Global Step: 15760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:48:02,783-Speed 11241.54 samples/sec Loss 10.6102 LearningRate 0.0866 Epoch: 2 Global Step: 15770 Fp16 Grad Scale: 131072 
Required: 7 hours Training: 2022-04-10 23:48:03,689-Speed 11310.92 samples/sec Loss 10.8271 LearningRate 0.0866 Epoch: 2 Global Step: 15780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:48:04,587-Speed 11401.73 samples/sec Loss 10.6784 LearningRate 0.0866 Epoch: 2 Global Step: 15790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:48:05,497-Speed 11266.18 samples/sec Loss 10.6802 LearningRate 0.0866 Epoch: 2 Global Step: 15800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:48:06,422-Speed 11082.94 samples/sec Loss 10.7046 LearningRate 0.0866 Epoch: 2 Global Step: 15810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:48:07,344-Speed 11111.08 samples/sec Loss 10.8128 LearningRate 0.0866 Epoch: 2 Global Step: 15820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:48:08,272-Speed 11047.60 samples/sec Loss 10.7946 LearningRate 0.0866 Epoch: 2 Global Step: 15830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:48:09,166-Speed 11451.92 samples/sec Loss 10.8203 LearningRate 0.0866 Epoch: 2 Global Step: 15840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:48:10,054-Speed 11546.77 samples/sec Loss 10.8438 LearningRate 0.0865 Epoch: 2 Global Step: 15850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:48:10,954-Speed 11379.12 samples/sec Loss 10.7942 LearningRate 0.0865 Epoch: 2 Global Step: 15860 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:48:11,854-Speed 11386.95 samples/sec Loss 10.7329 LearningRate 0.0865 Epoch: 2 Global Step: 15870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:48:12,756-Speed 11366.88 samples/sec Loss 10.9249 LearningRate 0.0865 Epoch: 2 Global Step: 15880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:48:13,659-Speed 11345.92 samples/sec Loss 10.8078 LearningRate 0.0865 Epoch: 2 Global Step: 15890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 
2022-04-10 23:48:14,603-Speed 10859.49 samples/sec Loss 10.7657 LearningRate 0.0865 Epoch: 2 Global Step: 15900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:48:15,489-Speed 11562.03 samples/sec Loss 10.7641 LearningRate 0.0865 Epoch: 2 Global Step: 15910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:48:16,404-Speed 11197.52 samples/sec Loss 10.5778 LearningRate 0.0865 Epoch: 2 Global Step: 15920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:48:17,380-Speed 10494.90 samples/sec Loss 10.5966 LearningRate 0.0865 Epoch: 2 Global Step: 15930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:48:18,328-Speed 10815.27 samples/sec Loss 10.7430 LearningRate 0.0865 Epoch: 2 Global Step: 15940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:48:19,242-Speed 11211.67 samples/sec Loss 10.8170 LearningRate 0.0865 Epoch: 2 Global Step: 15950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:48:20,169-Speed 11050.35 samples/sec Loss 10.6643 LearningRate 0.0865 Epoch: 2 Global Step: 15960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:48:21,075-Speed 11310.83 samples/sec Loss 10.8163 LearningRate 0.0864 Epoch: 2 Global Step: 15970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:48:21,949-Speed 11731.79 samples/sec Loss 10.7332 LearningRate 0.0864 Epoch: 2 Global Step: 15980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:48:22,850-Speed 11364.88 samples/sec Loss 10.8409 LearningRate 0.0864 Epoch: 2 Global Step: 15990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:48:23,774-Speed 11094.64 samples/sec Loss 10.7084 LearningRate 0.0864 Epoch: 2 Global Step: 16000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:48:45,103-[lfw][16000]XNorm: 14.927894 Training: 2022-04-10 23:48:45,104-[lfw][16000]Accuracy-Flip: 0.99250+-0.00461 Training: 2022-04-10 23:48:45,104-[lfw][16000]Accuracy-Highest: 0.99300 
Training: 2022-04-10 23:49:09,816-[cfp_fp][16000]XNorm: 12.653963 Training: 2022-04-10 23:49:09,817-[cfp_fp][16000]Accuracy-Flip: 0.90357+-0.01918 Training: 2022-04-10 23:49:09,817-[cfp_fp][16000]Accuracy-Highest: 0.90471 Training: 2022-04-10 23:49:31,060-[agedb_30][16000]XNorm: 14.657290 Training: 2022-04-10 23:49:31,061-[agedb_30][16000]Accuracy-Flip: 0.93567+-0.01442 Training: 2022-04-10 23:49:31,061-[agedb_30][16000]Accuracy-Highest: 0.93717 Training: 2022-04-10 23:49:31,981-Speed 150.13 samples/sec Loss 10.7832 LearningRate 0.0864 Epoch: 2 Global Step: 16010 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:49:32,879-Speed 11415.18 samples/sec Loss 10.8822 LearningRate 0.0864 Epoch: 2 Global Step: 16020 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:49:33,791-Speed 11238.93 samples/sec Loss 10.7412 LearningRate 0.0864 Epoch: 2 Global Step: 16030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:49:34,670-Speed 11648.03 samples/sec Loss 10.6973 LearningRate 0.0864 Epoch: 2 Global Step: 16040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:49:35,587-Speed 11179.47 samples/sec Loss 10.6544 LearningRate 0.0864 Epoch: 2 Global Step: 16050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:49:36,483-Speed 11436.76 samples/sec Loss 10.6699 LearningRate 0.0864 Epoch: 2 Global Step: 16060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:49:37,411-Speed 11044.47 samples/sec Loss 10.5171 LearningRate 0.0864 Epoch: 2 Global Step: 16070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:49:38,297-Speed 11565.79 samples/sec Loss 10.7283 LearningRate 0.0864 Epoch: 2 Global Step: 16080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:49:39,202-Speed 11318.46 samples/sec Loss 10.7235 LearningRate 0.0864 Epoch: 2 Global Step: 16090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:49:40,130-Speed 11038.57 samples/sec Loss 10.8974 
LearningRate 0.0863 Epoch: 2 Global Step: 16100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-10 23:49:41,031-Speed 11371.19 samples/sec Loss 10.7520 LearningRate 0.0863 Epoch: 2 Global Step: 16110 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:49:41,938-Speed 11298.14 samples/sec Loss 10.7029 LearningRate 0.0863 Epoch: 2 Global Step: 16120 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:49:42,821-Speed 11605.83 samples/sec Loss 10.7576 LearningRate 0.0863 Epoch: 2 Global Step: 16130 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:49:43,739-Speed 11169.94 samples/sec Loss 10.4520 LearningRate 0.0863 Epoch: 2 Global Step: 16140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:49:44,622-Speed 11600.50 samples/sec Loss 10.7294 LearningRate 0.0863 Epoch: 2 Global Step: 16150 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:49:45,490-Speed 11797.09 samples/sec Loss 10.8157 LearningRate 0.0863 Epoch: 2 Global Step: 16160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:49:46,392-Speed 11365.02 samples/sec Loss 10.7342 LearningRate 0.0863 Epoch: 2 Global Step: 16170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:49:47,268-Speed 11701.32 samples/sec Loss 10.7271 LearningRate 0.0863 Epoch: 2 Global Step: 16180 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:49:48,191-Speed 11101.28 samples/sec Loss 10.8154 LearningRate 0.0863 Epoch: 2 Global Step: 16190 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:49:49,100-Speed 11268.30 samples/sec Loss 10.7408 LearningRate 0.0863 Epoch: 2 Global Step: 16200 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:49:49,987-Speed 11554.55 samples/sec Loss 10.6805 LearningRate 0.0863 Epoch: 2 Global Step: 16210 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:49:50,939-Speed 10776.55 samples/sec Loss 10.6049 LearningRate 0.0862 Epoch: 2 
Global Step: 16220 Fp16 Grad Scale: 262144 Required: 8 hours Training: 2022-04-10 23:49:51,841-Speed 11371.29 samples/sec Loss 10.6730 LearningRate 0.0862 Epoch: 2 Global Step: 16230 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:49:52,756-Speed 11190.76 samples/sec Loss 10.6336 LearningRate 0.0862 Epoch: 2 Global Step: 16240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-10 23:49:53,652-Speed 11439.67 samples/sec Loss 10.7281 LearningRate 0.0862 Epoch: 2 Global Step: 16250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:49:54,558-Speed 11314.21 samples/sec Loss 10.7605 LearningRate 0.0862 Epoch: 2 Global Step: 16260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:49:55,473-Speed 11202.73 samples/sec Loss 10.5241 LearningRate 0.0862 Epoch: 2 Global Step: 16270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:49:56,402-Speed 11026.73 samples/sec Loss 10.5179 LearningRate 0.0862 Epoch: 2 Global Step: 16280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:49:57,281-Speed 11655.57 samples/sec Loss 10.7983 LearningRate 0.0862 Epoch: 2 Global Step: 16290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:49:58,209-Speed 11048.16 samples/sec Loss 10.6494 LearningRate 0.0862 Epoch: 2 Global Step: 16300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:49:59,117-Speed 11280.64 samples/sec Loss 10.5361 LearningRate 0.0862 Epoch: 2 Global Step: 16310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:50:00,001-Speed 11590.59 samples/sec Loss 10.6077 LearningRate 0.0862 Epoch: 2 Global Step: 16320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:50:00,905-Speed 11335.05 samples/sec Loss 10.6420 LearningRate 0.0862 Epoch: 2 Global Step: 16330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:50:01,794-Speed 11537.44 samples/sec Loss 10.7085 LearningRate 0.0861 Epoch: 2 Global Step: 16340 Fp16 Grad 
Scale: 65536 Required: 7 hours Training: 2022-04-10 23:50:02,685-Speed 11490.88 samples/sec Loss 10.7578 LearningRate 0.0861 Epoch: 2 Global Step: 16350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:50:03,582-Speed 11427.17 samples/sec Loss 10.7120 LearningRate 0.0861 Epoch: 2 Global Step: 16360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:50:04,490-Speed 11281.48 samples/sec Loss 10.7217 LearningRate 0.0861 Epoch: 2 Global Step: 16370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:50:05,405-Speed 11208.29 samples/sec Loss 10.7042 LearningRate 0.0861 Epoch: 2 Global Step: 16380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:50:06,302-Speed 11421.33 samples/sec Loss 10.6398 LearningRate 0.0861 Epoch: 2 Global Step: 16390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:50:07,204-Speed 11362.87 samples/sec Loss 10.6788 LearningRate 0.0861 Epoch: 2 Global Step: 16400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:50:08,102-Speed 11410.70 samples/sec Loss 10.4905 LearningRate 0.0861 Epoch: 2 Global Step: 16410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:50:09,009-Speed 11293.73 samples/sec Loss 10.7911 LearningRate 0.0861 Epoch: 2 Global Step: 16420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:50:09,904-Speed 11455.92 samples/sec Loss 10.4560 LearningRate 0.0861 Epoch: 2 Global Step: 16430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:50:10,800-Speed 11427.75 samples/sec Loss 10.6991 LearningRate 0.0861 Epoch: 2 Global Step: 16440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:50:11,698-Speed 11416.97 samples/sec Loss 10.4903 LearningRate 0.0861 Epoch: 2 Global Step: 16450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:50:12,603-Speed 11324.36 samples/sec Loss 10.8555 LearningRate 0.0860 Epoch: 2 Global Step: 16460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 
2022-04-10 23:50:13,498-Speed 11442.29 samples/sec Loss 10.6823 LearningRate 0.0860 Epoch: 2 Global Step: 16470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:50:14,405-Speed 11306.87 samples/sec Loss 10.4734 LearningRate 0.0860 Epoch: 2 Global Step: 16480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:50:15,339-Speed 10974.35 samples/sec Loss 10.8326 LearningRate 0.0860 Epoch: 2 Global Step: 16490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:50:16,253-Speed 11203.75 samples/sec Loss 10.6448 LearningRate 0.0860 Epoch: 2 Global Step: 16500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:50:17,206-Speed 10749.43 samples/sec Loss 10.7086 LearningRate 0.0860 Epoch: 2 Global Step: 16510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:50:18,157-Speed 10777.14 samples/sec Loss 10.5820 LearningRate 0.0860 Epoch: 2 Global Step: 16520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:50:19,078-Speed 11127.17 samples/sec Loss 10.6449 LearningRate 0.0860 Epoch: 2 Global Step: 16530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:50:19,966-Speed 11537.19 samples/sec Loss 10.7294 LearningRate 0.0860 Epoch: 2 Global Step: 16540 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:50:20,852-Speed 11579.37 samples/sec Loss 10.7109 LearningRate 0.0860 Epoch: 2 Global Step: 16550 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:50:21,793-Speed 10887.96 samples/sec Loss 10.5746 LearningRate 0.0860 Epoch: 2 Global Step: 16560 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:50:22,702-Speed 11266.19 samples/sec Loss 10.4480 LearningRate 0.0860 Epoch: 2 Global Step: 16570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:50:23,626-Speed 11095.89 samples/sec Loss 10.6843 LearningRate 0.0860 Epoch: 2 Global Step: 16580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:50:24,553-Speed 
11048.74 samples/sec Loss 10.6401 LearningRate 0.0859 Epoch: 2 Global Step: 16590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:50:25,460-Speed 11302.27 samples/sec Loss 10.5148 LearningRate 0.0859 Epoch: 2 Global Step: 16600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:50:26,384-Speed 11080.01 samples/sec Loss 10.6334 LearningRate 0.0859 Epoch: 2 Global Step: 16610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:50:27,339-Speed 10737.54 samples/sec Loss 10.4798 LearningRate 0.0859 Epoch: 2 Global Step: 16620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:50:28,247-Speed 11281.57 samples/sec Loss 10.6642 LearningRate 0.0859 Epoch: 2 Global Step: 16630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:50:29,151-Speed 11340.33 samples/sec Loss 10.5997 LearningRate 0.0859 Epoch: 2 Global Step: 16640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:50:30,059-Speed 11279.96 samples/sec Loss 10.7972 LearningRate 0.0859 Epoch: 2 Global Step: 16650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:50:30,968-Speed 11277.53 samples/sec Loss 10.5049 LearningRate 0.0859 Epoch: 2 Global Step: 16660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:50:31,891-Speed 11101.41 samples/sec Loss 10.6282 LearningRate 0.0859 Epoch: 2 Global Step: 16670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:50:32,796-Speed 11315.73 samples/sec Loss 10.6539 LearningRate 0.0859 Epoch: 2 Global Step: 16680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:50:33,708-Speed 11238.61 samples/sec Loss 10.7088 LearningRate 0.0859 Epoch: 2 Global Step: 16690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:50:34,612-Speed 11333.07 samples/sec Loss 10.6458 LearningRate 0.0859 Epoch: 2 Global Step: 16700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:50:35,513-Speed 11383.67 samples/sec Loss 10.6843 
LearningRate 0.0858 Epoch: 2 Global Step: 16710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:50:36,433-Speed 11133.80 samples/sec Loss 10.6760 LearningRate 0.0858 Epoch: 2 Global Step: 16720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:50:37,369-Speed 10946.41 samples/sec Loss 10.7005 LearningRate 0.0858 Epoch: 2 Global Step: 16730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:50:38,281-Speed 11246.77 samples/sec Loss 10.6509 LearningRate 0.0858 Epoch: 2 Global Step: 16740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:50:39,208-Speed 11045.34 samples/sec Loss 10.7227 LearningRate 0.0858 Epoch: 2 Global Step: 16750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:50:40,121-Speed 11232.58 samples/sec Loss 10.4842 LearningRate 0.0858 Epoch: 2 Global Step: 16760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:50:41,061-Speed 10901.05 samples/sec Loss 10.4399 LearningRate 0.0858 Epoch: 2 Global Step: 16770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:50:41,992-Speed 11007.42 samples/sec Loss 10.5542 LearningRate 0.0858 Epoch: 2 Global Step: 16780 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:50:42,884-Speed 11488.30 samples/sec Loss 10.5799 LearningRate 0.0858 Epoch: 2 Global Step: 16790 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:50:43,809-Speed 11075.94 samples/sec Loss 10.6182 LearningRate 0.0858 Epoch: 2 Global Step: 16800 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:50:44,745-Speed 10938.67 samples/sec Loss 10.4988 LearningRate 0.0858 Epoch: 2 Global Step: 16810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:50:45,666-Speed 11125.62 samples/sec Loss 10.3687 LearningRate 0.0858 Epoch: 2 Global Step: 16820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:50:46,559-Speed 11477.43 samples/sec Loss 10.5661 LearningRate 0.0857 Epoch: 2 
Global Step: 16830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:50:47,447-Speed 11538.92 samples/sec Loss 10.4490 LearningRate 0.0857 Epoch: 2 Global Step: 16840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:50:48,361-Speed 11208.97 samples/sec Loss 10.6408 LearningRate 0.0857 Epoch: 2 Global Step: 16850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:50:49,261-Speed 11384.96 samples/sec Loss 10.4856 LearningRate 0.0857 Epoch: 2 Global Step: 16860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:50:50,151-Speed 11512.31 samples/sec Loss 10.7570 LearningRate 0.0857 Epoch: 2 Global Step: 16870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:50:51,053-Speed 11373.72 samples/sec Loss 10.5326 LearningRate 0.0857 Epoch: 2 Global Step: 16880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:50:51,950-Speed 11435.24 samples/sec Loss 10.7772 LearningRate 0.0857 Epoch: 2 Global Step: 16890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:50:52,832-Speed 11613.80 samples/sec Loss 10.5327 LearningRate 0.0857 Epoch: 2 Global Step: 16900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:50:53,746-Speed 11216.76 samples/sec Loss 10.4505 LearningRate 0.0857 Epoch: 2 Global Step: 16910 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:50:54,649-Speed 11347.14 samples/sec Loss 10.4620 LearningRate 0.0857 Epoch: 2 Global Step: 16920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:50:55,549-Speed 11380.30 samples/sec Loss 10.6243 LearningRate 0.0857 Epoch: 2 Global Step: 16930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:50:56,439-Speed 11510.67 samples/sec Loss 10.6299 LearningRate 0.0857 Epoch: 2 Global Step: 16940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:50:57,335-Speed 11445.29 samples/sec Loss 10.6149 LearningRate 0.0857 Epoch: 2 Global Step: 16950 Fp16 Grad 
Scale: 131072 Required: 7 hours Training: 2022-04-10 23:50:58,251-Speed 11180.59 samples/sec Loss 10.6734 LearningRate 0.0856 Epoch: 2 Global Step: 16960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:50:59,195-Speed 10860.60 samples/sec Loss 10.5692 LearningRate 0.0856 Epoch: 2 Global Step: 16970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:51:00,095-Speed 11386.71 samples/sec Loss 10.5090 LearningRate 0.0856 Epoch: 2 Global Step: 16980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:51:01,058-Speed 10640.26 samples/sec Loss 10.5382 LearningRate 0.0856 Epoch: 2 Global Step: 16990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:51:01,973-Speed 11198.45 samples/sec Loss 10.5794 LearningRate 0.0856 Epoch: 2 Global Step: 17000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:51:02,875-Speed 11357.67 samples/sec Loss 10.6748 LearningRate 0.0856 Epoch: 2 Global Step: 17010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:51:03,787-Speed 11237.99 samples/sec Loss 10.5498 LearningRate 0.0856 Epoch: 2 Global Step: 17020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:51:04,727-Speed 10910.14 samples/sec Loss 10.4116 LearningRate 0.0856 Epoch: 2 Global Step: 17030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:51:05,649-Speed 11109.25 samples/sec Loss 10.5402 LearningRate 0.0856 Epoch: 2 Global Step: 17040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:51:06,609-Speed 10674.08 samples/sec Loss 10.8015 LearningRate 0.0856 Epoch: 2 Global Step: 17050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:51:17,794-Speed 915.63 samples/sec Loss 10.4351 LearningRate 0.0856 Epoch: 3 Global Step: 17060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:51:19,093-Speed 7889.60 samples/sec Loss 9.8029 LearningRate 0.0856 Epoch: 3 Global Step: 17070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 
2022-04-10 23:51:20,014-Speed 11124.86 samples/sec Loss 9.7056 LearningRate 0.0855 Epoch: 3 Global Step: 17080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:51:20,956-Speed 10879.09 samples/sec Loss 9.9278 LearningRate 0.0855 Epoch: 3 Global Step: 17090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:51:21,880-Speed 11086.43 samples/sec Loss 9.8480 LearningRate 0.0855 Epoch: 3 Global Step: 17100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:51:22,956-Speed 9522.93 samples/sec Loss 9.6983 LearningRate 0.0855 Epoch: 3 Global Step: 17110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:51:23,995-Speed 9862.17 samples/sec Loss 9.7146 LearningRate 0.0855 Epoch: 3 Global Step: 17120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:51:24,937-Speed 10874.07 samples/sec Loss 9.6748 LearningRate 0.0855 Epoch: 3 Global Step: 17130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:51:25,880-Speed 10868.25 samples/sec Loss 10.0181 LearningRate 0.0855 Epoch: 3 Global Step: 17140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:51:26,827-Speed 10820.11 samples/sec Loss 9.9207 LearningRate 0.0855 Epoch: 3 Global Step: 17150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:51:27,772-Speed 10850.60 samples/sec Loss 9.9858 LearningRate 0.0855 Epoch: 3 Global Step: 17160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:51:28,722-Speed 10780.92 samples/sec Loss 9.9985 LearningRate 0.0855 Epoch: 3 Global Step: 17170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:51:29,660-Speed 10923.01 samples/sec Loss 9.7312 LearningRate 0.0855 Epoch: 3 Global Step: 17180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:51:30,580-Speed 11135.11 samples/sec Loss 9.8935 LearningRate 0.0855 Epoch: 3 Global Step: 17190 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:51:31,526-Speed 10833.07 
samples/sec Loss 9.7787 LearningRate 0.0854 Epoch: 3 Global Step: 17200 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:51:32,420-Speed 11472.31 samples/sec Loss 9.9656 LearningRate 0.0854 Epoch: 3 Global Step: 17210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:51:33,312-Speed 11490.04 samples/sec Loss 9.9619 LearningRate 0.0854 Epoch: 3 Global Step: 17220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:51:34,195-Speed 11604.80 samples/sec Loss 9.9615 LearningRate 0.0854 Epoch: 3 Global Step: 17230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:51:35,097-Speed 11364.58 samples/sec Loss 9.9990 LearningRate 0.0854 Epoch: 3 Global Step: 17240 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:51:35,989-Speed 11486.24 samples/sec Loss 9.9585 LearningRate 0.0854 Epoch: 3 Global Step: 17250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:51:36,889-Speed 11381.72 samples/sec Loss 10.0303 LearningRate 0.0854 Epoch: 3 Global Step: 17260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:51:37,807-Speed 11157.60 samples/sec Loss 10.1222 LearningRate 0.0854 Epoch: 3 Global Step: 17270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:51:38,720-Speed 11230.64 samples/sec Loss 9.9608 LearningRate 0.0854 Epoch: 3 Global Step: 17280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:51:39,634-Speed 11208.22 samples/sec Loss 9.9536 LearningRate 0.0854 Epoch: 3 Global Step: 17290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:51:40,562-Speed 11043.96 samples/sec Loss 10.0129 LearningRate 0.0854 Epoch: 3 Global Step: 17300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:51:41,521-Speed 10690.45 samples/sec Loss 9.9852 LearningRate 0.0854 Epoch: 3 Global Step: 17310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:51:42,470-Speed 10790.49 samples/sec Loss 10.0928 LearningRate 
0.0853 Epoch: 3 Global Step: 17320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:51:43,374-Speed 11337.35 samples/sec Loss 9.9620 LearningRate 0.0853 Epoch: 3 Global Step: 17330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:51:44,319-Speed 10845.10 samples/sec Loss 9.9554 LearningRate 0.0853 Epoch: 3 Global Step: 17340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:51:45,220-Speed 11371.74 samples/sec Loss 10.1494 LearningRate 0.0853 Epoch: 3 Global Step: 17350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:51:46,119-Speed 11396.96 samples/sec Loss 9.8226 LearningRate 0.0853 Epoch: 3 Global Step: 17360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:51:47,050-Speed 11010.92 samples/sec Loss 10.1766 LearningRate 0.0853 Epoch: 3 Global Step: 17370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:51:47,976-Speed 11065.37 samples/sec Loss 10.1660 LearningRate 0.0853 Epoch: 3 Global Step: 17380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:51:48,926-Speed 10781.82 samples/sec Loss 9.9850 LearningRate 0.0853 Epoch: 3 Global Step: 17390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:51:49,865-Speed 10919.87 samples/sec Loss 10.0306 LearningRate 0.0853 Epoch: 3 Global Step: 17400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:51:50,797-Speed 10995.35 samples/sec Loss 10.0962 LearningRate 0.0853 Epoch: 3 Global Step: 17410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:51:51,753-Speed 10714.65 samples/sec Loss 10.1743 LearningRate 0.0853 Epoch: 3 Global Step: 17420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:51:52,694-Speed 10890.76 samples/sec Loss 10.1577 LearningRate 0.0853 Epoch: 3 Global Step: 17430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:51:53,682-Speed 10367.75 samples/sec Loss 10.1154 LearningRate 0.0853 Epoch: 3 Global Step: 17440 Fp16 Grad 
Scale: 65536 Required: 7 hours Training: 2022-04-10 23:51:54,621-Speed 10920.49 samples/sec Loss 10.1620 LearningRate 0.0852 Epoch: 3 Global Step: 17450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:51:55,533-Speed 11238.40 samples/sec Loss 10.1559 LearningRate 0.0852 Epoch: 3 Global Step: 17460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:51:56,488-Speed 10730.46 samples/sec Loss 10.1071 LearningRate 0.0852 Epoch: 3 Global Step: 17470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:51:57,397-Speed 11273.70 samples/sec Loss 10.0332 LearningRate 0.0852 Epoch: 3 Global Step: 17480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:51:58,490-Speed 9388.00 samples/sec Loss 10.0569 LearningRate 0.0852 Epoch: 3 Global Step: 17490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:51:59,483-Speed 10311.24 samples/sec Loss 10.0670 LearningRate 0.0852 Epoch: 3 Global Step: 17500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:52:00,504-Speed 10039.65 samples/sec Loss 10.0792 LearningRate 0.0852 Epoch: 3 Global Step: 17510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:52:01,402-Speed 11412.60 samples/sec Loss 9.9607 LearningRate 0.0852 Epoch: 3 Global Step: 17520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:52:02,359-Speed 10707.12 samples/sec Loss 9.9290 LearningRate 0.0852 Epoch: 3 Global Step: 17530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:52:03,278-Speed 11152.62 samples/sec Loss 10.0261 LearningRate 0.0852 Epoch: 3 Global Step: 17540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:52:04,204-Speed 11058.47 samples/sec Loss 10.1432 LearningRate 0.0852 Epoch: 3 Global Step: 17550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:52:05,108-Speed 11339.48 samples/sec Loss 10.1628 LearningRate 0.0852 Epoch: 3 Global Step: 17560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 
2022-04-10 23:52:06,072-Speed 10630.62 samples/sec Loss 10.1710 LearningRate 0.0851 Epoch: 3 Global Step: 17570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:52:06,997-Speed 11081.14 samples/sec Loss 9.9438 LearningRate 0.0851 Epoch: 3 Global Step: 17580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:52:07,903-Speed 11305.49 samples/sec Loss 10.2802 LearningRate 0.0851 Epoch: 3 Global Step: 17590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:52:08,842-Speed 10926.32 samples/sec Loss 10.0591 LearningRate 0.0851 Epoch: 3 Global Step: 17600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:52:09,738-Speed 11432.82 samples/sec Loss 10.1832 LearningRate 0.0851 Epoch: 3 Global Step: 17610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:52:10,632-Speed 11454.54 samples/sec Loss 10.0955 LearningRate 0.0851 Epoch: 3 Global Step: 17620 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:52:11,536-Speed 11336.62 samples/sec Loss 10.2180 LearningRate 0.0851 Epoch: 3 Global Step: 17630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:52:12,480-Speed 10863.79 samples/sec Loss 10.2082 LearningRate 0.0851 Epoch: 3 Global Step: 17640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:52:13,436-Speed 10710.84 samples/sec Loss 10.1033 LearningRate 0.0851 Epoch: 3 Global Step: 17650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:52:14,406-Speed 10573.16 samples/sec Loss 10.0779 LearningRate 0.0851 Epoch: 3 Global Step: 17660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:52:15,388-Speed 10435.57 samples/sec Loss 10.1148 LearningRate 0.0851 Epoch: 3 Global Step: 17670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:52:16,300-Speed 11228.47 samples/sec Loss 10.2639 LearningRate 0.0851 Epoch: 3 Global Step: 17680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:52:17,197-Speed 
11426.71 samples/sec Loss 10.2896 LearningRate 0.0850 Epoch: 3 Global Step: 17690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:52:18,193-Speed 10289.88 samples/sec Loss 10.0828 LearningRate 0.0850 Epoch: 3 Global Step: 17700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:52:19,146-Speed 10752.82 samples/sec Loss 10.0542 LearningRate 0.0850 Epoch: 3 Global Step: 17710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:52:20,161-Speed 10096.28 samples/sec Loss 10.1857 LearningRate 0.0850 Epoch: 3 Global Step: 17720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:52:21,067-Speed 11323.40 samples/sec Loss 10.1569 LearningRate 0.0850 Epoch: 3 Global Step: 17730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:52:21,967-Speed 11382.81 samples/sec Loss 10.3046 LearningRate 0.0850 Epoch: 3 Global Step: 17740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:52:22,896-Speed 11027.57 samples/sec Loss 10.0776 LearningRate 0.0850 Epoch: 3 Global Step: 17750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:52:23,844-Speed 10807.53 samples/sec Loss 10.2720 LearningRate 0.0850 Epoch: 3 Global Step: 17760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:52:24,731-Speed 11555.32 samples/sec Loss 10.2039 LearningRate 0.0850 Epoch: 3 Global Step: 17770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:52:25,654-Speed 11101.62 samples/sec Loss 10.1069 LearningRate 0.0850 Epoch: 3 Global Step: 17780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:52:26,575-Speed 11126.39 samples/sec Loss 10.2020 LearningRate 0.0850 Epoch: 3 Global Step: 17790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:52:27,512-Speed 10935.44 samples/sec Loss 10.1163 LearningRate 0.0850 Epoch: 3 Global Step: 17800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:52:28,453-Speed 10894.66 samples/sec Loss 10.0831 
LearningRate 0.0850 Epoch: 3 Global Step: 17810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:52:29,355-Speed 11353.24 samples/sec Loss 10.1827 LearningRate 0.0849 Epoch: 3 Global Step: 17820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:52:30,268-Speed 11229.14 samples/sec Loss 10.1643 LearningRate 0.0849 Epoch: 3 Global Step: 17830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:52:31,214-Speed 10829.16 samples/sec Loss 10.3401 LearningRate 0.0849 Epoch: 3 Global Step: 17840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:52:32,159-Speed 10838.99 samples/sec Loss 10.1496 LearningRate 0.0849 Epoch: 3 Global Step: 17850 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:52:33,072-Speed 11233.37 samples/sec Loss 10.2749 LearningRate 0.0849 Epoch: 3 Global Step: 17860 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:52:33,986-Speed 11205.04 samples/sec Loss 10.2027 LearningRate 0.0849 Epoch: 3 Global Step: 17870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:52:34,910-Speed 11088.65 samples/sec Loss 10.1635 LearningRate 0.0849 Epoch: 3 Global Step: 17880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:52:35,825-Speed 11206.79 samples/sec Loss 10.2607 LearningRate 0.0849 Epoch: 3 Global Step: 17890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:52:36,765-Speed 10898.63 samples/sec Loss 10.2718 LearningRate 0.0849 Epoch: 3 Global Step: 17900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:52:37,683-Speed 11167.23 samples/sec Loss 10.3371 LearningRate 0.0849 Epoch: 3 Global Step: 17910 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:52:38,639-Speed 10718.28 samples/sec Loss 10.2782 LearningRate 0.0849 Epoch: 3 Global Step: 17920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:52:39,589-Speed 10786.12 samples/sec Loss 10.3124 LearningRate 0.0849 Epoch: 3 
Global Step: 17930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:52:40,523-Speed 10975.09 samples/sec Loss 10.3677 LearningRate 0.0848 Epoch: 3 Global Step: 17940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:52:41,479-Speed 10719.14 samples/sec Loss 10.2088 LearningRate 0.0848 Epoch: 3 Global Step: 17950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:52:42,399-Speed 11141.30 samples/sec Loss 10.3075 LearningRate 0.0848 Epoch: 3 Global Step: 17960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:52:43,312-Speed 11224.67 samples/sec Loss 10.1484 LearningRate 0.0848 Epoch: 3 Global Step: 17970 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:52:44,212-Speed 11384.48 samples/sec Loss 10.3593 LearningRate 0.0848 Epoch: 3 Global Step: 17980 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:52:45,122-Speed 11264.18 samples/sec Loss 10.2178 LearningRate 0.0848 Epoch: 3 Global Step: 17990 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:52:46,015-Speed 11468.64 samples/sec Loss 10.2577 LearningRate 0.0848 Epoch: 3 Global Step: 18000 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:53:07,287-[lfw][18000]XNorm: 14.773471 Training: 2022-04-10 23:53:07,288-[lfw][18000]Accuracy-Flip: 0.99217+-0.00533 Training: 2022-04-10 23:53:07,288-[lfw][18000]Accuracy-Highest: 0.99300 Training: 2022-04-10 23:53:31,954-[cfp_fp][18000]XNorm: 12.340243 Training: 2022-04-10 23:53:31,955-[cfp_fp][18000]Accuracy-Flip: 0.90743+-0.01469 Training: 2022-04-10 23:53:31,955-[cfp_fp][18000]Accuracy-Highest: 0.90743 Training: 2022-04-10 23:53:53,215-[agedb_30][18000]XNorm: 14.407275 Training: 2022-04-10 23:53:53,215-[agedb_30][18000]Accuracy-Flip: 0.94400+-0.01283 Training: 2022-04-10 23:53:53,216-[agedb_30][18000]Accuracy-Highest: 0.94400 Training: 2022-04-10 23:53:54,133-Speed 150.33 samples/sec Loss 10.3362 LearningRate 0.0848 Epoch: 3 Global Step: 18010 Fp16 
Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:53:55,080-Speed 10815.79 samples/sec Loss 10.1641 LearningRate 0.0848 Epoch: 3 Global Step: 18020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:53:56,021-Speed 10893.44 samples/sec Loss 10.2816 LearningRate 0.0848 Epoch: 3 Global Step: 18030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:53:56,919-Speed 11406.32 samples/sec Loss 10.2865 LearningRate 0.0848 Epoch: 3 Global Step: 18040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:53:57,823-Speed 11340.69 samples/sec Loss 10.1123 LearningRate 0.0848 Epoch: 3 Global Step: 18050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:53:58,735-Speed 11234.66 samples/sec Loss 10.2527 LearningRate 0.0847 Epoch: 3 Global Step: 18060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:53:59,622-Speed 11543.92 samples/sec Loss 10.2676 LearningRate 0.0847 Epoch: 3 Global Step: 18070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:54:00,563-Speed 10896.78 samples/sec Loss 10.3190 LearningRate 0.0847 Epoch: 3 Global Step: 18080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:54:01,513-Speed 10784.80 samples/sec Loss 10.4228 LearningRate 0.0847 Epoch: 3 Global Step: 18090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:54:02,455-Speed 10881.68 samples/sec Loss 10.2742 LearningRate 0.0847 Epoch: 3 Global Step: 18100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:54:03,356-Speed 11363.41 samples/sec Loss 10.1528 LearningRate 0.0847 Epoch: 3 Global Step: 18110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:54:04,249-Speed 11480.69 samples/sec Loss 10.3009 LearningRate 0.0847 Epoch: 3 Global Step: 18120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:54:05,144-Speed 11450.59 samples/sec Loss 10.2603 LearningRate 0.0847 Epoch: 3 Global Step: 18130 Fp16 Grad Scale: 65536 Required: 7 
hours Training: 2022-04-10 23:54:06,062-Speed 11163.15 samples/sec Loss 9.9543 LearningRate 0.0847 Epoch: 3 Global Step: 18140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:54:07,009-Speed 10823.17 samples/sec Loss 10.1921 LearningRate 0.0847 Epoch: 3 Global Step: 18150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:54:07,923-Speed 11205.70 samples/sec Loss 10.1857 LearningRate 0.0847 Epoch: 3 Global Step: 18160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:54:08,838-Speed 11199.03 samples/sec Loss 10.2052 LearningRate 0.0847 Epoch: 3 Global Step: 18170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:54:09,798-Speed 10680.76 samples/sec Loss 10.2527 LearningRate 0.0847 Epoch: 3 Global Step: 18180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:54:10,705-Speed 11300.26 samples/sec Loss 10.2156 LearningRate 0.0846 Epoch: 3 Global Step: 18190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:54:11,626-Speed 11118.64 samples/sec Loss 10.2275 LearningRate 0.0846 Epoch: 3 Global Step: 18200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:54:12,545-Speed 11156.06 samples/sec Loss 10.2184 LearningRate 0.0846 Epoch: 3 Global Step: 18210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:54:13,476-Speed 11007.79 samples/sec Loss 10.2210 LearningRate 0.0846 Epoch: 3 Global Step: 18220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:54:14,395-Speed 11142.83 samples/sec Loss 10.3582 LearningRate 0.0846 Epoch: 3 Global Step: 18230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:54:15,326-Speed 11019.45 samples/sec Loss 10.0746 LearningRate 0.0846 Epoch: 3 Global Step: 18240 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:54:16,245-Speed 11151.29 samples/sec Loss 10.3286 LearningRate 0.0846 Epoch: 3 Global Step: 18250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 
23:54:17,166-Speed 11121.07 samples/sec Loss 10.1817 LearningRate 0.0846 Epoch: 3 Global Step: 18260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:54:18,092-Speed 11061.46 samples/sec Loss 10.1583 LearningRate 0.0846 Epoch: 3 Global Step: 18270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:54:18,990-Speed 11420.84 samples/sec Loss 10.2077 LearningRate 0.0846 Epoch: 3 Global Step: 18280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:54:19,909-Speed 11145.22 samples/sec Loss 10.2976 LearningRate 0.0846 Epoch: 3 Global Step: 18290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:54:20,837-Speed 11039.86 samples/sec Loss 10.3691 LearningRate 0.0846 Epoch: 3 Global Step: 18300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:54:21,739-Speed 11372.58 samples/sec Loss 10.2040 LearningRate 0.0845 Epoch: 3 Global Step: 18310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:54:22,643-Speed 11324.34 samples/sec Loss 10.0896 LearningRate 0.0845 Epoch: 3 Global Step: 18320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:54:23,536-Speed 11485.43 samples/sec Loss 10.3103 LearningRate 0.0845 Epoch: 3 Global Step: 18330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:54:24,457-Speed 11121.93 samples/sec Loss 10.1758 LearningRate 0.0845 Epoch: 3 Global Step: 18340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:54:25,371-Speed 11202.54 samples/sec Loss 10.1338 LearningRate 0.0845 Epoch: 3 Global Step: 18350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:54:26,278-Speed 11298.18 samples/sec Loss 10.3011 LearningRate 0.0845 Epoch: 3 Global Step: 18360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:54:27,180-Speed 11371.70 samples/sec Loss 10.5168 LearningRate 0.0845 Epoch: 3 Global Step: 18370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:54:28,111-Speed 11007.22 
samples/sec Loss 10.1559 LearningRate 0.0845 Epoch: 3 Global Step: 18380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:54:29,047-Speed 10949.56 samples/sec Loss 10.3072 LearningRate 0.0845 Epoch: 3 Global Step: 18390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:54:29,998-Speed 10772.85 samples/sec Loss 10.0887 LearningRate 0.0845 Epoch: 3 Global Step: 18400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:54:30,917-Speed 11152.00 samples/sec Loss 10.2978 LearningRate 0.0845 Epoch: 3 Global Step: 18410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:54:31,819-Speed 11355.26 samples/sec Loss 10.2238 LearningRate 0.0845 Epoch: 3 Global Step: 18420 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:54:32,731-Speed 11237.27 samples/sec Loss 10.3902 LearningRate 0.0845 Epoch: 3 Global Step: 18430 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:54:33,634-Speed 11340.84 samples/sec Loss 10.1175 LearningRate 0.0844 Epoch: 3 Global Step: 18440 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:54:34,559-Speed 11080.93 samples/sec Loss 10.2376 LearningRate 0.0844 Epoch: 3 Global Step: 18450 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:54:35,474-Speed 11204.09 samples/sec Loss 10.2836 LearningRate 0.0844 Epoch: 3 Global Step: 18460 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:54:36,416-Speed 10875.55 samples/sec Loss 10.2609 LearningRate 0.0844 Epoch: 3 Global Step: 18470 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:54:37,352-Speed 10950.03 samples/sec Loss 10.4271 LearningRate 0.0844 Epoch: 3 Global Step: 18480 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:54:38,288-Speed 10943.37 samples/sec Loss 10.2178 LearningRate 0.0844 Epoch: 3 Global Step: 18490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:54:39,217-Speed 11036.77 samples/sec Loss 10.2857 
LearningRate 0.0844 Epoch: 3 Global Step: 18500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:54:40,179-Speed 10648.24 samples/sec Loss 10.3296 LearningRate 0.0844 Epoch: 3 Global Step: 18510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:54:41,100-Speed 11137.32 samples/sec Loss 10.4165 LearningRate 0.0844 Epoch: 3 Global Step: 18520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:54:42,022-Speed 11104.32 samples/sec Loss 10.3610 LearningRate 0.0844 Epoch: 3 Global Step: 18530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:54:42,958-Speed 10948.64 samples/sec Loss 10.3043 LearningRate 0.0844 Epoch: 3 Global Step: 18540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:54:43,906-Speed 10823.62 samples/sec Loss 10.3413 LearningRate 0.0844 Epoch: 3 Global Step: 18550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:54:44,830-Speed 11090.57 samples/sec Loss 10.3264 LearningRate 0.0843 Epoch: 3 Global Step: 18560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:54:45,728-Speed 11400.90 samples/sec Loss 10.2720 LearningRate 0.0843 Epoch: 3 Global Step: 18570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:54:46,633-Speed 11324.24 samples/sec Loss 10.2598 LearningRate 0.0843 Epoch: 3 Global Step: 18580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:54:47,519-Speed 11591.60 samples/sec Loss 10.1654 LearningRate 0.0843 Epoch: 3 Global Step: 18590 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:54:48,415-Speed 11430.11 samples/sec Loss 10.0500 LearningRate 0.0843 Epoch: 3 Global Step: 18600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:54:49,318-Speed 11350.61 samples/sec Loss 10.2691 LearningRate 0.0843 Epoch: 3 Global Step: 18610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:54:50,252-Speed 10975.96 samples/sec Loss 10.2108 LearningRate 0.0843 Epoch: 3 
Global Step: 18620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:54:51,164-Speed 11234.38 samples/sec Loss 10.3967 LearningRate 0.0843 Epoch: 3 Global Step: 18630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:54:52,078-Speed 11210.39 samples/sec Loss 10.3190 LearningRate 0.0843 Epoch: 3 Global Step: 18640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:54:53,024-Speed 10828.05 samples/sec Loss 10.3504 LearningRate 0.0843 Epoch: 3 Global Step: 18650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:54:53,953-Speed 11032.90 samples/sec Loss 10.2969 LearningRate 0.0843 Epoch: 3 Global Step: 18660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:54:54,891-Speed 10921.27 samples/sec Loss 10.1953 LearningRate 0.0843 Epoch: 3 Global Step: 18670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:54:55,789-Speed 11407.30 samples/sec Loss 10.0347 LearningRate 0.0842 Epoch: 3 Global Step: 18680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:54:56,706-Speed 11183.83 samples/sec Loss 10.3493 LearningRate 0.0842 Epoch: 3 Global Step: 18690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:54:57,624-Speed 11155.46 samples/sec Loss 10.2669 LearningRate 0.0842 Epoch: 3 Global Step: 18700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:54:58,572-Speed 10808.90 samples/sec Loss 10.1204 LearningRate 0.0842 Epoch: 3 Global Step: 18710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:54:59,477-Speed 11319.46 samples/sec Loss 10.1548 LearningRate 0.0842 Epoch: 3 Global Step: 18720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:55:00,381-Speed 11346.19 samples/sec Loss 10.2420 LearningRate 0.0842 Epoch: 3 Global Step: 18730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:55:01,308-Speed 11050.52 samples/sec Loss 10.1754 LearningRate 0.0842 Epoch: 3 Global Step: 18740 Fp16 Grad Scale: 65536 
Required: 7 hours Training: 2022-04-10 23:55:02,248-Speed 10903.41 samples/sec Loss 10.3442 LearningRate 0.0842 Epoch: 3 Global Step: 18750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:55:03,143-Speed 11445.57 samples/sec Loss 10.2329 LearningRate 0.0842 Epoch: 3 Global Step: 18760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:55:04,071-Speed 11042.93 samples/sec Loss 10.2002 LearningRate 0.0842 Epoch: 3 Global Step: 18770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:55:04,996-Speed 11075.10 samples/sec Loss 10.2710 LearningRate 0.0842 Epoch: 3 Global Step: 18780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:55:05,959-Speed 10639.33 samples/sec Loss 10.0978 LearningRate 0.0842 Epoch: 3 Global Step: 18790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:55:06,945-Speed 10401.09 samples/sec Loss 10.2214 LearningRate 0.0842 Epoch: 3 Global Step: 18800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:55:07,872-Speed 11056.48 samples/sec Loss 10.2799 LearningRate 0.0841 Epoch: 3 Global Step: 18810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:55:08,803-Speed 11011.88 samples/sec Loss 10.2119 LearningRate 0.0841 Epoch: 3 Global Step: 18820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:55:09,731-Speed 11037.41 samples/sec Loss 10.2321 LearningRate 0.0841 Epoch: 3 Global Step: 18830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:55:10,649-Speed 11163.04 samples/sec Loss 10.1476 LearningRate 0.0841 Epoch: 3 Global Step: 18840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:55:11,559-Speed 11258.90 samples/sec Loss 10.1502 LearningRate 0.0841 Epoch: 3 Global Step: 18850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:55:12,486-Speed 11063.95 samples/sec Loss 10.2424 LearningRate 0.0841 Epoch: 3 Global Step: 18860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 
23:55:13,399-Speed 11215.29 samples/sec Loss 10.3097 LearningRate 0.0841 Epoch: 3 Global Step: 18870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:55:14,313-Speed 11221.16 samples/sec Loss 10.2568 LearningRate 0.0841 Epoch: 3 Global Step: 18880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:55:15,202-Speed 11528.79 samples/sec Loss 10.2889 LearningRate 0.0841 Epoch: 3 Global Step: 18890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:55:16,134-Speed 10985.03 samples/sec Loss 10.1920 LearningRate 0.0841 Epoch: 3 Global Step: 18900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:55:17,026-Speed 11496.15 samples/sec Loss 10.2954 LearningRate 0.0841 Epoch: 3 Global Step: 18910 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:55:17,945-Speed 11152.62 samples/sec Loss 10.1576 LearningRate 0.0841 Epoch: 3 Global Step: 18920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:55:18,934-Speed 10354.27 samples/sec Loss 10.2581 LearningRate 0.0840 Epoch: 3 Global Step: 18930 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:55:19,862-Speed 11044.27 samples/sec Loss 10.1467 LearningRate 0.0840 Epoch: 3 Global Step: 18940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:55:20,779-Speed 11173.44 samples/sec Loss 10.1712 LearningRate 0.0840 Epoch: 3 Global Step: 18950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:55:21,728-Speed 10797.41 samples/sec Loss 10.1617 LearningRate 0.0840 Epoch: 3 Global Step: 18960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:55:22,628-Speed 11387.46 samples/sec Loss 10.1732 LearningRate 0.0840 Epoch: 3 Global Step: 18970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:55:23,565-Speed 10927.48 samples/sec Loss 10.1436 LearningRate 0.0840 Epoch: 3 Global Step: 18980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:55:24,478-Speed 11222.55 
samples/sec Loss 10.2370 LearningRate 0.0840 Epoch: 3 Global Step: 18990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:55:25,385-Speed 11306.05 samples/sec Loss 10.2107 LearningRate 0.0840 Epoch: 3 Global Step: 19000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:55:26,337-Speed 10761.19 samples/sec Loss 10.3015 LearningRate 0.0840 Epoch: 3 Global Step: 19010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:55:27,276-Speed 10913.85 samples/sec Loss 10.2460 LearningRate 0.0840 Epoch: 3 Global Step: 19020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:55:28,236-Speed 10675.59 samples/sec Loss 10.1623 LearningRate 0.0840 Epoch: 3 Global Step: 19030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:55:29,179-Speed 10870.91 samples/sec Loss 10.2341 LearningRate 0.0840 Epoch: 3 Global Step: 19040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:55:30,098-Speed 11144.60 samples/sec Loss 10.1057 LearningRate 0.0839 Epoch: 3 Global Step: 19050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:55:31,056-Speed 10704.95 samples/sec Loss 10.3505 LearningRate 0.0839 Epoch: 3 Global Step: 19060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:55:31,993-Speed 10937.99 samples/sec Loss 10.2657 LearningRate 0.0839 Epoch: 3 Global Step: 19070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:55:32,906-Speed 11224.34 samples/sec Loss 10.2162 LearningRate 0.0839 Epoch: 3 Global Step: 19080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:55:33,817-Speed 11245.15 samples/sec Loss 10.1390 LearningRate 0.0839 Epoch: 3 Global Step: 19090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:55:34,708-Speed 11498.08 samples/sec Loss 10.1592 LearningRate 0.0839 Epoch: 3 Global Step: 19100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:55:35,644-Speed 10949.43 samples/sec Loss 10.1181 LearningRate 
0.0839 Epoch: 3 Global Step: 19110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:55:36,552-Speed 11281.56 samples/sec Loss 10.2668 LearningRate 0.0839 Epoch: 3 Global Step: 19120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:55:37,474-Speed 11119.39 samples/sec Loss 10.2206 LearningRate 0.0839 Epoch: 3 Global Step: 19130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:55:38,395-Speed 11121.59 samples/sec Loss 10.2738 LearningRate 0.0839 Epoch: 3 Global Step: 19140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:55:39,348-Speed 10758.57 samples/sec Loss 10.2124 LearningRate 0.0839 Epoch: 3 Global Step: 19150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:55:40,276-Speed 11042.51 samples/sec Loss 10.3561 LearningRate 0.0839 Epoch: 3 Global Step: 19160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:55:41,174-Speed 11409.22 samples/sec Loss 10.1456 LearningRate 0.0839 Epoch: 3 Global Step: 19170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:55:42,096-Speed 11114.09 samples/sec Loss 10.3198 LearningRate 0.0838 Epoch: 3 Global Step: 19180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:55:43,028-Speed 10989.82 samples/sec Loss 10.0614 LearningRate 0.0838 Epoch: 3 Global Step: 19190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:55:43,960-Speed 10998.90 samples/sec Loss 10.0836 LearningRate 0.0838 Epoch: 3 Global Step: 19200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:55:44,887-Speed 11051.75 samples/sec Loss 10.2290 LearningRate 0.0838 Epoch: 3 Global Step: 19210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:55:45,807-Speed 11140.94 samples/sec Loss 10.3380 LearningRate 0.0838 Epoch: 3 Global Step: 19220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:55:46,729-Speed 11109.22 samples/sec Loss 10.2720 LearningRate 0.0838 Epoch: 3 Global Step: 19230 Fp16 
Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:55:47,648-Speed 11151.28 samples/sec Loss 10.2629 LearningRate 0.0838 Epoch: 3 Global Step: 19240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:55:48,575-Speed 11058.38 samples/sec Loss 10.2504 LearningRate 0.0838 Epoch: 3 Global Step: 19250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:55:49,534-Speed 10683.02 samples/sec Loss 10.2423 LearningRate 0.0838 Epoch: 3 Global Step: 19260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:55:50,492-Speed 10696.64 samples/sec Loss 10.0600 LearningRate 0.0838 Epoch: 3 Global Step: 19270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:55:51,455-Speed 10635.06 samples/sec Loss 10.0227 LearningRate 0.0838 Epoch: 3 Global Step: 19280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:55:52,421-Speed 10611.32 samples/sec Loss 10.3050 LearningRate 0.0838 Epoch: 3 Global Step: 19290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:55:53,357-Speed 10945.23 samples/sec Loss 10.2531 LearningRate 0.0837 Epoch: 3 Global Step: 19300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:55:54,283-Speed 11068.54 samples/sec Loss 10.1349 LearningRate 0.0837 Epoch: 3 Global Step: 19310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:55:55,198-Speed 11202.30 samples/sec Loss 10.1314 LearningRate 0.0837 Epoch: 3 Global Step: 19320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:55:56,147-Speed 10804.73 samples/sec Loss 10.1834 LearningRate 0.0837 Epoch: 3 Global Step: 19330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:55:57,046-Speed 11397.51 samples/sec Loss 10.1686 LearningRate 0.0837 Epoch: 3 Global Step: 19340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:55:57,976-Speed 11012.28 samples/sec Loss 10.3128 LearningRate 0.0837 Epoch: 3 Global Step: 19350 Fp16 Grad Scale: 131072 Required: 7 
hours Training: 2022-04-10 23:55:58,914-Speed 10919.55 samples/sec Loss 10.2764 LearningRate 0.0837 Epoch: 3 Global Step: 19360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:55:59,853-Speed 10920.35 samples/sec Loss 10.1779 LearningRate 0.0837 Epoch: 3 Global Step: 19370 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:56:00,745-Speed 11485.09 samples/sec Loss 10.1878 LearningRate 0.0837 Epoch: 3 Global Step: 19380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:56:01,692-Speed 10826.06 samples/sec Loss 10.3374 LearningRate 0.0837 Epoch: 3 Global Step: 19390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:56:02,623-Speed 10995.23 samples/sec Loss 10.1695 LearningRate 0.0837 Epoch: 3 Global Step: 19400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:56:03,517-Speed 11472.62 samples/sec Loss 10.1337 LearningRate 0.0837 Epoch: 3 Global Step: 19410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:56:04,429-Speed 11227.21 samples/sec Loss 10.3085 LearningRate 0.0837 Epoch: 3 Global Step: 19420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:56:05,319-Speed 11520.61 samples/sec Loss 10.4116 LearningRate 0.0836 Epoch: 3 Global Step: 19430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:56:06,247-Speed 11041.53 samples/sec Loss 10.1311 LearningRate 0.0836 Epoch: 3 Global Step: 19440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:56:07,190-Speed 10863.65 samples/sec Loss 10.2443 LearningRate 0.0836 Epoch: 3 Global Step: 19450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:56:08,152-Speed 10650.65 samples/sec Loss 10.1414 LearningRate 0.0836 Epoch: 3 Global Step: 19460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:56:09,057-Speed 11323.62 samples/sec Loss 10.1981 LearningRate 0.0836 Epoch: 3 Global Step: 19470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 
23:56:09,967-Speed 11266.49 samples/sec Loss 10.1293 LearningRate 0.0836 Epoch: 3 Global Step: 19480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:56:10,919-Speed 10762.40 samples/sec Loss 9.9643 LearningRate 0.0836 Epoch: 3 Global Step: 19490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:56:11,823-Speed 11336.08 samples/sec Loss 10.1580 LearningRate 0.0836 Epoch: 3 Global Step: 19500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:56:12,742-Speed 11147.39 samples/sec Loss 10.3126 LearningRate 0.0836 Epoch: 3 Global Step: 19510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:56:13,649-Speed 11299.77 samples/sec Loss 10.0685 LearningRate 0.0836 Epoch: 3 Global Step: 19520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:56:14,557-Speed 11290.66 samples/sec Loss 10.1801 LearningRate 0.0836 Epoch: 3 Global Step: 19530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:56:15,512-Speed 10736.47 samples/sec Loss 10.0697 LearningRate 0.0836 Epoch: 3 Global Step: 19540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:56:16,436-Speed 11088.41 samples/sec Loss 10.0340 LearningRate 0.0835 Epoch: 3 Global Step: 19550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:56:17,384-Speed 10806.22 samples/sec Loss 10.1467 LearningRate 0.0835 Epoch: 3 Global Step: 19560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:56:18,339-Speed 10728.13 samples/sec Loss 10.1744 LearningRate 0.0835 Epoch: 3 Global Step: 19570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:56:19,262-Speed 11103.50 samples/sec Loss 10.2701 LearningRate 0.0835 Epoch: 3 Global Step: 19580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:56:20,190-Speed 11042.79 samples/sec Loss 10.1328 LearningRate 0.0835 Epoch: 3 Global Step: 19590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:56:21,136-Speed 10831.84 
samples/sec Loss 10.1529 LearningRate 0.0835 Epoch: 3 Global Step: 19600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:56:22,084-Speed 10812.38 samples/sec Loss 10.3945 LearningRate 0.0835 Epoch: 3 Global Step: 19610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:56:23,012-Speed 11034.27 samples/sec Loss 10.2284 LearningRate 0.0835 Epoch: 3 Global Step: 19620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:56:23,954-Speed 10877.28 samples/sec Loss 10.2088 LearningRate 0.0835 Epoch: 3 Global Step: 19630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:56:24,869-Speed 11201.52 samples/sec Loss 10.2971 LearningRate 0.0835 Epoch: 3 Global Step: 19640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:56:25,808-Speed 10919.88 samples/sec Loss 10.2410 LearningRate 0.0835 Epoch: 3 Global Step: 19650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:56:26,717-Speed 11269.46 samples/sec Loss 10.2540 LearningRate 0.0835 Epoch: 3 Global Step: 19660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:56:27,642-Speed 11076.45 samples/sec Loss 10.0255 LearningRate 0.0835 Epoch: 3 Global Step: 19670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:56:28,567-Speed 11077.77 samples/sec Loss 10.2609 LearningRate 0.0834 Epoch: 3 Global Step: 19680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:56:29,489-Speed 11118.20 samples/sec Loss 10.0982 LearningRate 0.0834 Epoch: 3 Global Step: 19690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:56:30,450-Speed 10653.74 samples/sec Loss 10.0544 LearningRate 0.0834 Epoch: 3 Global Step: 19700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:56:31,376-Speed 11076.78 samples/sec Loss 10.1946 LearningRate 0.0834 Epoch: 3 Global Step: 19710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:56:32,311-Speed 10961.61 samples/sec Loss 10.0097 
LearningRate 0.0834 Epoch: 3 Global Step: 19720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:56:33,201-Speed 11508.17 samples/sec Loss 10.2294 LearningRate 0.0834 Epoch: 3 Global Step: 19730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:56:34,108-Speed 11294.03 samples/sec Loss 10.1714 LearningRate 0.0834 Epoch: 3 Global Step: 19740 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:56:35,013-Speed 11326.08 samples/sec Loss 10.1803 LearningRate 0.0834 Epoch: 3 Global Step: 19750 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:56:35,942-Speed 11029.94 samples/sec Loss 10.0803 LearningRate 0.0834 Epoch: 3 Global Step: 19760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:56:36,928-Speed 10404.43 samples/sec Loss 10.0583 LearningRate 0.0834 Epoch: 3 Global Step: 19770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:56:37,867-Speed 10924.59 samples/sec Loss 10.3385 LearningRate 0.0834 Epoch: 3 Global Step: 19780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:56:38,815-Speed 10802.29 samples/sec Loss 10.2482 LearningRate 0.0834 Epoch: 3 Global Step: 19790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:56:39,754-Speed 10916.73 samples/sec Loss 10.2097 LearningRate 0.0833 Epoch: 3 Global Step: 19800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:56:40,691-Speed 10936.57 samples/sec Loss 10.2587 LearningRate 0.0833 Epoch: 3 Global Step: 19810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:56:41,611-Speed 11139.76 samples/sec Loss 10.2097 LearningRate 0.0833 Epoch: 3 Global Step: 19820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:56:42,562-Speed 10771.48 samples/sec Loss 9.9844 LearningRate 0.0833 Epoch: 3 Global Step: 19830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:56:43,488-Speed 11065.45 samples/sec Loss 10.2918 LearningRate 0.0833 Epoch: 3 
Global Step: 19840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:56:44,382-Speed 11464.09 samples/sec Loss 10.0544 LearningRate 0.0833 Epoch: 3 Global Step: 19850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:56:45,271-Speed 11525.22 samples/sec Loss 10.1572 LearningRate 0.0833 Epoch: 3 Global Step: 19860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:56:46,175-Speed 11341.81 samples/sec Loss 10.1595 LearningRate 0.0833 Epoch: 3 Global Step: 19870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:56:47,079-Speed 11330.38 samples/sec Loss 10.1563 LearningRate 0.0833 Epoch: 3 Global Step: 19880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:56:47,998-Speed 11155.23 samples/sec Loss 10.2291 LearningRate 0.0833 Epoch: 3 Global Step: 19890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:56:48,919-Speed 11126.18 samples/sec Loss 10.2352 LearningRate 0.0833 Epoch: 3 Global Step: 19900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:56:49,824-Speed 11323.15 samples/sec Loss 9.9752 LearningRate 0.0833 Epoch: 3 Global Step: 19910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:56:50,763-Speed 10909.01 samples/sec Loss 10.2277 LearningRate 0.0833 Epoch: 3 Global Step: 19920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:56:51,677-Speed 11203.11 samples/sec Loss 10.0803 LearningRate 0.0832 Epoch: 3 Global Step: 19930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:56:52,620-Speed 10866.31 samples/sec Loss 10.0177 LearningRate 0.0832 Epoch: 3 Global Step: 19940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:56:53,545-Speed 11081.17 samples/sec Loss 10.0763 LearningRate 0.0832 Epoch: 3 Global Step: 19950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:56:54,497-Speed 10762.08 samples/sec Loss 10.0882 LearningRate 0.0832 Epoch: 3 Global Step: 19960 Fp16 Grad Scale: 131072 
Required: 7 hours
Training: 2022-04-10 23:56:55,408-Speed 11244.39 samples/sec Loss 10.1971 LearningRate 0.0832 Epoch: 3 Global Step: 19970 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-10 23:56:56,353-Speed 10843.05 samples/sec Loss 10.1232 LearningRate 0.0832 Epoch: 3 Global Step: 19980 Fp16 Grad Scale: 131072 Required: 7 hours
Training: 2022-04-10 23:56:57,271-Speed 11170.70 samples/sec Loss 10.1544 LearningRate 0.0832 Epoch: 3 Global Step: 19990 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-10 23:56:58,189-Speed 11160.91 samples/sec Loss 10.1779 LearningRate 0.0832 Epoch: 3 Global Step: 20000 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-10 23:57:19,543-[lfw][20000]XNorm: 14.755617
Training: 2022-04-10 23:57:19,543-[lfw][20000]Accuracy-Flip: 0.99183+-0.00404
Training: 2022-04-10 23:57:19,544-[lfw][20000]Accuracy-Highest: 0.99300
Training: 2022-04-10 23:57:44,205-[cfp_fp][20000]XNorm: 12.426521
Training: 2022-04-10 23:57:44,205-[cfp_fp][20000]Accuracy-Flip: 0.91014+-0.01710
Training: 2022-04-10 23:57:44,206-[cfp_fp][20000]Accuracy-Highest: 0.91014
Training: 2022-04-10 23:58:05,474-[agedb_30][20000]XNorm: 14.517946
Training: 2022-04-10 23:58:05,474-[agedb_30][20000]Accuracy-Flip: 0.94183+-0.01550
Training: 2022-04-10 23:58:05,475-[agedb_30][20000]Accuracy-Highest: 0.94400
Training: 2022-04-10 23:58:06,379-Speed 150.17 samples/sec Loss 10.2114 LearningRate 0.0832 Epoch: 3 Global Step: 20010 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-10 23:58:07,333-Speed 10743.62 samples/sec Loss 10.0539 LearningRate 0.0832 Epoch: 3 Global Step: 20020 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-10 23:58:08,305-Speed 10547.60 samples/sec Loss 10.0293 LearningRate 0.0832 Epoch: 3 Global Step: 20030 Fp16 Grad Scale: 65536 Required: 7 hours
Training: 2022-04-10 23:58:09,195-Speed 11510.09 samples/sec Loss 10.0743 LearningRate 0.0832 Epoch: 3 Global Step: 20040 Fp16 Grad Scale: 65536 Required: 7 hours
Training:
2022-04-10 23:58:10,088-Speed 11468.46 samples/sec Loss 10.2099 LearningRate 0.0831 Epoch: 3 Global Step: 20050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:58:10,987-Speed 11396.82 samples/sec Loss 10.1779 LearningRate 0.0831 Epoch: 3 Global Step: 20060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:58:11,904-Speed 11180.64 samples/sec Loss 10.1791 LearningRate 0.0831 Epoch: 3 Global Step: 20070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:58:12,830-Speed 11063.03 samples/sec Loss 9.9672 LearningRate 0.0831 Epoch: 3 Global Step: 20080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:58:13,769-Speed 10911.71 samples/sec Loss 10.1649 LearningRate 0.0831 Epoch: 3 Global Step: 20090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:58:14,698-Speed 11032.14 samples/sec Loss 10.1899 LearningRate 0.0831 Epoch: 3 Global Step: 20100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:58:15,670-Speed 10542.63 samples/sec Loss 10.0425 LearningRate 0.0831 Epoch: 3 Global Step: 20110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:58:16,629-Speed 10686.13 samples/sec Loss 10.1542 LearningRate 0.0831 Epoch: 3 Global Step: 20120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:58:17,576-Speed 10819.98 samples/sec Loss 10.1706 LearningRate 0.0831 Epoch: 3 Global Step: 20130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:58:18,531-Speed 10730.58 samples/sec Loss 10.0529 LearningRate 0.0831 Epoch: 3 Global Step: 20140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:58:19,443-Speed 11234.24 samples/sec Loss 9.9973 LearningRate 0.0831 Epoch: 3 Global Step: 20150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:58:20,396-Speed 10753.30 samples/sec Loss 10.1615 LearningRate 0.0831 Epoch: 3 Global Step: 20160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:58:21,302-Speed 
11314.23 samples/sec Loss 10.2310 LearningRate 0.0830 Epoch: 3 Global Step: 20170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:58:22,246-Speed 10850.74 samples/sec Loss 10.1775 LearningRate 0.0830 Epoch: 3 Global Step: 20180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:58:23,174-Speed 11044.22 samples/sec Loss 10.1560 LearningRate 0.0830 Epoch: 3 Global Step: 20190 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:58:24,045-Speed 11762.31 samples/sec Loss 10.3100 LearningRate 0.0830 Epoch: 3 Global Step: 20200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:58:24,967-Speed 11107.17 samples/sec Loss 10.0568 LearningRate 0.0830 Epoch: 3 Global Step: 20210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:58:25,860-Speed 11475.60 samples/sec Loss 10.1290 LearningRate 0.0830 Epoch: 3 Global Step: 20220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:58:26,791-Speed 11009.15 samples/sec Loss 10.0469 LearningRate 0.0830 Epoch: 3 Global Step: 20230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:58:27,721-Speed 11023.02 samples/sec Loss 10.1848 LearningRate 0.0830 Epoch: 3 Global Step: 20240 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:58:28,618-Speed 11413.64 samples/sec Loss 10.0935 LearningRate 0.0830 Epoch: 3 Global Step: 20250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:58:29,551-Speed 10985.23 samples/sec Loss 9.9844 LearningRate 0.0830 Epoch: 3 Global Step: 20260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:58:30,501-Speed 10784.50 samples/sec Loss 9.9081 LearningRate 0.0830 Epoch: 3 Global Step: 20270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:58:31,428-Speed 11055.33 samples/sec Loss 10.1069 LearningRate 0.0830 Epoch: 3 Global Step: 20280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:58:32,371-Speed 10874.44 samples/sec Loss 9.9511 
LearningRate 0.0830 Epoch: 3 Global Step: 20290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:58:33,307-Speed 10944.61 samples/sec Loss 10.0357 LearningRate 0.0829 Epoch: 3 Global Step: 20300 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:58:34,264-Speed 10710.14 samples/sec Loss 9.9611 LearningRate 0.0829 Epoch: 3 Global Step: 20310 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:58:35,133-Speed 11780.66 samples/sec Loss 10.0293 LearningRate 0.0829 Epoch: 3 Global Step: 20320 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:58:36,025-Speed 11494.79 samples/sec Loss 9.9405 LearningRate 0.0829 Epoch: 3 Global Step: 20330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:58:36,934-Speed 11275.51 samples/sec Loss 10.0827 LearningRate 0.0829 Epoch: 3 Global Step: 20340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:58:37,881-Speed 10826.32 samples/sec Loss 10.1465 LearningRate 0.0829 Epoch: 3 Global Step: 20350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:58:38,819-Speed 10917.85 samples/sec Loss 10.0506 LearningRate 0.0829 Epoch: 3 Global Step: 20360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:58:39,753-Speed 10972.61 samples/sec Loss 9.8272 LearningRate 0.0829 Epoch: 3 Global Step: 20370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:58:40,707-Speed 10739.05 samples/sec Loss 9.9566 LearningRate 0.0829 Epoch: 3 Global Step: 20380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:58:41,640-Speed 10981.42 samples/sec Loss 9.9781 LearningRate 0.0829 Epoch: 3 Global Step: 20390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:58:42,569-Speed 11035.88 samples/sec Loss 10.0652 LearningRate 0.0829 Epoch: 3 Global Step: 20400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:58:43,518-Speed 10798.35 samples/sec Loss 10.1609 LearningRate 0.0829 Epoch: 3 Global 
Step: 20410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:58:44,476-Speed 10688.04 samples/sec Loss 9.8940 LearningRate 0.0828 Epoch: 3 Global Step: 20420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:58:45,387-Speed 11255.42 samples/sec Loss 10.2151 LearningRate 0.0828 Epoch: 3 Global Step: 20430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:58:46,293-Speed 11307.54 samples/sec Loss 10.1109 LearningRate 0.0828 Epoch: 3 Global Step: 20440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:58:47,206-Speed 11227.97 samples/sec Loss 10.2906 LearningRate 0.0828 Epoch: 3 Global Step: 20450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:58:48,113-Speed 11298.08 samples/sec Loss 10.0562 LearningRate 0.0828 Epoch: 3 Global Step: 20460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:58:49,044-Speed 11007.25 samples/sec Loss 10.0726 LearningRate 0.0828 Epoch: 3 Global Step: 20470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:58:49,978-Speed 10965.51 samples/sec Loss 10.0591 LearningRate 0.0828 Epoch: 3 Global Step: 20480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:58:50,887-Speed 11271.46 samples/sec Loss 10.0881 LearningRate 0.0828 Epoch: 3 Global Step: 20490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:58:51,805-Speed 11171.44 samples/sec Loss 9.9124 LearningRate 0.0828 Epoch: 3 Global Step: 20500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:58:52,732-Speed 11053.32 samples/sec Loss 10.1440 LearningRate 0.0828 Epoch: 3 Global Step: 20510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:58:53,679-Speed 10814.14 samples/sec Loss 10.0985 LearningRate 0.0828 Epoch: 3 Global Step: 20520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:58:54,620-Speed 10894.73 samples/sec Loss 10.1186 LearningRate 0.0828 Epoch: 3 Global Step: 20530 Fp16 Grad Scale: 65536 
Required: 7 hours Training: 2022-04-10 23:58:55,544-Speed 11085.41 samples/sec Loss 10.1439 LearningRate 0.0828 Epoch: 3 Global Step: 20540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:58:56,464-Speed 11138.80 samples/sec Loss 10.0523 LearningRate 0.0827 Epoch: 3 Global Step: 20550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:58:57,353-Speed 11526.90 samples/sec Loss 10.0677 LearningRate 0.0827 Epoch: 3 Global Step: 20560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:58:58,284-Speed 11016.71 samples/sec Loss 9.9935 LearningRate 0.0827 Epoch: 3 Global Step: 20570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:58:59,198-Speed 11208.08 samples/sec Loss 10.1071 LearningRate 0.0827 Epoch: 3 Global Step: 20580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:59:00,097-Speed 11396.48 samples/sec Loss 10.0293 LearningRate 0.0827 Epoch: 3 Global Step: 20590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:59:01,071-Speed 10520.08 samples/sec Loss 10.0606 LearningRate 0.0827 Epoch: 3 Global Step: 20600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:59:01,986-Speed 11203.40 samples/sec Loss 10.1487 LearningRate 0.0827 Epoch: 3 Global Step: 20610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:59:02,911-Speed 11076.19 samples/sec Loss 10.2230 LearningRate 0.0827 Epoch: 3 Global Step: 20620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:59:03,861-Speed 10788.28 samples/sec Loss 10.0626 LearningRate 0.0827 Epoch: 3 Global Step: 20630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:59:04,834-Speed 10534.69 samples/sec Loss 10.2355 LearningRate 0.0827 Epoch: 3 Global Step: 20640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:59:05,786-Speed 10763.53 samples/sec Loss 10.1732 LearningRate 0.0827 Epoch: 3 Global Step: 20650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 
23:59:06,713-Speed 11046.60 samples/sec Loss 10.2916 LearningRate 0.0827 Epoch: 3 Global Step: 20660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:59:07,639-Speed 11072.25 samples/sec Loss 10.1364 LearningRate 0.0826 Epoch: 3 Global Step: 20670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:59:08,553-Speed 11206.58 samples/sec Loss 10.2188 LearningRate 0.0826 Epoch: 3 Global Step: 20680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:59:09,474-Speed 11125.30 samples/sec Loss 10.2151 LearningRate 0.0826 Epoch: 3 Global Step: 20690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:59:10,385-Speed 11252.58 samples/sec Loss 10.0964 LearningRate 0.0826 Epoch: 3 Global Step: 20700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:59:11,323-Speed 10924.27 samples/sec Loss 10.0978 LearningRate 0.0826 Epoch: 3 Global Step: 20710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:59:12,256-Speed 10978.37 samples/sec Loss 9.9890 LearningRate 0.0826 Epoch: 3 Global Step: 20720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:59:13,152-Speed 11436.84 samples/sec Loss 9.9435 LearningRate 0.0826 Epoch: 3 Global Step: 20730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:59:14,069-Speed 11173.46 samples/sec Loss 10.2104 LearningRate 0.0826 Epoch: 3 Global Step: 20740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:59:14,970-Speed 11373.03 samples/sec Loss 10.1727 LearningRate 0.0826 Epoch: 3 Global Step: 20750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:59:15,884-Speed 11215.33 samples/sec Loss 10.0404 LearningRate 0.0826 Epoch: 3 Global Step: 20760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:59:16,825-Speed 10884.18 samples/sec Loss 9.9110 LearningRate 0.0826 Epoch: 3 Global Step: 20770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:59:17,753-Speed 11049.48 samples/sec 
Loss 10.0540 LearningRate 0.0826 Epoch: 3 Global Step: 20780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:59:18,715-Speed 10653.66 samples/sec Loss 9.9775 LearningRate 0.0826 Epoch: 3 Global Step: 20790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:59:19,651-Speed 10943.98 samples/sec Loss 9.9317 LearningRate 0.0825 Epoch: 3 Global Step: 20800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:59:20,570-Speed 11149.35 samples/sec Loss 10.1125 LearningRate 0.0825 Epoch: 3 Global Step: 20810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:59:21,492-Speed 11113.46 samples/sec Loss 10.0044 LearningRate 0.0825 Epoch: 3 Global Step: 20820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:59:22,395-Speed 11354.73 samples/sec Loss 10.1920 LearningRate 0.0825 Epoch: 3 Global Step: 20830 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-10 23:59:23,279-Speed 11587.02 samples/sec Loss 10.1838 LearningRate 0.0825 Epoch: 3 Global Step: 20840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:59:24,161-Speed 11622.58 samples/sec Loss 10.0549 LearningRate 0.0825 Epoch: 3 Global Step: 20850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:59:25,084-Speed 11104.26 samples/sec Loss 9.9584 LearningRate 0.0825 Epoch: 3 Global Step: 20860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:59:26,037-Speed 10758.27 samples/sec Loss 10.0756 LearningRate 0.0825 Epoch: 3 Global Step: 20870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:59:26,940-Speed 11345.74 samples/sec Loss 9.9772 LearningRate 0.0825 Epoch: 3 Global Step: 20880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:59:27,863-Speed 11099.68 samples/sec Loss 10.1701 LearningRate 0.0825 Epoch: 3 Global Step: 20890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:59:28,780-Speed 11171.22 samples/sec Loss 9.9100 LearningRate 0.0825 
Epoch: 3 Global Step: 20900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:59:29,721-Speed 10900.63 samples/sec Loss 9.9519 LearningRate 0.0825 Epoch: 3 Global Step: 20910 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:59:30,650-Speed 11029.17 samples/sec Loss 10.1674 LearningRate 0.0824 Epoch: 3 Global Step: 20920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:59:31,560-Speed 11266.32 samples/sec Loss 10.1755 LearningRate 0.0824 Epoch: 3 Global Step: 20930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:59:32,473-Speed 11220.61 samples/sec Loss 9.9648 LearningRate 0.0824 Epoch: 3 Global Step: 20940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:59:33,405-Speed 11001.81 samples/sec Loss 9.9965 LearningRate 0.0824 Epoch: 3 Global Step: 20950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:59:34,315-Speed 11251.53 samples/sec Loss 9.9775 LearningRate 0.0824 Epoch: 3 Global Step: 20960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:59:35,224-Speed 11279.62 samples/sec Loss 9.9966 LearningRate 0.0824 Epoch: 3 Global Step: 20970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:59:36,176-Speed 10760.98 samples/sec Loss 9.9456 LearningRate 0.0824 Epoch: 3 Global Step: 20980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:59:37,080-Speed 11338.74 samples/sec Loss 9.9612 LearningRate 0.0824 Epoch: 3 Global Step: 20990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:59:38,039-Speed 10688.13 samples/sec Loss 10.0942 LearningRate 0.0824 Epoch: 3 Global Step: 21000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:59:38,964-Speed 11069.28 samples/sec Loss 9.9686 LearningRate 0.0824 Epoch: 3 Global Step: 21010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:59:39,902-Speed 10930.29 samples/sec Loss 9.8974 LearningRate 0.0824 Epoch: 3 Global Step: 21020 Fp16 Grad 
Scale: 131072 Required: 7 hours Training: 2022-04-10 23:59:40,866-Speed 10622.23 samples/sec Loss 9.8991 LearningRate 0.0824 Epoch: 3 Global Step: 21030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:59:41,789-Speed 11102.67 samples/sec Loss 10.0783 LearningRate 0.0824 Epoch: 3 Global Step: 21040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:59:42,716-Speed 11057.94 samples/sec Loss 9.9522 LearningRate 0.0823 Epoch: 3 Global Step: 21050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:59:43,667-Speed 10771.77 samples/sec Loss 10.0294 LearningRate 0.0823 Epoch: 3 Global Step: 21060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:59:44,602-Speed 10966.37 samples/sec Loss 10.0000 LearningRate 0.0823 Epoch: 3 Global Step: 21070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:59:45,500-Speed 11406.49 samples/sec Loss 10.0268 LearningRate 0.0823 Epoch: 3 Global Step: 21080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:59:46,401-Speed 11383.17 samples/sec Loss 10.0025 LearningRate 0.0823 Epoch: 3 Global Step: 21090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:59:47,305-Speed 11325.74 samples/sec Loss 10.0769 LearningRate 0.0823 Epoch: 3 Global Step: 21100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:59:48,222-Speed 11185.20 samples/sec Loss 10.0471 LearningRate 0.0823 Epoch: 3 Global Step: 21110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:59:49,129-Speed 11289.29 samples/sec Loss 10.0155 LearningRate 0.0823 Epoch: 3 Global Step: 21120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:59:50,066-Speed 10939.26 samples/sec Loss 10.2123 LearningRate 0.0823 Epoch: 3 Global Step: 21130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:59:50,991-Speed 11076.59 samples/sec Loss 10.1133 LearningRate 0.0823 Epoch: 3 Global Step: 21140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 
2022-04-10 23:59:51,921-Speed 11014.80 samples/sec Loss 9.9466 LearningRate 0.0823 Epoch: 3 Global Step: 21150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:59:52,849-Speed 11039.73 samples/sec Loss 10.1193 LearningRate 0.0823 Epoch: 3 Global Step: 21160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:59:53,791-Speed 10883.96 samples/sec Loss 10.0023 LearningRate 0.0823 Epoch: 3 Global Step: 21170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:59:54,726-Speed 10959.10 samples/sec Loss 10.0088 LearningRate 0.0822 Epoch: 3 Global Step: 21180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:59:55,669-Speed 10868.11 samples/sec Loss 9.7772 LearningRate 0.0822 Epoch: 3 Global Step: 21190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-10 23:59:56,574-Speed 11326.74 samples/sec Loss 9.8831 LearningRate 0.0822 Epoch: 3 Global Step: 21200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:59:57,475-Speed 11368.40 samples/sec Loss 10.0124 LearningRate 0.0822 Epoch: 3 Global Step: 21210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:59:58,396-Speed 11124.72 samples/sec Loss 9.9132 LearningRate 0.0822 Epoch: 3 Global Step: 21220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-10 23:59:59,355-Speed 10682.23 samples/sec Loss 9.9366 LearningRate 0.0822 Epoch: 3 Global Step: 21230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:00:00,278-Speed 11097.93 samples/sec Loss 9.9937 LearningRate 0.0822 Epoch: 3 Global Step: 21240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:00:01,226-Speed 10818.55 samples/sec Loss 9.9866 LearningRate 0.0822 Epoch: 3 Global Step: 21250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:00:02,148-Speed 11107.06 samples/sec Loss 10.1674 LearningRate 0.0822 Epoch: 3 Global Step: 21260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:00:03,104-Speed 10723.91 
samples/sec Loss 10.0382 LearningRate 0.0822 Epoch: 3 Global Step: 21270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:00:04,041-Speed 10929.06 samples/sec Loss 9.7396 LearningRate 0.0822 Epoch: 3 Global Step: 21280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:00:04,993-Speed 10772.46 samples/sec Loss 9.9953 LearningRate 0.0822 Epoch: 3 Global Step: 21290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:00:05,898-Speed 11319.58 samples/sec Loss 10.1611 LearningRate 0.0821 Epoch: 3 Global Step: 21300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:00:06,815-Speed 11170.49 samples/sec Loss 9.8208 LearningRate 0.0821 Epoch: 3 Global Step: 21310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:00:07,717-Speed 11364.96 samples/sec Loss 10.0607 LearningRate 0.0821 Epoch: 3 Global Step: 21320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:00:08,647-Speed 11018.17 samples/sec Loss 10.0837 LearningRate 0.0821 Epoch: 3 Global Step: 21330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:00:09,549-Speed 11371.14 samples/sec Loss 9.9285 LearningRate 0.0821 Epoch: 3 Global Step: 21340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:00:10,443-Speed 11450.73 samples/sec Loss 10.0429 LearningRate 0.0821 Epoch: 3 Global Step: 21350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:00:11,402-Speed 10685.57 samples/sec Loss 10.0230 LearningRate 0.0821 Epoch: 3 Global Step: 21360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:00:12,326-Speed 11090.60 samples/sec Loss 9.7929 LearningRate 0.0821 Epoch: 3 Global Step: 21370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:00:13,290-Speed 10628.46 samples/sec Loss 9.9719 LearningRate 0.0821 Epoch: 3 Global Step: 21380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:00:14,222-Speed 10999.89 samples/sec Loss 10.1402 LearningRate 
0.0821 Epoch: 3 Global Step: 21390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:00:15,193-Speed 10554.86 samples/sec Loss 10.1469 LearningRate 0.0821 Epoch: 3 Global Step: 21400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:00:16,155-Speed 10649.28 samples/sec Loss 9.9222 LearningRate 0.0821 Epoch: 3 Global Step: 21410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:00:17,078-Speed 11095.78 samples/sec Loss 10.0468 LearningRate 0.0821 Epoch: 3 Global Step: 21420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:00:18,037-Speed 10692.78 samples/sec Loss 9.9176 LearningRate 0.0820 Epoch: 3 Global Step: 21430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:00:18,998-Speed 10666.47 samples/sec Loss 9.9922 LearningRate 0.0820 Epoch: 3 Global Step: 21440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:00:19,934-Speed 10938.29 samples/sec Loss 10.0211 LearningRate 0.0820 Epoch: 3 Global Step: 21450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:00:20,881-Speed 10819.94 samples/sec Loss 10.0475 LearningRate 0.0820 Epoch: 3 Global Step: 21460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:00:21,823-Speed 10879.58 samples/sec Loss 10.0769 LearningRate 0.0820 Epoch: 3 Global Step: 21470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:00:22,781-Speed 10697.78 samples/sec Loss 9.9613 LearningRate 0.0820 Epoch: 3 Global Step: 21480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:00:23,739-Speed 10697.03 samples/sec Loss 10.0464 LearningRate 0.0820 Epoch: 3 Global Step: 21490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:00:24,666-Speed 11060.69 samples/sec Loss 10.0327 LearningRate 0.0820 Epoch: 3 Global Step: 21500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:00:25,637-Speed 10551.61 samples/sec Loss 9.9981 LearningRate 0.0820 Epoch: 3 Global Step: 21510 Fp16 Grad 
Scale: 131072 Required: 7 hours Training: 2022-04-11 00:00:26,564-Speed 11051.92 samples/sec Loss 9.9096 LearningRate 0.0820 Epoch: 3 Global Step: 21520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:00:27,491-Speed 11057.09 samples/sec Loss 10.1732 LearningRate 0.0820 Epoch: 3 Global Step: 21530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:00:28,382-Speed 11509.38 samples/sec Loss 9.9498 LearningRate 0.0820 Epoch: 3 Global Step: 21540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:00:29,266-Speed 11593.38 samples/sec Loss 10.0421 LearningRate 0.0819 Epoch: 3 Global Step: 21550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:00:30,168-Speed 11356.55 samples/sec Loss 10.0452 LearningRate 0.0819 Epoch: 3 Global Step: 21560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:00:31,101-Speed 10985.59 samples/sec Loss 9.8840 LearningRate 0.0819 Epoch: 3 Global Step: 21570 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:00:32,038-Speed 10939.53 samples/sec Loss 9.9400 LearningRate 0.0819 Epoch: 3 Global Step: 21580 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:00:32,940-Speed 11359.32 samples/sec Loss 9.9522 LearningRate 0.0819 Epoch: 3 Global Step: 21590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:00:33,847-Speed 11308.00 samples/sec Loss 9.9288 LearningRate 0.0819 Epoch: 3 Global Step: 21600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:00:34,756-Speed 11260.22 samples/sec Loss 9.8854 LearningRate 0.0819 Epoch: 3 Global Step: 21610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:00:35,666-Speed 11271.30 samples/sec Loss 9.9622 LearningRate 0.0819 Epoch: 3 Global Step: 21620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:00:36,587-Speed 11122.57 samples/sec Loss 9.9803 LearningRate 0.0819 Epoch: 3 Global Step: 21630 Fp16 Grad Scale: 131072 Required: 7 hours 
Training: 2022-04-11 00:00:37,520-Speed 10987.36 samples/sec Loss 9.9102 LearningRate 0.0819 Epoch: 3 Global Step: 21640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:00:38,461-Speed 10882.27 samples/sec Loss 10.0553 LearningRate 0.0819 Epoch: 3 Global Step: 21650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:00:39,402-Speed 10894.63 samples/sec Loss 9.8056 LearningRate 0.0819 Epoch: 3 Global Step: 21660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:00:40,328-Speed 11060.45 samples/sec Loss 9.7673 LearningRate 0.0819 Epoch: 3 Global Step: 21670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:00:41,268-Speed 10901.34 samples/sec Loss 9.8916 LearningRate 0.0818 Epoch: 3 Global Step: 21680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:00:42,194-Speed 11074.06 samples/sec Loss 10.1077 LearningRate 0.0818 Epoch: 3 Global Step: 21690 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:00:43,107-Speed 11211.42 samples/sec Loss 9.8641 LearningRate 0.0818 Epoch: 3 Global Step: 21700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:00:44,019-Speed 11247.20 samples/sec Loss 10.1431 LearningRate 0.0818 Epoch: 3 Global Step: 21710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:00:44,929-Speed 11253.95 samples/sec Loss 9.9255 LearningRate 0.0818 Epoch: 3 Global Step: 21720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:00:45,833-Speed 11333.51 samples/sec Loss 9.9794 LearningRate 0.0818 Epoch: 3 Global Step: 21730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:00:46,754-Speed 11125.73 samples/sec Loss 9.8814 LearningRate 0.0818 Epoch: 3 Global Step: 21740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:00:47,660-Speed 11315.50 samples/sec Loss 9.7577 LearningRate 0.0818 Epoch: 3 Global Step: 21750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:00:48,579-Speed 
11157.89 samples/sec Loss 9.9446 LearningRate 0.0818 Epoch: 3 Global Step: 21760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:00:49,507-Speed 11039.82 samples/sec Loss 10.0644 LearningRate 0.0818 Epoch: 3 Global Step: 21770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:00:50,415-Speed 11278.44 samples/sec Loss 9.9279 LearningRate 0.0818 Epoch: 3 Global Step: 21780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:00:51,333-Speed 11169.21 samples/sec Loss 10.0276 LearningRate 0.0818 Epoch: 3 Global Step: 21790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:00:52,257-Speed 11091.06 samples/sec Loss 9.9114 LearningRate 0.0817 Epoch: 3 Global Step: 21800 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:00:53,197-Speed 10899.91 samples/sec Loss 10.0859 LearningRate 0.0817 Epoch: 3 Global Step: 21810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:00:54,133-Speed 10949.12 samples/sec Loss 9.8662 LearningRate 0.0817 Epoch: 3 Global Step: 21820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:00:55,035-Speed 11361.43 samples/sec Loss 9.9261 LearningRate 0.0817 Epoch: 3 Global Step: 21830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:00:55,969-Speed 10966.93 samples/sec Loss 9.8594 LearningRate 0.0817 Epoch: 3 Global Step: 21840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:00:56,885-Speed 11191.21 samples/sec Loss 9.9514 LearningRate 0.0817 Epoch: 3 Global Step: 21850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:00:57,814-Speed 11023.96 samples/sec Loss 9.8847 LearningRate 0.0817 Epoch: 3 Global Step: 21860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:00:58,737-Speed 11104.36 samples/sec Loss 9.9433 LearningRate 0.0817 Epoch: 3 Global Step: 21870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:00:59,658-Speed 11129.51 samples/sec Loss 9.9213 LearningRate 
0.0817 Epoch: 3 Global Step: 21880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:01:00,603-Speed 10846.39 samples/sec Loss 9.8811 LearningRate 0.0817 Epoch: 3 Global Step: 21890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:01:01,560-Speed 10707.04 samples/sec Loss 9.8988 LearningRate 0.0817 Epoch: 3 Global Step: 21900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:01:02,474-Speed 11216.74 samples/sec Loss 9.8117 LearningRate 0.0817 Epoch: 3 Global Step: 21910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:01:03,397-Speed 11106.78 samples/sec Loss 10.0572 LearningRate 0.0817 Epoch: 3 Global Step: 21920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:01:04,315-Speed 11159.19 samples/sec Loss 9.8796 LearningRate 0.0816 Epoch: 3 Global Step: 21930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:01:05,244-Speed 11034.99 samples/sec Loss 9.8896 LearningRate 0.0816 Epoch: 3 Global Step: 21940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:01:06,140-Speed 11430.87 samples/sec Loss 9.9069 LearningRate 0.0816 Epoch: 3 Global Step: 21950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:01:07,041-Speed 11374.89 samples/sec Loss 9.7836 LearningRate 0.0816 Epoch: 3 Global Step: 21960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:01:07,994-Speed 10759.14 samples/sec Loss 9.9099 LearningRate 0.0816 Epoch: 3 Global Step: 21970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:01:08,908-Speed 11200.74 samples/sec Loss 10.0096 LearningRate 0.0816 Epoch: 3 Global Step: 21980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:01:09,829-Speed 11132.94 samples/sec Loss 10.0320 LearningRate 0.0816 Epoch: 3 Global Step: 21990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:01:10,708-Speed 11657.96 samples/sec Loss 9.9990 LearningRate 0.0816 Epoch: 3 Global Step: 22000 Fp16 Grad 
Scale: 65536 Required: 7 hours Training: 2022-04-11 00:01:31,884-[lfw][22000]XNorm: 14.944100 Training: 2022-04-11 00:01:31,885-[lfw][22000]Accuracy-Flip: 0.99233+-0.00416 Training: 2022-04-11 00:01:31,885-[lfw][22000]Accuracy-Highest: 0.99300 Training: 2022-04-11 00:01:56,437-[cfp_fp][22000]XNorm: 12.573154 Training: 2022-04-11 00:01:56,437-[cfp_fp][22000]Accuracy-Flip: 0.90800+-0.01659 Training: 2022-04-11 00:01:56,438-[cfp_fp][22000]Accuracy-Highest: 0.91014 Training: 2022-04-11 00:02:17,729-[agedb_30][22000]XNorm: 14.638771 Training: 2022-04-11 00:02:17,730-[agedb_30][22000]Accuracy-Flip: 0.94367+-0.01180 Training: 2022-04-11 00:02:17,730-[agedb_30][22000]Accuracy-Highest: 0.94400 Training: 2022-04-11 00:02:18,653-Speed 150.71 samples/sec Loss 9.9229 LearningRate 0.0816 Epoch: 3 Global Step: 22010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:02:19,583-Speed 11022.15 samples/sec Loss 9.9427 LearningRate 0.0816 Epoch: 3 Global Step: 22020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:02:20,561-Speed 10477.80 samples/sec Loss 9.8980 LearningRate 0.0816 Epoch: 3 Global Step: 22030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:02:21,440-Speed 11654.62 samples/sec Loss 9.9308 LearningRate 0.0816 Epoch: 3 Global Step: 22040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:02:22,342-Speed 11356.92 samples/sec Loss 9.8205 LearningRate 0.0816 Epoch: 3 Global Step: 22050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:02:23,281-Speed 10906.95 samples/sec Loss 10.0289 LearningRate 0.0815 Epoch: 3 Global Step: 22060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:02:24,230-Speed 10798.47 samples/sec Loss 9.8730 LearningRate 0.0815 Epoch: 3 Global Step: 22070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:02:25,133-Speed 11349.18 samples/sec Loss 9.6808 LearningRate 0.0815 Epoch: 3 Global Step: 22080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 
2022-04-11 00:02:26,074-Speed 10895.74 samples/sec Loss 10.0147 LearningRate 0.0815 Epoch: 3 Global Step: 22090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:02:27,044-Speed 10561.59 samples/sec Loss 9.7920 LearningRate 0.0815 Epoch: 3 Global Step: 22100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:02:27,984-Speed 10899.60 samples/sec Loss 9.9568 LearningRate 0.0815 Epoch: 3 Global Step: 22110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:02:28,951-Speed 10595.45 samples/sec Loss 9.7563 LearningRate 0.0815 Epoch: 3 Global Step: 22120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:02:29,867-Speed 11190.92 samples/sec Loss 9.8802 LearningRate 0.0815 Epoch: 3 Global Step: 22130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:02:30,792-Speed 11081.32 samples/sec Loss 9.9192 LearningRate 0.0815 Epoch: 3 Global Step: 22140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:02:31,688-Speed 11441.43 samples/sec Loss 9.8363 LearningRate 0.0815 Epoch: 3 Global Step: 22150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:02:32,619-Speed 10998.22 samples/sec Loss 9.8159 LearningRate 0.0815 Epoch: 3 Global Step: 22160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:02:33,518-Speed 11405.75 samples/sec Loss 9.8217 LearningRate 0.0815 Epoch: 3 Global Step: 22170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:02:34,432-Speed 11201.28 samples/sec Loss 10.0133 LearningRate 0.0814 Epoch: 3 Global Step: 22180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:02:35,352-Speed 11145.35 samples/sec Loss 9.9637 LearningRate 0.0814 Epoch: 3 Global Step: 22190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:02:36,272-Speed 11134.13 samples/sec Loss 9.7925 LearningRate 0.0814 Epoch: 3 Global Step: 22200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:02:37,202-Speed 11015.40 
samples/sec Loss 9.8253 LearningRate 0.0814 Epoch: 3 Global Step: 22210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:02:38,139-Speed 10937.43 samples/sec Loss 9.9062 LearningRate 0.0814 Epoch: 3 Global Step: 22220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:02:39,063-Speed 11095.18 samples/sec Loss 9.7865 LearningRate 0.0814 Epoch: 3 Global Step: 22230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:02:40,031-Speed 10591.35 samples/sec Loss 9.7839 LearningRate 0.0814 Epoch: 3 Global Step: 22240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:02:40,967-Speed 10939.87 samples/sec Loss 9.8568 LearningRate 0.0814 Epoch: 3 Global Step: 22250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:02:41,864-Speed 11435.32 samples/sec Loss 9.8778 LearningRate 0.0814 Epoch: 3 Global Step: 22260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:02:42,795-Speed 11007.66 samples/sec Loss 9.8777 LearningRate 0.0814 Epoch: 3 Global Step: 22270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:02:43,748-Speed 10744.67 samples/sec Loss 9.7987 LearningRate 0.0814 Epoch: 3 Global Step: 22280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:02:44,647-Speed 11399.13 samples/sec Loss 9.9081 LearningRate 0.0814 Epoch: 3 Global Step: 22290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:02:45,552-Speed 11326.93 samples/sec Loss 9.9161 LearningRate 0.0814 Epoch: 3 Global Step: 22300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:02:46,459-Speed 11304.00 samples/sec Loss 9.7222 LearningRate 0.0813 Epoch: 3 Global Step: 22310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:02:47,373-Speed 11202.74 samples/sec Loss 9.8159 LearningRate 0.0813 Epoch: 3 Global Step: 22320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:02:48,337-Speed 10639.86 samples/sec Loss 9.9200 LearningRate 0.0813 
Epoch: 3 Global Step: 22330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:02:49,256-Speed 11146.58 samples/sec Loss 10.1062 LearningRate 0.0813 Epoch: 3 Global Step: 22340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:02:50,160-Speed 11327.96 samples/sec Loss 10.0602 LearningRate 0.0813 Epoch: 3 Global Step: 22350 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:02:51,088-Speed 11048.36 samples/sec Loss 10.0584 LearningRate 0.0813 Epoch: 3 Global Step: 22360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:02:52,038-Speed 10792.21 samples/sec Loss 10.0638 LearningRate 0.0813 Epoch: 3 Global Step: 22370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:02:52,974-Speed 10944.42 samples/sec Loss 9.8012 LearningRate 0.0813 Epoch: 3 Global Step: 22380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:02:53,917-Speed 10870.28 samples/sec Loss 9.8781 LearningRate 0.0813 Epoch: 3 Global Step: 22390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:02:54,834-Speed 11173.39 samples/sec Loss 9.8958 LearningRate 0.0813 Epoch: 3 Global Step: 22400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:02:55,786-Speed 10766.04 samples/sec Loss 9.9332 LearningRate 0.0813 Epoch: 3 Global Step: 22410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:02:56,690-Speed 11323.54 samples/sec Loss 10.0448 LearningRate 0.0813 Epoch: 3 Global Step: 22420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:02:57,611-Speed 11130.37 samples/sec Loss 9.9291 LearningRate 0.0812 Epoch: 3 Global Step: 22430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:02:58,528-Speed 11174.60 samples/sec Loss 9.8059 LearningRate 0.0812 Epoch: 3 Global Step: 22440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:02:59,464-Speed 10946.36 samples/sec Loss 9.9755 LearningRate 0.0812 Epoch: 3 Global Step: 22450 Fp16 Grad 
Scale: 65536 Required: 7 hours Training: 2022-04-11 00:03:00,359-Speed 11455.05 samples/sec Loss 9.9131 LearningRate 0.0812 Epoch: 3 Global Step: 22460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:03:01,310-Speed 10769.84 samples/sec Loss 9.8626 LearningRate 0.0812 Epoch: 3 Global Step: 22470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:03:02,234-Speed 11087.55 samples/sec Loss 9.8899 LearningRate 0.0812 Epoch: 3 Global Step: 22480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:03:03,172-Speed 10926.72 samples/sec Loss 9.8802 LearningRate 0.0812 Epoch: 3 Global Step: 22490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:03:04,069-Speed 11417.17 samples/sec Loss 9.8784 LearningRate 0.0812 Epoch: 3 Global Step: 22500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:03:04,990-Speed 11126.23 samples/sec Loss 9.9468 LearningRate 0.0812 Epoch: 3 Global Step: 22510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:03:05,894-Speed 11338.81 samples/sec Loss 9.7781 LearningRate 0.0812 Epoch: 3 Global Step: 22520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:03:06,819-Speed 11076.75 samples/sec Loss 9.6911 LearningRate 0.0812 Epoch: 3 Global Step: 22530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:03:07,735-Speed 11188.31 samples/sec Loss 9.7399 LearningRate 0.0812 Epoch: 3 Global Step: 22540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:03:08,649-Speed 11212.19 samples/sec Loss 9.8035 LearningRate 0.0812 Epoch: 3 Global Step: 22550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:03:09,575-Speed 11065.63 samples/sec Loss 9.6771 LearningRate 0.0811 Epoch: 3 Global Step: 22560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:03:10,464-Speed 11539.85 samples/sec Loss 9.8743 LearningRate 0.0811 Epoch: 3 Global Step: 22570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 
2022-04-11 00:03:11,365-Speed 11364.95 samples/sec Loss 9.9904 LearningRate 0.0811 Epoch: 3 Global Step: 22580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:03:12,295-Speed 11023.51 samples/sec Loss 9.7832 LearningRate 0.0811 Epoch: 3 Global Step: 22590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:03:13,211-Speed 11184.95 samples/sec Loss 9.9832 LearningRate 0.0811 Epoch: 3 Global Step: 22600 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:03:14,149-Speed 10928.53 samples/sec Loss 9.9381 LearningRate 0.0811 Epoch: 3 Global Step: 22610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:03:15,052-Speed 11345.67 samples/sec Loss 10.0003 LearningRate 0.0811 Epoch: 3 Global Step: 22620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:03:16,013-Speed 10660.55 samples/sec Loss 9.9718 LearningRate 0.0811 Epoch: 3 Global Step: 22630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:03:16,923-Speed 11255.01 samples/sec Loss 9.8317 LearningRate 0.0811 Epoch: 3 Global Step: 22640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:03:17,835-Speed 11238.48 samples/sec Loss 9.7969 LearningRate 0.0811 Epoch: 3 Global Step: 22650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:03:18,752-Speed 11176.02 samples/sec Loss 9.8123 LearningRate 0.0811 Epoch: 3 Global Step: 22660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:03:19,676-Speed 11093.78 samples/sec Loss 9.8297 LearningRate 0.0811 Epoch: 3 Global Step: 22670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:03:20,643-Speed 10591.97 samples/sec Loss 9.9637 LearningRate 0.0811 Epoch: 3 Global Step: 22680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:03:21,572-Speed 11038.92 samples/sec Loss 9.9475 LearningRate 0.0810 Epoch: 3 Global Step: 22690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:03:22,492-Speed 11131.56 
samples/sec Loss 9.8524 LearningRate 0.0810 Epoch: 3 Global Step: 22700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:03:23,408-Speed 11183.82 samples/sec Loss 9.8321 LearningRate 0.0810 Epoch: 3 Global Step: 22710 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:03:24,348-Speed 10909.34 samples/sec Loss 9.8788 LearningRate 0.0810 Epoch: 3 Global Step: 22720 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:03:25,270-Speed 11115.50 samples/sec Loss 10.0888 LearningRate 0.0810 Epoch: 3 Global Step: 22730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:03:26,288-Speed 10058.95 samples/sec Loss 9.8547 LearningRate 0.0810 Epoch: 3 Global Step: 22740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:03:37,334-Speed 927.15 samples/sec Loss 9.3825 LearningRate 0.0810 Epoch: 4 Global Step: 22750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:03:38,318-Speed 10419.54 samples/sec Loss 9.1056 LearningRate 0.0810 Epoch: 4 Global Step: 22760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:03:39,245-Speed 11046.87 samples/sec Loss 9.1013 LearningRate 0.0810 Epoch: 4 Global Step: 22770 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 00:03:40,257-Speed 10129.97 samples/sec Loss 9.0819 LearningRate 0.0810 Epoch: 4 Global Step: 22780 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 00:03:41,343-Speed 9431.68 samples/sec Loss 9.0136 LearningRate 0.0810 Epoch: 4 Global Step: 22790 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 00:03:42,331-Speed 10374.36 samples/sec Loss 8.9650 LearningRate 0.0810 Epoch: 4 Global Step: 22800 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 00:03:43,311-Speed 10453.57 samples/sec Loss 9.0300 LearningRate 0.0809 Epoch: 4 Global Step: 22810 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 00:03:44,228-Speed 11184.38 samples/sec Loss 9.1175 LearningRate 0.0809 Epoch: 
4 Global Step: 22820 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 00:03:45,117-Speed 11526.15 samples/sec Loss 9.0410 LearningRate 0.0809 Epoch: 4 Global Step: 22830 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 00:03:46,009-Speed 11487.09 samples/sec Loss 9.2009 LearningRate 0.0809 Epoch: 4 Global Step: 22840 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 00:03:46,961-Speed 10764.27 samples/sec Loss 9.3058 LearningRate 0.0809 Epoch: 4 Global Step: 22850 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 00:03:48,039-Speed 9500.48 samples/sec Loss 9.1931 LearningRate 0.0809 Epoch: 4 Global Step: 22860 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 00:03:48,933-Speed 11463.67 samples/sec Loss 9.1939 LearningRate 0.0809 Epoch: 4 Global Step: 22870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:03:49,846-Speed 11231.19 samples/sec Loss 9.0987 LearningRate 0.0809 Epoch: 4 Global Step: 22880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:03:50,806-Speed 10669.88 samples/sec Loss 9.3244 LearningRate 0.0809 Epoch: 4 Global Step: 22890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:03:51,751-Speed 10848.91 samples/sec Loss 9.3920 LearningRate 0.0809 Epoch: 4 Global Step: 22900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:03:52,702-Speed 10774.30 samples/sec Loss 9.1040 LearningRate 0.0809 Epoch: 4 Global Step: 22910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:03:53,613-Speed 11250.07 samples/sec Loss 9.2623 LearningRate 0.0809 Epoch: 4 Global Step: 22920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:03:54,527-Speed 11217.52 samples/sec Loss 9.0903 LearningRate 0.0809 Epoch: 4 Global Step: 22930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:03:55,428-Speed 11366.18 samples/sec Loss 9.2509 LearningRate 0.0808 Epoch: 4 Global Step: 22940 Fp16 Grad Scale: 65536 Required: 7 
hours Training: 2022-04-11 00:03:56,338-Speed 11261.53 samples/sec Loss 9.3235 LearningRate 0.0808 Epoch: 4 Global Step: 22950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:03:57,217-Speed 11666.37 samples/sec Loss 9.0388 LearningRate 0.0808 Epoch: 4 Global Step: 22960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:03:58,131-Speed 11203.02 samples/sec Loss 9.3883 LearningRate 0.0808 Epoch: 4 Global Step: 22970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:03:59,031-Speed 11390.63 samples/sec Loss 9.3187 LearningRate 0.0808 Epoch: 4 Global Step: 22980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:03:59,927-Speed 11431.19 samples/sec Loss 9.3441 LearningRate 0.0808 Epoch: 4 Global Step: 22990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:04:00,824-Speed 11429.32 samples/sec Loss 9.2062 LearningRate 0.0808 Epoch: 4 Global Step: 23000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:04:01,739-Speed 11201.08 samples/sec Loss 9.2714 LearningRate 0.0808 Epoch: 4 Global Step: 23010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:04:02,647-Speed 11277.47 samples/sec Loss 9.2142 LearningRate 0.0808 Epoch: 4 Global Step: 23020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:04:03,574-Speed 11061.73 samples/sec Loss 9.2210 LearningRate 0.0808 Epoch: 4 Global Step: 23030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:04:04,481-Speed 11296.90 samples/sec Loss 9.1896 LearningRate 0.0808 Epoch: 4 Global Step: 23040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:04:05,401-Speed 11141.62 samples/sec Loss 9.3266 LearningRate 0.0808 Epoch: 4 Global Step: 23050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:04:06,290-Speed 11521.88 samples/sec Loss 9.3160 LearningRate 0.0808 Epoch: 4 Global Step: 23060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:04:07,188-Speed 
11409.30 samples/sec Loss 9.3906 LearningRate 0.0807 Epoch: 4 Global Step: 23070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:04:08,123-Speed 10960.07 samples/sec Loss 9.3027 LearningRate 0.0807 Epoch: 4 Global Step: 23080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:04:09,005-Speed 11619.66 samples/sec Loss 9.3672 LearningRate 0.0807 Epoch: 4 Global Step: 23090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:04:09,897-Speed 11496.73 samples/sec Loss 9.2065 LearningRate 0.0807 Epoch: 4 Global Step: 23100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:04:10,787-Speed 11506.97 samples/sec Loss 9.2642 LearningRate 0.0807 Epoch: 4 Global Step: 23110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:04:11,685-Speed 11416.73 samples/sec Loss 9.2129 LearningRate 0.0807 Epoch: 4 Global Step: 23120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:04:12,609-Speed 11084.64 samples/sec Loss 9.3817 LearningRate 0.0807 Epoch: 4 Global Step: 23130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:04:13,522-Speed 11227.21 samples/sec Loss 9.3823 LearningRate 0.0807 Epoch: 4 Global Step: 23140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:04:14,442-Speed 11133.53 samples/sec Loss 9.3537 LearningRate 0.0807 Epoch: 4 Global Step: 23150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:04:15,355-Speed 11229.01 samples/sec Loss 9.5328 LearningRate 0.0807 Epoch: 4 Global Step: 23160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:04:16,249-Speed 11456.01 samples/sec Loss 9.5470 LearningRate 0.0807 Epoch: 4 Global Step: 23170 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:04:17,169-Speed 11142.55 samples/sec Loss 9.3539 LearningRate 0.0807 Epoch: 4 Global Step: 23180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:04:18,100-Speed 11004.76 samples/sec Loss 9.4354 
LearningRate 0.0806 Epoch: 4 Global Step: 23190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:04:19,025-Speed 11079.00 samples/sec Loss 9.3346 LearningRate 0.0806 Epoch: 4 Global Step: 23200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:04:19,923-Speed 11413.99 samples/sec Loss 9.3834 LearningRate 0.0806 Epoch: 4 Global Step: 23210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:04:20,813-Speed 11506.56 samples/sec Loss 9.3769 LearningRate 0.0806 Epoch: 4 Global Step: 23220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:04:21,718-Speed 11332.67 samples/sec Loss 9.4526 LearningRate 0.0806 Epoch: 4 Global Step: 23230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:04:22,645-Speed 11049.75 samples/sec Loss 9.4331 LearningRate 0.0806 Epoch: 4 Global Step: 23240 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:04:23,560-Speed 11197.24 samples/sec Loss 9.5022 LearningRate 0.0806 Epoch: 4 Global Step: 23250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:04:24,456-Speed 11442.42 samples/sec Loss 9.3082 LearningRate 0.0806 Epoch: 4 Global Step: 23260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:04:25,397-Speed 10885.48 samples/sec Loss 9.6310 LearningRate 0.0806 Epoch: 4 Global Step: 23270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:04:26,327-Speed 11022.03 samples/sec Loss 9.5695 LearningRate 0.0806 Epoch: 4 Global Step: 23280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:04:27,245-Speed 11155.90 samples/sec Loss 9.3612 LearningRate 0.0806 Epoch: 4 Global Step: 23290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:04:28,185-Speed 10910.88 samples/sec Loss 9.5509 LearningRate 0.0806 Epoch: 4 Global Step: 23300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:04:29,123-Speed 10921.06 samples/sec Loss 9.3621 LearningRate 0.0806 Epoch: 4 Global Step: 23310 
Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:04:30,052-Speed 11027.88 samples/sec Loss 9.5153 LearningRate 0.0805 Epoch: 4 Global Step: 23320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:04:30,949-Speed 11427.51 samples/sec Loss 9.3202 LearningRate 0.0805 Epoch: 4 Global Step: 23330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:04:31,880-Speed 11003.33 samples/sec Loss 9.4743 LearningRate 0.0805 Epoch: 4 Global Step: 23340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:04:32,787-Speed 11304.14 samples/sec Loss 9.4362 LearningRate 0.0805 Epoch: 4 Global Step: 23350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:04:33,697-Speed 11254.05 samples/sec Loss 9.4964 LearningRate 0.0805 Epoch: 4 Global Step: 23360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:04:34,572-Speed 11706.76 samples/sec Loss 9.4578 LearningRate 0.0805 Epoch: 4 Global Step: 23370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:04:35,466-Speed 11473.39 samples/sec Loss 9.4757 LearningRate 0.0805 Epoch: 4 Global Step: 23380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:04:36,369-Speed 11337.03 samples/sec Loss 9.4245 LearningRate 0.0805 Epoch: 4 Global Step: 23390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:04:37,272-Speed 11354.02 samples/sec Loss 9.4004 LearningRate 0.0805 Epoch: 4 Global Step: 23400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:04:38,176-Speed 11344.72 samples/sec Loss 9.5540 LearningRate 0.0805 Epoch: 4 Global Step: 23410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:04:39,113-Speed 10943.38 samples/sec Loss 9.5011 LearningRate 0.0805 Epoch: 4 Global Step: 23420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:04:40,047-Speed 10972.18 samples/sec Loss 9.6345 LearningRate 0.0805 Epoch: 4 Global Step: 23430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 
2022-04-11 00:04:40,959-Speed 11224.68 samples/sec Loss 9.5364 LearningRate 0.0805 Epoch: 4 Global Step: 23440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:04:41,863-Speed 11345.08 samples/sec Loss 9.6506 LearningRate 0.0804 Epoch: 4 Global Step: 23450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:04:42,781-Speed 11159.16 samples/sec Loss 9.5302 LearningRate 0.0804 Epoch: 4 Global Step: 23460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:04:43,695-Speed 11210.68 samples/sec Loss 9.5388 LearningRate 0.0804 Epoch: 4 Global Step: 23470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:04:44,600-Speed 11323.82 samples/sec Loss 9.4576 LearningRate 0.0804 Epoch: 4 Global Step: 23480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:04:45,495-Speed 11443.58 samples/sec Loss 9.3711 LearningRate 0.0804 Epoch: 4 Global Step: 23490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:04:46,402-Speed 11299.76 samples/sec Loss 9.6101 LearningRate 0.0804 Epoch: 4 Global Step: 23500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:04:47,285-Speed 11610.48 samples/sec Loss 9.5825 LearningRate 0.0804 Epoch: 4 Global Step: 23510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:04:48,207-Speed 11116.76 samples/sec Loss 9.6575 LearningRate 0.0804 Epoch: 4 Global Step: 23520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:04:49,096-Speed 11520.34 samples/sec Loss 9.5553 LearningRate 0.0804 Epoch: 4 Global Step: 23530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:04:49,993-Speed 11417.16 samples/sec Loss 9.4766 LearningRate 0.0804 Epoch: 4 Global Step: 23540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:04:50,924-Speed 11018.72 samples/sec Loss 9.4708 LearningRate 0.0804 Epoch: 4 Global Step: 23550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:04:51,817-Speed 11471.10 
samples/sec Loss 9.4073 LearningRate 0.0804 Epoch: 4 Global Step: 23560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:04:52,724-Speed 11298.98 samples/sec Loss 9.6303 LearningRate 0.0803 Epoch: 4 Global Step: 23570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:04:53,643-Speed 11150.53 samples/sec Loss 9.5793 LearningRate 0.0803 Epoch: 4 Global Step: 23580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:04:54,541-Speed 11422.17 samples/sec Loss 9.6321 LearningRate 0.0803 Epoch: 4 Global Step: 23590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:04:55,433-Speed 11507.74 samples/sec Loss 9.6146 LearningRate 0.0803 Epoch: 4 Global Step: 23600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:04:56,344-Speed 11238.33 samples/sec Loss 9.5173 LearningRate 0.0803 Epoch: 4 Global Step: 23610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:04:57,259-Speed 11197.83 samples/sec Loss 9.3256 LearningRate 0.0803 Epoch: 4 Global Step: 23620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:04:58,173-Speed 11219.48 samples/sec Loss 9.5194 LearningRate 0.0803 Epoch: 4 Global Step: 23630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:04:59,082-Speed 11272.70 samples/sec Loss 9.6807 LearningRate 0.0803 Epoch: 4 Global Step: 23640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:05:00,003-Speed 11127.36 samples/sec Loss 9.4385 LearningRate 0.0803 Epoch: 4 Global Step: 23650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:05:00,948-Speed 10837.96 samples/sec Loss 9.5899 LearningRate 0.0803 Epoch: 4 Global Step: 23660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:05:01,891-Speed 10873.99 samples/sec Loss 9.5443 LearningRate 0.0803 Epoch: 4 Global Step: 23670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:05:02,824-Speed 10986.00 samples/sec Loss 9.5092 LearningRate 0.0803 Epoch: 4 
Global Step: 23680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:05:03,785-Speed 10654.68 samples/sec Loss 9.5775 LearningRate 0.0803 Epoch: 4 Global Step: 23690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:05:04,667-Speed 11620.85 samples/sec Loss 9.6723 LearningRate 0.0802 Epoch: 4 Global Step: 23700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:05:05,584-Speed 11177.35 samples/sec Loss 9.5131 LearningRate 0.0802 Epoch: 4 Global Step: 23710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:05:06,507-Speed 11098.10 samples/sec Loss 9.5654 LearningRate 0.0802 Epoch: 4 Global Step: 23720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:05:07,431-Speed 11102.73 samples/sec Loss 9.5877 LearningRate 0.0802 Epoch: 4 Global Step: 23730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:05:08,314-Speed 11598.36 samples/sec Loss 9.6063 LearningRate 0.0802 Epoch: 4 Global Step: 23740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:05:09,236-Speed 11117.30 samples/sec Loss 9.4991 LearningRate 0.0802 Epoch: 4 Global Step: 23750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:05:10,126-Speed 11513.28 samples/sec Loss 9.5508 LearningRate 0.0802 Epoch: 4 Global Step: 23760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:05:11,028-Speed 11358.55 samples/sec Loss 9.6020 LearningRate 0.0802 Epoch: 4 Global Step: 23770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:05:11,923-Speed 11448.70 samples/sec Loss 9.5354 LearningRate 0.0802 Epoch: 4 Global Step: 23780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:05:12,880-Speed 10715.51 samples/sec Loss 9.4438 LearningRate 0.0802 Epoch: 4 Global Step: 23790 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:05:13,750-Speed 11778.49 samples/sec Loss 9.5841 LearningRate 0.0802 Epoch: 4 Global Step: 23800 Fp16 Grad Scale: 131072 
Required: 7 hours Training: 2022-04-11 00:05:14,694-Speed 10850.93 samples/sec Loss 9.6018 LearningRate 0.0802 Epoch: 4 Global Step: 23810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:05:15,627-Speed 10982.42 samples/sec Loss 9.5675 LearningRate 0.0802 Epoch: 4 Global Step: 23820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:05:16,528-Speed 11371.11 samples/sec Loss 9.7216 LearningRate 0.0801 Epoch: 4 Global Step: 23830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:05:17,470-Speed 10877.10 samples/sec Loss 9.4617 LearningRate 0.0801 Epoch: 4 Global Step: 23840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:05:18,381-Speed 11252.71 samples/sec Loss 9.5494 LearningRate 0.0801 Epoch: 4 Global Step: 23850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:05:19,327-Speed 10829.01 samples/sec Loss 9.7393 LearningRate 0.0801 Epoch: 4 Global Step: 23860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:05:20,215-Speed 11540.26 samples/sec Loss 9.6664 LearningRate 0.0801 Epoch: 4 Global Step: 23870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:05:21,103-Speed 11543.08 samples/sec Loss 9.8424 LearningRate 0.0801 Epoch: 4 Global Step: 23880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:05:22,027-Speed 11091.25 samples/sec Loss 9.4879 LearningRate 0.0801 Epoch: 4 Global Step: 23890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:05:22,920-Speed 11466.25 samples/sec Loss 9.6558 LearningRate 0.0801 Epoch: 4 Global Step: 23900 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:05:23,827-Speed 11302.81 samples/sec Loss 9.5605 LearningRate 0.0801 Epoch: 4 Global Step: 23910 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:05:24,758-Speed 11011.32 samples/sec Loss 9.5532 LearningRate 0.0801 Epoch: 4 Global Step: 23920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 
00:05:25,680-Speed 11116.17 samples/sec Loss 9.4024 LearningRate 0.0801 Epoch: 4 Global Step: 23930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:05:26,626-Speed 10829.19 samples/sec Loss 9.7281 LearningRate 0.0801 Epoch: 4 Global Step: 23940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:05:27,539-Speed 11216.19 samples/sec Loss 9.6555 LearningRate 0.0800 Epoch: 4 Global Step: 23950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:05:28,450-Speed 11251.17 samples/sec Loss 9.6688 LearningRate 0.0800 Epoch: 4 Global Step: 23960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:05:29,390-Speed 10909.62 samples/sec Loss 9.9184 LearningRate 0.0800 Epoch: 4 Global Step: 23970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:05:30,332-Speed 10870.18 samples/sec Loss 9.5999 LearningRate 0.0800 Epoch: 4 Global Step: 23980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:05:31,259-Speed 11055.85 samples/sec Loss 9.5974 LearningRate 0.0800 Epoch: 4 Global Step: 23990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:05:32,144-Speed 11582.58 samples/sec Loss 9.5826 LearningRate 0.0800 Epoch: 4 Global Step: 24000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:05:53,567-[lfw][24000]XNorm: 14.689816 Training: 2022-04-11 00:05:53,568-[lfw][24000]Accuracy-Flip: 0.99417+-0.00443 Training: 2022-04-11 00:05:53,568-[lfw][24000]Accuracy-Highest: 0.99417 Training: 2022-04-11 00:06:18,336-[cfp_fp][24000]XNorm: 12.318632 Training: 2022-04-11 00:06:18,336-[cfp_fp][24000]Accuracy-Flip: 0.91257+-0.01542 Training: 2022-04-11 00:06:18,337-[cfp_fp][24000]Accuracy-Highest: 0.91257 Training: 2022-04-11 00:06:39,578-[agedb_30][24000]XNorm: 14.401116 Training: 2022-04-11 00:06:39,578-[agedb_30][24000]Accuracy-Flip: 0.94783+-0.01070 Training: 2022-04-11 00:06:39,579-[agedb_30][24000]Accuracy-Highest: 0.94783 Training: 2022-04-11 00:06:40,505-Speed 149.79 
samples/sec Loss 9.7165 LearningRate 0.0800 Epoch: 4 Global Step: 24010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:06:41,501-Speed 10288.60 samples/sec Loss 9.6281 LearningRate 0.0800 Epoch: 4 Global Step: 24020 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:06:42,379-Speed 11670.42 samples/sec Loss 9.5248 LearningRate 0.0800 Epoch: 4 Global Step: 24030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:06:43,286-Speed 11307.34 samples/sec Loss 9.5625 LearningRate 0.0800 Epoch: 4 Global Step: 24040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:06:44,216-Speed 11008.75 samples/sec Loss 9.4298 LearningRate 0.0800 Epoch: 4 Global Step: 24050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:06:45,093-Speed 11685.40 samples/sec Loss 9.6191 LearningRate 0.0800 Epoch: 4 Global Step: 24060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:06:45,972-Speed 11657.10 samples/sec Loss 9.5045 LearningRate 0.0800 Epoch: 4 Global Step: 24070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:06:46,858-Speed 11567.06 samples/sec Loss 9.3863 LearningRate 0.0799 Epoch: 4 Global Step: 24080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:06:47,785-Speed 11054.58 samples/sec Loss 9.5300 LearningRate 0.0799 Epoch: 4 Global Step: 24090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:06:48,683-Speed 11409.13 samples/sec Loss 9.5512 LearningRate 0.0799 Epoch: 4 Global Step: 24100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:06:49,624-Speed 10897.03 samples/sec Loss 9.6166 LearningRate 0.0799 Epoch: 4 Global Step: 24110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:06:50,558-Speed 10963.13 samples/sec Loss 9.6537 LearningRate 0.0799 Epoch: 4 Global Step: 24120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:06:51,434-Speed 11698.39 samples/sec Loss 9.5914 LearningRate 0.0799 
Epoch: 4 Global Step: 24130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:06:52,318-Speed 11591.66 samples/sec Loss 9.5495 LearningRate 0.0799 Epoch: 4 Global Step: 24140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:06:53,201-Speed 11607.55 samples/sec Loss 9.4688 LearningRate 0.0799 Epoch: 4 Global Step: 24150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:06:54,130-Speed 11026.93 samples/sec Loss 9.7029 LearningRate 0.0799 Epoch: 4 Global Step: 24160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:06:55,026-Speed 11448.30 samples/sec Loss 9.5576 LearningRate 0.0799 Epoch: 4 Global Step: 24170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:06:55,910-Speed 11589.29 samples/sec Loss 9.6935 LearningRate 0.0799 Epoch: 4 Global Step: 24180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:06:56,798-Speed 11532.66 samples/sec Loss 9.6776 LearningRate 0.0799 Epoch: 4 Global Step: 24190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:06:57,702-Speed 11342.12 samples/sec Loss 9.5390 LearningRate 0.0799 Epoch: 4 Global Step: 24200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:06:58,612-Speed 11259.48 samples/sec Loss 9.5879 LearningRate 0.0798 Epoch: 4 Global Step: 24210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:06:59,528-Speed 11185.99 samples/sec Loss 9.5753 LearningRate 0.0798 Epoch: 4 Global Step: 24220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:07:00,448-Speed 11136.15 samples/sec Loss 9.4879 LearningRate 0.0798 Epoch: 4 Global Step: 24230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:07:01,372-Speed 11089.88 samples/sec Loss 9.5971 LearningRate 0.0798 Epoch: 4 Global Step: 24240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:07:02,289-Speed 11174.08 samples/sec Loss 9.5937 LearningRate 0.0798 Epoch: 4 Global Step: 24250 Fp16 Grad Scale: 
65536 Required: 7 hours Training: 2022-04-11 00:07:03,215-Speed 11062.03 samples/sec Loss 9.6004 LearningRate 0.0798 Epoch: 4 Global Step: 24260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:07:04,139-Speed 11093.55 samples/sec Loss 9.7508 LearningRate 0.0798 Epoch: 4 Global Step: 24270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:07:05,041-Speed 11353.21 samples/sec Loss 9.6033 LearningRate 0.0798 Epoch: 4 Global Step: 24280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:07:05,977-Speed 10958.92 samples/sec Loss 9.7079 LearningRate 0.0798 Epoch: 4 Global Step: 24290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:07:06,884-Speed 11292.06 samples/sec Loss 9.7458 LearningRate 0.0798 Epoch: 4 Global Step: 24300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:07:07,823-Speed 10918.92 samples/sec Loss 9.8038 LearningRate 0.0798 Epoch: 4 Global Step: 24310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:07:08,770-Speed 10818.73 samples/sec Loss 9.5528 LearningRate 0.0798 Epoch: 4 Global Step: 24320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:07:09,670-Speed 11392.38 samples/sec Loss 9.6470 LearningRate 0.0797 Epoch: 4 Global Step: 24330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:07:10,606-Speed 10947.31 samples/sec Loss 9.6850 LearningRate 0.0797 Epoch: 4 Global Step: 24340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:07:11,486-Speed 11637.33 samples/sec Loss 9.6122 LearningRate 0.0797 Epoch: 4 Global Step: 24350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:07:12,384-Speed 11415.19 samples/sec Loss 9.4621 LearningRate 0.0797 Epoch: 4 Global Step: 24360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:07:13,290-Speed 11305.56 samples/sec Loss 9.5776 LearningRate 0.0797 Epoch: 4 Global Step: 24370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 
00:07:14,179-Speed 11524.10 samples/sec Loss 9.6796 LearningRate 0.0797 Epoch: 4 Global Step: 24380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:07:15,130-Speed 10781.85 samples/sec Loss 9.6297 LearningRate 0.0797 Epoch: 4 Global Step: 24390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:07:16,027-Speed 11414.06 samples/sec Loss 9.5990 LearningRate 0.0797 Epoch: 4 Global Step: 24400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:07:16,939-Speed 11241.34 samples/sec Loss 9.5885 LearningRate 0.0797 Epoch: 4 Global Step: 24410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:07:17,843-Speed 11334.02 samples/sec Loss 9.6593 LearningRate 0.0797 Epoch: 4 Global Step: 24420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:07:18,775-Speed 10994.95 samples/sec Loss 9.6012 LearningRate 0.0797 Epoch: 4 Global Step: 24430 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:07:19,666-Speed 11503.82 samples/sec Loss 9.5697 LearningRate 0.0797 Epoch: 4 Global Step: 24440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:07:20,575-Speed 11274.16 samples/sec Loss 9.6984 LearningRate 0.0797 Epoch: 4 Global Step: 24450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:07:21,481-Speed 11307.84 samples/sec Loss 9.9044 LearningRate 0.0796 Epoch: 4 Global Step: 24460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:07:22,398-Speed 11168.96 samples/sec Loss 9.6389 LearningRate 0.0796 Epoch: 4 Global Step: 24470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:07:23,319-Speed 11129.53 samples/sec Loss 9.5995 LearningRate 0.0796 Epoch: 4 Global Step: 24480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:07:24,202-Speed 11603.32 samples/sec Loss 9.6018 LearningRate 0.0796 Epoch: 4 Global Step: 24490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:07:25,117-Speed 11198.96 samples/sec 
Loss 9.7012 LearningRate 0.0796 Epoch: 4 Global Step: 24500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:07:26,052-Speed 10965.96 samples/sec Loss 9.5786 LearningRate 0.0796 Epoch: 4 Global Step: 24510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:07:26,971-Speed 11143.18 samples/sec Loss 9.6988 LearningRate 0.0796 Epoch: 4 Global Step: 24520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:07:27,943-Speed 10543.07 samples/sec Loss 9.6564 LearningRate 0.0796 Epoch: 4 Global Step: 24530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:07:28,873-Speed 11023.13 samples/sec Loss 9.5442 LearningRate 0.0796 Epoch: 4 Global Step: 24540 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:07:29,788-Speed 11201.77 samples/sec Loss 9.7272 LearningRate 0.0796 Epoch: 4 Global Step: 24550 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:07:30,742-Speed 10745.75 samples/sec Loss 9.5605 LearningRate 0.0796 Epoch: 4 Global Step: 24560 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:07:31,638-Speed 11433.93 samples/sec Loss 9.5991 LearningRate 0.0796 Epoch: 4 Global Step: 24570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:07:32,554-Speed 11180.11 samples/sec Loss 9.5828 LearningRate 0.0796 Epoch: 4 Global Step: 24580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:07:33,452-Speed 11411.59 samples/sec Loss 9.5665 LearningRate 0.0795 Epoch: 4 Global Step: 24590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:07:34,373-Speed 11124.57 samples/sec Loss 9.5337 LearningRate 0.0795 Epoch: 4 Global Step: 24600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:07:35,278-Speed 11332.19 samples/sec Loss 9.5844 LearningRate 0.0795 Epoch: 4 Global Step: 24610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:07:36,167-Speed 11525.80 samples/sec Loss 9.6659 LearningRate 0.0795 Epoch: 4 
Global Step: 24620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:07:37,073-Speed 11304.85 samples/sec Loss 9.5252 LearningRate 0.0795 Epoch: 4 Global Step: 24630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:07:37,955-Speed 11618.46 samples/sec Loss 9.6694 LearningRate 0.0795 Epoch: 4 Global Step: 24640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:07:38,879-Speed 11095.19 samples/sec Loss 9.6482 LearningRate 0.0795 Epoch: 4 Global Step: 24650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:07:39,803-Speed 11086.50 samples/sec Loss 9.6877 LearningRate 0.0795 Epoch: 4 Global Step: 24660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:07:40,728-Speed 11081.97 samples/sec Loss 9.7265 LearningRate 0.0795 Epoch: 4 Global Step: 24670 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:07:41,631-Speed 11337.69 samples/sec Loss 9.7206 LearningRate 0.0795 Epoch: 4 Global Step: 24680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:07:42,552-Speed 11126.50 samples/sec Loss 9.5194 LearningRate 0.0795 Epoch: 4 Global Step: 24690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:07:43,445-Speed 11476.18 samples/sec Loss 9.7059 LearningRate 0.0795 Epoch: 4 Global Step: 24700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:07:44,355-Speed 11265.72 samples/sec Loss 9.5412 LearningRate 0.0795 Epoch: 4 Global Step: 24710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:07:45,269-Speed 11204.51 samples/sec Loss 9.4404 LearningRate 0.0794 Epoch: 4 Global Step: 24720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:07:46,168-Speed 11400.98 samples/sec Loss 9.6624 LearningRate 0.0794 Epoch: 4 Global Step: 24730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:07:47,089-Speed 11126.11 samples/sec Loss 9.7152 LearningRate 0.0794 Epoch: 4 Global Step: 24740 Fp16 Grad Scale: 65536 
Required: 7 hours Training: 2022-04-11 00:07:47,996-Speed 11299.83 samples/sec Loss 9.7261 LearningRate 0.0794 Epoch: 4 Global Step: 24750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:07:48,910-Speed 11217.81 samples/sec Loss 9.7194 LearningRate 0.0794 Epoch: 4 Global Step: 24760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:07:49,860-Speed 10786.97 samples/sec Loss 9.4872 LearningRate 0.0794 Epoch: 4 Global Step: 24770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:07:50,777-Speed 11169.66 samples/sec Loss 9.5780 LearningRate 0.0794 Epoch: 4 Global Step: 24780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:07:51,722-Speed 10840.68 samples/sec Loss 9.7072 LearningRate 0.0794 Epoch: 4 Global Step: 24790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:07:52,667-Speed 10844.42 samples/sec Loss 9.5586 LearningRate 0.0794 Epoch: 4 Global Step: 24800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:07:53,576-Speed 11279.27 samples/sec Loss 9.6890 LearningRate 0.0794 Epoch: 4 Global Step: 24810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:07:54,505-Speed 11025.54 samples/sec Loss 9.5741 LearningRate 0.0794 Epoch: 4 Global Step: 24820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:07:55,382-Speed 11682.98 samples/sec Loss 9.5422 LearningRate 0.0794 Epoch: 4 Global Step: 24830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:07:56,347-Speed 10627.79 samples/sec Loss 9.4606 LearningRate 0.0793 Epoch: 4 Global Step: 24840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:07:57,236-Speed 11517.62 samples/sec Loss 9.6096 LearningRate 0.0793 Epoch: 4 Global Step: 24850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:07:58,134-Speed 11411.88 samples/sec Loss 9.6852 LearningRate 0.0793 Epoch: 4 Global Step: 24860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 
00:07:59,052-Speed 11167.00 samples/sec Loss 9.5877 LearningRate 0.0793 Epoch: 4 Global Step: 24870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:07:59,936-Speed 11582.98 samples/sec Loss 9.6367 LearningRate 0.0793 Epoch: 4 Global Step: 24880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:08:00,844-Speed 11285.40 samples/sec Loss 9.5509 LearningRate 0.0793 Epoch: 4 Global Step: 24890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:08:01,742-Speed 11411.26 samples/sec Loss 9.4924 LearningRate 0.0793 Epoch: 4 Global Step: 24900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:08:02,647-Speed 11326.39 samples/sec Loss 9.5527 LearningRate 0.0793 Epoch: 4 Global Step: 24910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:08:03,570-Speed 11105.67 samples/sec Loss 9.6664 LearningRate 0.0793 Epoch: 4 Global Step: 24920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:08:04,498-Speed 11044.93 samples/sec Loss 9.5328 LearningRate 0.0793 Epoch: 4 Global Step: 24930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:08:05,433-Speed 10961.91 samples/sec Loss 9.4683 LearningRate 0.0793 Epoch: 4 Global Step: 24940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:08:06,353-Speed 11129.99 samples/sec Loss 9.3307 LearningRate 0.0793 Epoch: 4 Global Step: 24950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:08:07,256-Speed 11347.87 samples/sec Loss 9.6250 LearningRate 0.0793 Epoch: 4 Global Step: 24960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:08:08,210-Speed 10744.25 samples/sec Loss 9.6428 LearningRate 0.0792 Epoch: 4 Global Step: 24970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:08:09,099-Speed 11527.42 samples/sec Loss 9.5816 LearningRate 0.0792 Epoch: 4 Global Step: 24980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:08:09,967-Speed 11811.18 samples/sec Loss 9.7103 
LearningRate 0.0792 Epoch: 4 Global Step: 24990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:08:10,839-Speed 11757.50 samples/sec Loss 9.5127 LearningRate 0.0792 Epoch: 4 Global Step: 25000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:08:11,748-Speed 11265.50 samples/sec Loss 9.6145 LearningRate 0.0792 Epoch: 4 Global Step: 25010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:08:12,665-Speed 11173.59 samples/sec Loss 9.6111 LearningRate 0.0792 Epoch: 4 Global Step: 25020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:08:13,613-Speed 10811.50 samples/sec Loss 9.6094 LearningRate 0.0792 Epoch: 4 Global Step: 25030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:08:14,534-Speed 11126.90 samples/sec Loss 9.5553 LearningRate 0.0792 Epoch: 4 Global Step: 25040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:08:15,461-Speed 11056.19 samples/sec Loss 9.5144 LearningRate 0.0792 Epoch: 4 Global Step: 25050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:08:16,417-Speed 10716.45 samples/sec Loss 9.5236 LearningRate 0.0792 Epoch: 4 Global Step: 25060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:08:17,323-Speed 11312.51 samples/sec Loss 9.4765 LearningRate 0.0792 Epoch: 4 Global Step: 25070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:08:18,265-Speed 10875.87 samples/sec Loss 9.6385 LearningRate 0.0792 Epoch: 4 Global Step: 25080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:08:19,212-Speed 10828.10 samples/sec Loss 9.5484 LearningRate 0.0792 Epoch: 4 Global Step: 25090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:08:20,141-Speed 11030.56 samples/sec Loss 9.5434 LearningRate 0.0791 Epoch: 4 Global Step: 25100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:08:21,022-Speed 11629.64 samples/sec Loss 9.5337 LearningRate 0.0791 Epoch: 4 Global Step: 25110 
Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:08:21,951-Speed 11021.23 samples/sec Loss 9.5186 LearningRate 0.0791 Epoch: 4 Global Step: 25120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:08:22,835-Speed 11591.55 samples/sec Loss 9.4981 LearningRate 0.0791 Epoch: 4 Global Step: 25130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:08:23,743-Speed 11292.29 samples/sec Loss 9.3513 LearningRate 0.0791 Epoch: 4 Global Step: 25140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:08:24,650-Speed 11300.54 samples/sec Loss 9.6423 LearningRate 0.0791 Epoch: 4 Global Step: 25150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:08:25,574-Speed 11090.96 samples/sec Loss 9.7035 LearningRate 0.0791 Epoch: 4 Global Step: 25160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:08:26,466-Speed 11478.31 samples/sec Loss 9.4995 LearningRate 0.0791 Epoch: 4 Global Step: 25170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:08:27,368-Speed 11358.40 samples/sec Loss 9.6306 LearningRate 0.0791 Epoch: 4 Global Step: 25180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:08:28,268-Speed 11389.16 samples/sec Loss 9.5454 LearningRate 0.0791 Epoch: 4 Global Step: 25190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:08:29,183-Speed 11219.22 samples/sec Loss 9.3899 LearningRate 0.0791 Epoch: 4 Global Step: 25200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:08:30,099-Speed 11184.40 samples/sec Loss 9.5503 LearningRate 0.0791 Epoch: 4 Global Step: 25210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:08:30,993-Speed 11460.81 samples/sec Loss 9.5227 LearningRate 0.0791 Epoch: 4 Global Step: 25220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:08:31,884-Speed 11511.65 samples/sec Loss 9.5814 LearningRate 0.0790 Epoch: 4 Global Step: 25230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 
2022-04-11 00:08:32,779-Speed 11450.01 samples/sec Loss 9.5031 LearningRate 0.0790 Epoch: 4 Global Step: 25240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:08:33,669-Speed 11513.53 samples/sec Loss 9.4676 LearningRate 0.0790 Epoch: 4 Global Step: 25250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:08:34,551-Speed 11617.83 samples/sec Loss 9.5393 LearningRate 0.0790 Epoch: 4 Global Step: 25260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:08:35,432-Speed 11631.44 samples/sec Loss 9.6809 LearningRate 0.0790 Epoch: 4 Global Step: 25270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:08:36,308-Speed 11689.64 samples/sec Loss 9.6948 LearningRate 0.0790 Epoch: 4 Global Step: 25280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:08:37,218-Speed 11267.57 samples/sec Loss 9.6331 LearningRate 0.0790 Epoch: 4 Global Step: 25290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:08:38,119-Speed 11375.08 samples/sec Loss 9.5335 LearningRate 0.0790 Epoch: 4 Global Step: 25300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:08:39,047-Speed 11037.31 samples/sec Loss 9.6091 LearningRate 0.0790 Epoch: 4 Global Step: 25310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:08:40,003-Speed 10722.57 samples/sec Loss 9.5014 LearningRate 0.0790 Epoch: 4 Global Step: 25320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:08:40,931-Speed 11041.32 samples/sec Loss 9.5137 LearningRate 0.0790 Epoch: 4 Global Step: 25330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:08:41,863-Speed 10992.20 samples/sec Loss 9.3196 LearningRate 0.0790 Epoch: 4 Global Step: 25340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:08:42,831-Speed 10587.96 samples/sec Loss 9.5712 LearningRate 0.0790 Epoch: 4 Global Step: 25350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:08:43,714-Speed 11600.36 
samples/sec Loss 9.5352 LearningRate 0.0789 Epoch: 4 Global Step: 25360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:08:44,607-Speed 11475.28 samples/sec Loss 9.5352 LearningRate 0.0789 Epoch: 4 Global Step: 25370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:08:45,477-Speed 11773.68 samples/sec Loss 9.4631 LearningRate 0.0789 Epoch: 4 Global Step: 25380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:08:46,377-Speed 11386.31 samples/sec Loss 9.6016 LearningRate 0.0789 Epoch: 4 Global Step: 25390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:08:47,273-Speed 11446.37 samples/sec Loss 9.3983 LearningRate 0.0789 Epoch: 4 Global Step: 25400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:08:48,174-Speed 11370.27 samples/sec Loss 9.4942 LearningRate 0.0789 Epoch: 4 Global Step: 25410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:08:49,072-Speed 11412.75 samples/sec Loss 9.5447 LearningRate 0.0789 Epoch: 4 Global Step: 25420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:08:49,960-Speed 11532.99 samples/sec Loss 9.6276 LearningRate 0.0789 Epoch: 4 Global Step: 25430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:08:50,850-Speed 11519.30 samples/sec Loss 9.5418 LearningRate 0.0789 Epoch: 4 Global Step: 25440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:08:51,745-Speed 11440.75 samples/sec Loss 9.5022 LearningRate 0.0789 Epoch: 4 Global Step: 25450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:08:52,693-Speed 10818.93 samples/sec Loss 9.6983 LearningRate 0.0789 Epoch: 4 Global Step: 25460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:08:53,600-Speed 11297.07 samples/sec Loss 9.5419 LearningRate 0.0789 Epoch: 4 Global Step: 25470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:08:54,534-Speed 10973.28 samples/sec Loss 9.5978 LearningRate 0.0788 Epoch: 4 
Global Step: 25480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:08:55,427-Speed 11479.89 samples/sec Loss 9.4848 LearningRate 0.0788 Epoch: 4 Global Step: 25490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:08:56,318-Speed 11493.21 samples/sec Loss 9.4145 LearningRate 0.0788 Epoch: 4 Global Step: 25500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:08:57,205-Speed 11554.72 samples/sec Loss 9.6510 LearningRate 0.0788 Epoch: 4 Global Step: 25510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:08:58,099-Speed 11462.39 samples/sec Loss 9.3592 LearningRate 0.0788 Epoch: 4 Global Step: 25520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:08:58,980-Speed 11632.13 samples/sec Loss 9.6104 LearningRate 0.0788 Epoch: 4 Global Step: 25530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:08:59,881-Speed 11371.62 samples/sec Loss 9.6481 LearningRate 0.0788 Epoch: 4 Global Step: 25540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:09:00,815-Speed 10966.74 samples/sec Loss 9.2969 LearningRate 0.0788 Epoch: 4 Global Step: 25550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:09:01,761-Speed 10832.75 samples/sec Loss 9.5473 LearningRate 0.0788 Epoch: 4 Global Step: 25560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:09:02,671-Speed 11257.78 samples/sec Loss 9.4667 LearningRate 0.0788 Epoch: 4 Global Step: 25570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:09:03,584-Speed 11231.96 samples/sec Loss 9.3012 LearningRate 0.0788 Epoch: 4 Global Step: 25580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:09:04,474-Speed 11507.56 samples/sec Loss 9.7445 LearningRate 0.0788 Epoch: 4 Global Step: 25590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:09:05,365-Speed 11501.35 samples/sec Loss 9.4042 LearningRate 0.0788 Epoch: 4 Global Step: 25600 Fp16 Grad Scale: 131072 
Required: 7 hours Training: 2022-04-11 00:09:06,257-Speed 11481.75 samples/sec Loss 9.5046 LearningRate 0.0787 Epoch: 4 Global Step: 25610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:09:07,182-Speed 11079.33 samples/sec Loss 9.5389 LearningRate 0.0787 Epoch: 4 Global Step: 25620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:09:08,134-Speed 10771.18 samples/sec Loss 9.6760 LearningRate 0.0787 Epoch: 4 Global Step: 25630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:09:09,057-Speed 11095.31 samples/sec Loss 9.5405 LearningRate 0.0787 Epoch: 4 Global Step: 25640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:09:09,939-Speed 11619.31 samples/sec Loss 9.4578 LearningRate 0.0787 Epoch: 4 Global Step: 25650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:09:10,848-Speed 11279.14 samples/sec Loss 9.6330 LearningRate 0.0787 Epoch: 4 Global Step: 25660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:09:11,775-Speed 11055.60 samples/sec Loss 9.4983 LearningRate 0.0787 Epoch: 4 Global Step: 25670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:09:12,998-Speed 8373.78 samples/sec Loss 9.5595 LearningRate 0.0787 Epoch: 4 Global Step: 25680 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:09:13,911-Speed 11231.00 samples/sec Loss 9.5771 LearningRate 0.0787 Epoch: 4 Global Step: 25690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:09:14,805-Speed 11465.87 samples/sec Loss 9.6372 LearningRate 0.0787 Epoch: 4 Global Step: 25700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:09:15,750-Speed 10843.72 samples/sec Loss 9.4491 LearningRate 0.0787 Epoch: 4 Global Step: 25710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:09:16,677-Speed 11052.11 samples/sec Loss 9.3921 LearningRate 0.0787 Epoch: 4 Global Step: 25720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 
00:09:17,608-Speed 10998.01 samples/sec Loss 9.5135 LearningRate 0.0787 Epoch: 4 Global Step: 25730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:09:18,524-Speed 11195.56 samples/sec Loss 9.6520 LearningRate 0.0786 Epoch: 4 Global Step: 25740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:09:19,440-Speed 11180.62 samples/sec Loss 9.6856 LearningRate 0.0786 Epoch: 4 Global Step: 25750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:09:20,372-Speed 10997.17 samples/sec Loss 9.5021 LearningRate 0.0786 Epoch: 4 Global Step: 25760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:09:21,319-Speed 10821.49 samples/sec Loss 9.4924 LearningRate 0.0786 Epoch: 4 Global Step: 25770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:09:22,235-Speed 11185.80 samples/sec Loss 9.5896 LearningRate 0.0786 Epoch: 4 Global Step: 25780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:09:23,129-Speed 11467.59 samples/sec Loss 9.3475 LearningRate 0.0786 Epoch: 4 Global Step: 25790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:09:24,038-Speed 11272.55 samples/sec Loss 9.5750 LearningRate 0.0786 Epoch: 4 Global Step: 25800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:09:24,944-Speed 11303.51 samples/sec Loss 9.7373 LearningRate 0.0786 Epoch: 4 Global Step: 25810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:09:25,883-Speed 10920.97 samples/sec Loss 9.5675 LearningRate 0.0786 Epoch: 4 Global Step: 25820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:09:26,771-Speed 11537.50 samples/sec Loss 9.5475 LearningRate 0.0786 Epoch: 4 Global Step: 25830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:09:27,721-Speed 10780.24 samples/sec Loss 9.6193 LearningRate 0.0786 Epoch: 4 Global Step: 25840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:09:28,686-Speed 10620.65 samples/sec Loss 9.4020 
LearningRate 0.0786 Epoch: 4 Global Step: 25850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:09:29,641-Speed 10734.86 samples/sec Loss 9.6145 LearningRate 0.0786 Epoch: 4 Global Step: 25860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:09:30,587-Speed 10833.89 samples/sec Loss 9.5111 LearningRate 0.0785 Epoch: 4 Global Step: 25870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:09:31,561-Speed 10511.10 samples/sec Loss 9.7133 LearningRate 0.0785 Epoch: 4 Global Step: 25880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:09:32,507-Speed 10839.22 samples/sec Loss 9.6226 LearningRate 0.0785 Epoch: 4 Global Step: 25890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:09:33,380-Speed 11741.51 samples/sec Loss 9.5209 LearningRate 0.0785 Epoch: 4 Global Step: 25900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:09:34,286-Speed 11300.32 samples/sec Loss 9.5522 LearningRate 0.0785 Epoch: 4 Global Step: 25910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:09:35,166-Speed 11649.04 samples/sec Loss 9.5459 LearningRate 0.0785 Epoch: 4 Global Step: 25920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:09:36,052-Speed 11573.08 samples/sec Loss 9.6306 LearningRate 0.0785 Epoch: 4 Global Step: 25930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:09:36,958-Speed 11305.69 samples/sec Loss 9.7836 LearningRate 0.0785 Epoch: 4 Global Step: 25940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:09:37,880-Speed 11122.18 samples/sec Loss 9.4298 LearningRate 0.0785 Epoch: 4 Global Step: 25950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:09:38,778-Speed 11413.02 samples/sec Loss 9.5456 LearningRate 0.0785 Epoch: 4 Global Step: 25960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:09:39,674-Speed 11438.59 samples/sec Loss 9.6296 LearningRate 0.0785 Epoch: 4 Global Step: 25970 Fp16 
Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:09:40,575-Speed 11365.35 samples/sec Loss 9.5772 LearningRate 0.0785 Epoch: 4 Global Step: 25980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:09:41,503-Speed 11048.81 samples/sec Loss 9.5403 LearningRate 0.0785 Epoch: 4 Global Step: 25990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:09:42,401-Speed 11408.45 samples/sec Loss 9.5786 LearningRate 0.0784 Epoch: 4 Global Step: 26000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:10:03,906-[lfw][26000]XNorm: 14.177654 Training: 2022-04-11 00:10:03,907-[lfw][26000]Accuracy-Flip: 0.99233+-0.00351 Training: 2022-04-11 00:10:03,907-[lfw][26000]Accuracy-Highest: 0.99417 Training: 2022-04-11 00:10:28,631-[cfp_fp][26000]XNorm: 11.841624 Training: 2022-04-11 00:10:28,631-[cfp_fp][26000]Accuracy-Flip: 0.91714+-0.01506 Training: 2022-04-11 00:10:28,632-[cfp_fp][26000]Accuracy-Highest: 0.91714 Training: 2022-04-11 00:10:49,913-[agedb_30][26000]XNorm: 13.840775 Training: 2022-04-11 00:10:49,914-[agedb_30][26000]Accuracy-Flip: 0.95083+-0.01263 Training: 2022-04-11 00:10:49,914-[agedb_30][26000]Accuracy-Highest: 0.95083 Training: 2022-04-11 00:10:50,805-Speed 149.70 samples/sec Loss 9.5317 LearningRate 0.0784 Epoch: 4 Global Step: 26010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:10:51,701-Speed 11437.22 samples/sec Loss 9.5064 LearningRate 0.0784 Epoch: 4 Global Step: 26020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:10:52,665-Speed 10632.94 samples/sec Loss 9.5383 LearningRate 0.0784 Epoch: 4 Global Step: 26030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:10:53,570-Speed 11316.36 samples/sec Loss 9.6198 LearningRate 0.0784 Epoch: 4 Global Step: 26040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:10:54,466-Speed 11442.20 samples/sec Loss 9.5090 LearningRate 0.0784 Epoch: 4 Global Step: 26050 Fp16 Grad Scale: 65536 Required: 7 hours 
Training: 2022-04-11 00:10:55,379-Speed 11226.59 samples/sec Loss 9.3539 LearningRate 0.0784 Epoch: 4 Global Step: 26060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:10:56,276-Speed 11421.12 samples/sec Loss 9.3086 LearningRate 0.0784 Epoch: 4 Global Step: 26070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:10:57,178-Speed 11358.22 samples/sec Loss 9.4568 LearningRate 0.0784 Epoch: 4 Global Step: 26080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:10:58,112-Speed 10971.40 samples/sec Loss 9.4909 LearningRate 0.0784 Epoch: 4 Global Step: 26090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:10:59,104-Speed 10329.14 samples/sec Loss 9.4335 LearningRate 0.0784 Epoch: 4 Global Step: 26100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:11:00,016-Speed 11241.05 samples/sec Loss 9.6206 LearningRate 0.0784 Epoch: 4 Global Step: 26110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:11:00,923-Speed 11293.69 samples/sec Loss 9.5544 LearningRate 0.0784 Epoch: 4 Global Step: 26120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:11:01,852-Speed 11028.84 samples/sec Loss 9.6488 LearningRate 0.0783 Epoch: 4 Global Step: 26130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:11:02,753-Speed 11374.54 samples/sec Loss 9.4862 LearningRate 0.0783 Epoch: 4 Global Step: 26140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:11:03,656-Speed 11344.77 samples/sec Loss 9.5228 LearningRate 0.0783 Epoch: 4 Global Step: 26150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:11:04,581-Speed 11079.38 samples/sec Loss 9.5787 LearningRate 0.0783 Epoch: 4 Global Step: 26160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:11:05,504-Speed 11098.08 samples/sec Loss 9.4500 LearningRate 0.0783 Epoch: 4 Global Step: 26170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:11:06,434-Speed 11018.53 
samples/sec Loss 9.5538 LearningRate 0.0783 Epoch: 4 Global Step: 26180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:11:07,335-Speed 11371.77 samples/sec Loss 9.5538 LearningRate 0.0783 Epoch: 4 Global Step: 26190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:11:08,252-Speed 11178.81 samples/sec Loss 9.4611 LearningRate 0.0783 Epoch: 4 Global Step: 26200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:11:09,165-Speed 11220.56 samples/sec Loss 9.4800 LearningRate 0.0783 Epoch: 4 Global Step: 26210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:11:10,052-Speed 11552.83 samples/sec Loss 9.6259 LearningRate 0.0783 Epoch: 4 Global Step: 26220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:11:10,943-Speed 11507.57 samples/sec Loss 9.4137 LearningRate 0.0783 Epoch: 4 Global Step: 26230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:11:11,875-Speed 10992.65 samples/sec Loss 9.4118 LearningRate 0.0783 Epoch: 4 Global Step: 26240 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:11:12,766-Speed 11507.84 samples/sec Loss 9.4407 LearningRate 0.0782 Epoch: 4 Global Step: 26250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:11:13,689-Speed 11100.56 samples/sec Loss 9.5330 LearningRate 0.0782 Epoch: 4 Global Step: 26260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:11:14,580-Speed 11500.84 samples/sec Loss 9.4462 LearningRate 0.0782 Epoch: 4 Global Step: 26270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:11:15,481-Speed 11369.88 samples/sec Loss 9.3295 LearningRate 0.0782 Epoch: 4 Global Step: 26280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:11:16,375-Speed 11461.97 samples/sec Loss 9.3879 LearningRate 0.0782 Epoch: 4 Global Step: 26290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:11:17,292-Speed 11177.82 samples/sec Loss 9.4580 LearningRate 0.0782 
Epoch: 4 Global Step: 26300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:11:18,203-Speed 11237.54 samples/sec Loss 9.2850 LearningRate 0.0782 Epoch: 4 Global Step: 26310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:11:19,121-Speed 11169.21 samples/sec Loss 9.3799 LearningRate 0.0782 Epoch: 4 Global Step: 26320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:11:20,017-Speed 11438.15 samples/sec Loss 9.5239 LearningRate 0.0782 Epoch: 4 Global Step: 26330 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:11:20,925-Speed 11279.65 samples/sec Loss 9.6910 LearningRate 0.0782 Epoch: 4 Global Step: 26340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:11:21,848-Speed 11105.13 samples/sec Loss 9.2029 LearningRate 0.0782 Epoch: 4 Global Step: 26350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:11:22,730-Speed 11620.31 samples/sec Loss 9.2631 LearningRate 0.0782 Epoch: 4 Global Step: 26360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:11:23,623-Speed 11463.68 samples/sec Loss 9.6328 LearningRate 0.0782 Epoch: 4 Global Step: 26370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:11:24,550-Speed 11065.12 samples/sec Loss 9.4030 LearningRate 0.0781 Epoch: 4 Global Step: 26380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:11:25,510-Speed 10668.70 samples/sec Loss 9.4169 LearningRate 0.0781 Epoch: 4 Global Step: 26390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:11:26,475-Speed 10617.26 samples/sec Loss 9.5069 LearningRate 0.0781 Epoch: 4 Global Step: 26400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:11:27,430-Speed 10733.00 samples/sec Loss 9.2799 LearningRate 0.0781 Epoch: 4 Global Step: 26410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:11:28,333-Speed 11343.41 samples/sec Loss 9.5143 LearningRate 0.0781 Epoch: 4 Global Step: 26420 Fp16 Grad 
Scale: 131072 Required: 7 hours Training: 2022-04-11 00:11:29,292-Speed 10682.10 samples/sec Loss 9.5941 LearningRate 0.0781 Epoch: 4 Global Step: 26430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:11:30,236-Speed 10859.59 samples/sec Loss 9.4428 LearningRate 0.0781 Epoch: 4 Global Step: 26440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:11:31,183-Speed 10818.17 samples/sec Loss 9.4607 LearningRate 0.0781 Epoch: 4 Global Step: 26450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:11:32,105-Speed 11114.54 samples/sec Loss 9.3336 LearningRate 0.0781 Epoch: 4 Global Step: 26460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:11:32,997-Speed 11490.73 samples/sec Loss 9.4882 LearningRate 0.0781 Epoch: 4 Global Step: 26470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:11:33,916-Speed 11153.22 samples/sec Loss 9.5077 LearningRate 0.0781 Epoch: 4 Global Step: 26480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:11:34,862-Speed 10833.71 samples/sec Loss 9.4683 LearningRate 0.0781 Epoch: 4 Global Step: 26490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:11:35,750-Speed 11545.14 samples/sec Loss 9.1979 LearningRate 0.0781 Epoch: 4 Global Step: 26500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:11:36,693-Speed 10860.68 samples/sec Loss 9.3803 LearningRate 0.0780 Epoch: 4 Global Step: 26510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:11:37,602-Speed 11267.70 samples/sec Loss 9.4052 LearningRate 0.0780 Epoch: 4 Global Step: 26520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:11:38,511-Speed 11283.15 samples/sec Loss 9.3262 LearningRate 0.0780 Epoch: 4 Global Step: 26530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:11:39,440-Speed 11026.16 samples/sec Loss 9.3372 LearningRate 0.0780 Epoch: 4 Global Step: 26540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 
00:11:40,335-Speed 11451.54 samples/sec Loss 9.2885 LearningRate 0.0780 Epoch: 4 Global Step: 26550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:11:41,237-Speed 11356.94 samples/sec Loss 9.4063 LearningRate 0.0780 Epoch: 4 Global Step: 26560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:11:42,183-Speed 10832.27 samples/sec Loss 9.5249 LearningRate 0.0780 Epoch: 4 Global Step: 26570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:11:43,093-Speed 11257.24 samples/sec Loss 9.5213 LearningRate 0.0780 Epoch: 4 Global Step: 26580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:11:44,018-Speed 11084.73 samples/sec Loss 9.4692 LearningRate 0.0780 Epoch: 4 Global Step: 26590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:11:44,911-Speed 11477.83 samples/sec Loss 9.4439 LearningRate 0.0780 Epoch: 4 Global Step: 26600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:11:45,798-Speed 11554.41 samples/sec Loss 9.5777 LearningRate 0.0780 Epoch: 4 Global Step: 26610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:11:46,706-Speed 11280.47 samples/sec Loss 9.4964 LearningRate 0.0780 Epoch: 4 Global Step: 26620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:11:47,601-Speed 11450.09 samples/sec Loss 9.5009 LearningRate 0.0780 Epoch: 4 Global Step: 26630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:11:48,514-Speed 11238.76 samples/sec Loss 9.4225 LearningRate 0.0779 Epoch: 4 Global Step: 26640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:11:49,440-Speed 11067.06 samples/sec Loss 9.5663 LearningRate 0.0779 Epoch: 4 Global Step: 26650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:11:50,411-Speed 10551.80 samples/sec Loss 9.4978 LearningRate 0.0779 Epoch: 4 Global Step: 26660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:11:51,290-Speed 11650.54 samples/sec Loss 
9.5198 LearningRate 0.0779 Epoch: 4 Global Step: 26670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:11:52,185-Speed 11452.83 samples/sec Loss 9.6062 LearningRate 0.0779 Epoch: 4 Global Step: 26680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:11:53,106-Speed 11122.72 samples/sec Loss 9.4830 LearningRate 0.0779 Epoch: 4 Global Step: 26690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:11:53,989-Speed 11616.36 samples/sec Loss 9.4432 LearningRate 0.0779 Epoch: 4 Global Step: 26700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:11:54,906-Speed 11169.06 samples/sec Loss 9.3737 LearningRate 0.0779 Epoch: 4 Global Step: 26710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:11:55,876-Speed 10563.51 samples/sec Loss 9.5406 LearningRate 0.0779 Epoch: 4 Global Step: 26720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:11:56,800-Speed 11095.50 samples/sec Loss 9.4250 LearningRate 0.0779 Epoch: 4 Global Step: 26730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:11:57,709-Speed 11270.70 samples/sec Loss 9.4079 LearningRate 0.0779 Epoch: 4 Global Step: 26740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:11:58,631-Speed 11111.35 samples/sec Loss 9.4994 LearningRate 0.0779 Epoch: 4 Global Step: 26750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:11:59,502-Speed 11777.30 samples/sec Loss 9.5234 LearningRate 0.0779 Epoch: 4 Global Step: 26760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:12:00,406-Speed 11335.34 samples/sec Loss 9.3174 LearningRate 0.0778 Epoch: 4 Global Step: 26770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:12:01,323-Speed 11170.95 samples/sec Loss 9.3683 LearningRate 0.0778 Epoch: 4 Global Step: 26780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:12:02,248-Speed 11081.98 samples/sec Loss 9.4404 LearningRate 0.0778 Epoch: 4 Global Step: 26790 
Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:12:03,132-Speed 11586.08 samples/sec Loss 9.4525 LearningRate 0.0778 Epoch: 4 Global Step: 26800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:12:04,025-Speed 11478.24 samples/sec Loss 9.5421 LearningRate 0.0778 Epoch: 4 Global Step: 26810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:12:04,951-Speed 11058.99 samples/sec Loss 9.6726 LearningRate 0.0778 Epoch: 4 Global Step: 26820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:12:05,873-Speed 11122.20 samples/sec Loss 9.4094 LearningRate 0.0778 Epoch: 4 Global Step: 26830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:12:06,783-Speed 11252.99 samples/sec Loss 9.4647 LearningRate 0.0778 Epoch: 4 Global Step: 26840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:12:07,693-Speed 11268.62 samples/sec Loss 9.5207 LearningRate 0.0778 Epoch: 4 Global Step: 26850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:12:08,648-Speed 10726.83 samples/sec Loss 9.3926 LearningRate 0.0778 Epoch: 4 Global Step: 26860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:12:09,545-Speed 11422.17 samples/sec Loss 9.5094 LearningRate 0.0778 Epoch: 4 Global Step: 26870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:12:10,442-Speed 11426.77 samples/sec Loss 9.5883 LearningRate 0.0778 Epoch: 4 Global Step: 26880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:12:11,350-Speed 11282.54 samples/sec Loss 9.3953 LearningRate 0.0778 Epoch: 4 Global Step: 26890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:12:12,246-Speed 11434.67 samples/sec Loss 9.5597 LearningRate 0.0777 Epoch: 4 Global Step: 26900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:12:13,156-Speed 11258.66 samples/sec Loss 9.5432 LearningRate 0.0777 Epoch: 4 Global Step: 26910 Fp16 Grad Scale: 131072 Required: 7 hours 
Training: 2022-04-11 00:12:14,074-Speed 11164.63 samples/sec Loss 9.4084 LearningRate 0.0777 Epoch: 4 Global Step: 26920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:12:15,022-Speed 10828.81 samples/sec Loss 9.4864 LearningRate 0.0777 Epoch: 4 Global Step: 26930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:12:15,950-Speed 11036.88 samples/sec Loss 9.4848 LearningRate 0.0777 Epoch: 4 Global Step: 26940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:12:16,900-Speed 10798.07 samples/sec Loss 9.4025 LearningRate 0.0777 Epoch: 4 Global Step: 26950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:12:17,812-Speed 11231.59 samples/sec Loss 9.5956 LearningRate 0.0777 Epoch: 4 Global Step: 26960 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:12:18,719-Speed 11304.32 samples/sec Loss 9.6781 LearningRate 0.0777 Epoch: 4 Global Step: 26970 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:12:19,623-Speed 11327.76 samples/sec Loss 9.4110 LearningRate 0.0777 Epoch: 4 Global Step: 26980 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:12:20,522-Speed 11405.99 samples/sec Loss 9.4934 LearningRate 0.0777 Epoch: 4 Global Step: 26990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:12:21,418-Speed 11426.17 samples/sec Loss 9.5682 LearningRate 0.0777 Epoch: 4 Global Step: 27000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:12:22,309-Speed 11507.48 samples/sec Loss 9.3974 LearningRate 0.0777 Epoch: 4 Global Step: 27010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:12:23,260-Speed 10774.31 samples/sec Loss 9.3029 LearningRate 0.0777 Epoch: 4 Global Step: 27020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:12:24,163-Speed 11343.14 samples/sec Loss 9.4737 LearningRate 0.0776 Epoch: 4 Global Step: 27030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:12:25,074-Speed 11249.17 
samples/sec Loss 9.5081 LearningRate 0.0776 Epoch: 4 Global Step: 27040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:12:25,970-Speed 11430.65 samples/sec Loss 9.3544 LearningRate 0.0776 Epoch: 4 Global Step: 27050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:12:26,874-Speed 11340.22 samples/sec Loss 9.3539 LearningRate 0.0776 Epoch: 4 Global Step: 27060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:12:27,765-Speed 11493.46 samples/sec Loss 9.4099 LearningRate 0.0776 Epoch: 4 Global Step: 27070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:12:28,667-Speed 11368.57 samples/sec Loss 9.4438 LearningRate 0.0776 Epoch: 4 Global Step: 27080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:12:29,570-Speed 11343.44 samples/sec Loss 9.3730 LearningRate 0.0776 Epoch: 4 Global Step: 27090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:12:30,507-Speed 10937.89 samples/sec Loss 9.4898 LearningRate 0.0776 Epoch: 4 Global Step: 27100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:12:31,428-Speed 11123.31 samples/sec Loss 9.4807 LearningRate 0.0776 Epoch: 4 Global Step: 27110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:12:32,345-Speed 11199.02 samples/sec Loss 9.4898 LearningRate 0.0776 Epoch: 4 Global Step: 27120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:12:33,250-Speed 11319.03 samples/sec Loss 9.5739 LearningRate 0.0776 Epoch: 4 Global Step: 27130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:12:34,184-Speed 10974.75 samples/sec Loss 9.3644 LearningRate 0.0776 Epoch: 4 Global Step: 27140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:12:35,083-Speed 11395.23 samples/sec Loss 9.4836 LearningRate 0.0776 Epoch: 4 Global Step: 27150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:12:35,987-Speed 11340.82 samples/sec Loss 9.5449 LearningRate 0.0775 
Epoch: 4 Global Step: 27160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:12:36,894-Speed 11294.23 samples/sec Loss 9.3150 LearningRate 0.0775 Epoch: 4 Global Step: 27170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:12:37,804-Speed 11258.40 samples/sec Loss 9.5472 LearningRate 0.0775 Epoch: 4 Global Step: 27180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:12:38,753-Speed 10798.67 samples/sec Loss 9.3608 LearningRate 0.0775 Epoch: 4 Global Step: 27190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:12:39,654-Speed 11377.13 samples/sec Loss 9.3857 LearningRate 0.0775 Epoch: 4 Global Step: 27200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:12:40,549-Speed 11450.55 samples/sec Loss 9.4434 LearningRate 0.0775 Epoch: 4 Global Step: 27210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:12:41,464-Speed 11198.61 samples/sec Loss 9.4399 LearningRate 0.0775 Epoch: 4 Global Step: 27220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:12:42,372-Speed 11282.31 samples/sec Loss 9.5222 LearningRate 0.0775 Epoch: 4 Global Step: 27230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:12:43,305-Speed 10978.42 samples/sec Loss 9.4508 LearningRate 0.0775 Epoch: 4 Global Step: 27240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:12:44,225-Speed 11142.69 samples/sec Loss 9.2941 LearningRate 0.0775 Epoch: 4 Global Step: 27250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:12:45,127-Speed 11355.73 samples/sec Loss 9.2890 LearningRate 0.0775 Epoch: 4 Global Step: 27260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:12:46,020-Speed 11477.99 samples/sec Loss 9.4158 LearningRate 0.0775 Epoch: 4 Global Step: 27270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:12:46,910-Speed 11513.90 samples/sec Loss 9.2655 LearningRate 0.0774 Epoch: 4 Global Step: 27280 Fp16 Grad Scale: 
65536 Required: 7 hours Training: 2022-04-11 00:12:47,818-Speed 11283.70 samples/sec Loss 9.5049 LearningRate 0.0774 Epoch: 4 Global Step: 27290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:12:48,711-Speed 11487.51 samples/sec Loss 9.6061 LearningRate 0.0774 Epoch: 4 Global Step: 27300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:12:49,619-Speed 11280.10 samples/sec Loss 9.5973 LearningRate 0.0774 Epoch: 4 Global Step: 27310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:12:50,532-Speed 11220.50 samples/sec Loss 9.3397 LearningRate 0.0774 Epoch: 4 Global Step: 27320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:12:51,429-Speed 11425.01 samples/sec Loss 9.4894 LearningRate 0.0774 Epoch: 4 Global Step: 27330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:12:52,343-Speed 11208.47 samples/sec Loss 9.2921 LearningRate 0.0774 Epoch: 4 Global Step: 27340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:12:53,249-Speed 11316.75 samples/sec Loss 9.4559 LearningRate 0.0774 Epoch: 4 Global Step: 27350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:12:54,167-Speed 11158.71 samples/sec Loss 9.3019 LearningRate 0.0774 Epoch: 4 Global Step: 27360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:12:55,077-Speed 11252.70 samples/sec Loss 9.1937 LearningRate 0.0774 Epoch: 4 Global Step: 27370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:12:55,998-Speed 11126.90 samples/sec Loss 9.4049 LearningRate 0.0774 Epoch: 4 Global Step: 27380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:12:56,926-Speed 11046.39 samples/sec Loss 9.5605 LearningRate 0.0774 Epoch: 4 Global Step: 27390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:12:57,867-Speed 10885.07 samples/sec Loss 9.3469 LearningRate 0.0774 Epoch: 4 Global Step: 27400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 
00:12:58,744-Speed 11685.00 samples/sec Loss 9.4114 LearningRate 0.0773 Epoch: 4 Global Step: 27410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:12:59,678-Speed 10977.04 samples/sec Loss 9.3325 LearningRate 0.0773 Epoch: 4 Global Step: 27420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:13:00,594-Speed 11189.43 samples/sec Loss 9.3504 LearningRate 0.0773 Epoch: 4 Global Step: 27430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:13:01,526-Speed 10986.00 samples/sec Loss 9.3451 LearningRate 0.0773 Epoch: 4 Global Step: 27440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:13:02,425-Speed 11409.12 samples/sec Loss 9.3306 LearningRate 0.0773 Epoch: 4 Global Step: 27450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:13:03,330-Speed 11315.77 samples/sec Loss 9.3946 LearningRate 0.0773 Epoch: 4 Global Step: 27460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:13:04,235-Speed 11330.79 samples/sec Loss 9.3372 LearningRate 0.0773 Epoch: 4 Global Step: 27470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:13:05,136-Speed 11375.39 samples/sec Loss 9.4855 LearningRate 0.0773 Epoch: 4 Global Step: 27480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:13:06,016-Speed 11648.25 samples/sec Loss 9.3232 LearningRate 0.0773 Epoch: 4 Global Step: 27490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:13:06,921-Speed 11313.54 samples/sec Loss 9.3767 LearningRate 0.0773 Epoch: 4 Global Step: 27500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:13:07,790-Speed 11801.24 samples/sec Loss 9.4183 LearningRate 0.0773 Epoch: 4 Global Step: 27510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:13:08,687-Speed 11423.07 samples/sec Loss 9.5398 LearningRate 0.0773 Epoch: 4 Global Step: 27520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:13:09,572-Speed 11577.33 samples/sec Loss 9.1803 
LearningRate 0.0773 Epoch: 4 Global Step: 27530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:13:10,473-Speed 11372.78 samples/sec Loss 9.4344 LearningRate 0.0772 Epoch: 4 Global Step: 27540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:13:11,371-Speed 11411.40 samples/sec Loss 9.4164 LearningRate 0.0772 Epoch: 4 Global Step: 27550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:13:12,288-Speed 11177.45 samples/sec Loss 9.3766 LearningRate 0.0772 Epoch: 4 Global Step: 27560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:13:13,191-Speed 11338.86 samples/sec Loss 9.4070 LearningRate 0.0772 Epoch: 4 Global Step: 27570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:13:14,121-Speed 11027.67 samples/sec Loss 9.4949 LearningRate 0.0772 Epoch: 4 Global Step: 27580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:13:15,044-Speed 11099.93 samples/sec Loss 9.3109 LearningRate 0.0772 Epoch: 4 Global Step: 27590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:13:15,963-Speed 11147.41 samples/sec Loss 9.4344 LearningRate 0.0772 Epoch: 4 Global Step: 27600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:13:16,881-Speed 11167.02 samples/sec Loss 9.4403 LearningRate 0.0772 Epoch: 4 Global Step: 27610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:13:17,776-Speed 11436.98 samples/sec Loss 9.4198 LearningRate 0.0772 Epoch: 4 Global Step: 27620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:13:18,706-Speed 11027.05 samples/sec Loss 9.4154 LearningRate 0.0772 Epoch: 4 Global Step: 27630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:13:19,604-Speed 11405.00 samples/sec Loss 9.5197 LearningRate 0.0772 Epoch: 4 Global Step: 27640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:13:20,495-Speed 11501.92 samples/sec Loss 9.3331 LearningRate 0.0772 Epoch: 4 Global Step: 27650 
Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:13:21,383-Speed 11544.18 samples/sec Loss 9.4600 LearningRate 0.0772 Epoch: 4 Global Step: 27660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:13:22,269-Speed 11565.14 samples/sec Loss 9.2861 LearningRate 0.0771 Epoch: 4 Global Step: 27670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:13:23,180-Speed 11246.74 samples/sec Loss 9.5480 LearningRate 0.0771 Epoch: 4 Global Step: 27680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:13:24,127-Speed 10824.40 samples/sec Loss 9.5687 LearningRate 0.0771 Epoch: 4 Global Step: 27690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:13:25,024-Speed 11426.48 samples/sec Loss 9.3761 LearningRate 0.0771 Epoch: 4 Global Step: 27700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:13:25,941-Speed 11172.06 samples/sec Loss 9.4212 LearningRate 0.0771 Epoch: 4 Global Step: 27710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:13:26,911-Speed 10558.83 samples/sec Loss 9.3618 LearningRate 0.0771 Epoch: 4 Global Step: 27720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:13:27,833-Speed 11110.64 samples/sec Loss 9.2522 LearningRate 0.0771 Epoch: 4 Global Step: 27730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:13:28,734-Speed 11381.09 samples/sec Loss 9.4059 LearningRate 0.0771 Epoch: 4 Global Step: 27740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:13:29,648-Speed 11203.01 samples/sec Loss 9.3363 LearningRate 0.0771 Epoch: 4 Global Step: 27750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:13:30,557-Speed 11275.35 samples/sec Loss 9.3173 LearningRate 0.0771 Epoch: 4 Global Step: 27760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:13:31,480-Speed 11105.08 samples/sec Loss 9.2842 LearningRate 0.0771 Epoch: 4 Global Step: 27770 Fp16 Grad Scale: 131072 Required: 7 hours 
Training: 2022-04-11 00:13:32,434-Speed 10738.89 samples/sec Loss 9.2363 LearningRate 0.0771 Epoch: 4 Global Step: 27780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:13:33,340-Speed 11303.57 samples/sec Loss 9.3752 LearningRate 0.0771 Epoch: 4 Global Step: 27790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:13:34,232-Speed 11492.95 samples/sec Loss 9.4276 LearningRate 0.0770 Epoch: 4 Global Step: 27800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:13:35,123-Speed 11499.97 samples/sec Loss 9.2729 LearningRate 0.0770 Epoch: 4 Global Step: 27810 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:13:36,066-Speed 10868.29 samples/sec Loss 9.1453 LearningRate 0.0770 Epoch: 4 Global Step: 27820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:13:36,980-Speed 11206.55 samples/sec Loss 9.3361 LearningRate 0.0770 Epoch: 4 Global Step: 27830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:13:37,877-Speed 11431.82 samples/sec Loss 9.3507 LearningRate 0.0770 Epoch: 4 Global Step: 27840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:13:38,769-Speed 11489.00 samples/sec Loss 9.4208 LearningRate 0.0770 Epoch: 4 Global Step: 27850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:13:39,701-Speed 10998.69 samples/sec Loss 9.3748 LearningRate 0.0770 Epoch: 4 Global Step: 27860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:13:40,630-Speed 11033.13 samples/sec Loss 9.4813 LearningRate 0.0770 Epoch: 4 Global Step: 27870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:13:41,520-Speed 11504.66 samples/sec Loss 9.3112 LearningRate 0.0770 Epoch: 4 Global Step: 27880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:13:42,404-Speed 11601.09 samples/sec Loss 9.4244 LearningRate 0.0770 Epoch: 4 Global Step: 27890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:13:43,299-Speed 
11441.90 samples/sec Loss 9.3593 LearningRate 0.0770 Epoch: 4 Global Step: 27900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:13:44,211-Speed 11238.34 samples/sec Loss 9.3832 LearningRate 0.0770 Epoch: 4 Global Step: 27910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:13:45,108-Speed 11420.12 samples/sec Loss 9.1228 LearningRate 0.0770 Epoch: 4 Global Step: 27920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:13:46,000-Speed 11487.66 samples/sec Loss 9.2774 LearningRate 0.0769 Epoch: 4 Global Step: 27930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:13:46,904-Speed 11340.70 samples/sec Loss 9.4344 LearningRate 0.0769 Epoch: 4 Global Step: 27940 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 00:13:47,814-Speed 11264.51 samples/sec Loss 9.3654 LearningRate 0.0769 Epoch: 4 Global Step: 27950 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 00:13:48,715-Speed 11365.87 samples/sec Loss 9.3492 LearningRate 0.0769 Epoch: 4 Global Step: 27960 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 00:13:49,659-Speed 10851.78 samples/sec Loss 9.2897 LearningRate 0.0769 Epoch: 4 Global Step: 27970 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 00:13:50,549-Speed 11517.75 samples/sec Loss 9.3127 LearningRate 0.0769 Epoch: 4 Global Step: 27980 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 00:13:51,510-Speed 10668.17 samples/sec Loss 9.4976 LearningRate 0.0769 Epoch: 4 Global Step: 27990 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 00:13:52,442-Speed 10996.43 samples/sec Loss 9.4193 LearningRate 0.0769 Epoch: 4 Global Step: 28000 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 00:14:13,795-[lfw][28000]XNorm: 14.516859 Training: 2022-04-11 00:14:13,795-[lfw][28000]Accuracy-Flip: 0.99233+-0.00533 Training: 2022-04-11 00:14:13,796-[lfw][28000]Accuracy-Highest: 0.99417 Training: 2022-04-11 
00:14:38,472-[cfp_fp][28000]XNorm: 12.191281 Training: 2022-04-11 00:14:38,472-[cfp_fp][28000]Accuracy-Flip: 0.92000+-0.01591 Training: 2022-04-11 00:14:38,473-[cfp_fp][28000]Accuracy-Highest: 0.92000 Training: 2022-04-11 00:14:59,688-[agedb_30][28000]XNorm: 14.210537 Training: 2022-04-11 00:14:59,689-[agedb_30][28000]Accuracy-Flip: 0.95133+-0.00576 Training: 2022-04-11 00:14:59,689-[agedb_30][28000]Accuracy-Highest: 0.95133 Training: 2022-04-11 00:15:00,598-Speed 150.24 samples/sec Loss 9.4527 LearningRate 0.0769 Epoch: 4 Global Step: 28010 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 00:15:01,536-Speed 10931.67 samples/sec Loss 9.2737 LearningRate 0.0769 Epoch: 4 Global Step: 28020 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 00:15:02,444-Speed 11283.87 samples/sec Loss 9.3154 LearningRate 0.0769 Epoch: 4 Global Step: 28030 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 00:15:03,369-Speed 11079.29 samples/sec Loss 9.4204 LearningRate 0.0769 Epoch: 4 Global Step: 28040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:15:04,242-Speed 11740.05 samples/sec Loss 9.3497 LearningRate 0.0769 Epoch: 4 Global Step: 28050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:15:05,169-Speed 11046.08 samples/sec Loss 9.3223 LearningRate 0.0768 Epoch: 4 Global Step: 28060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:15:06,050-Speed 11629.75 samples/sec Loss 9.3376 LearningRate 0.0768 Epoch: 4 Global Step: 28070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:15:06,969-Speed 11155.99 samples/sec Loss 9.4206 LearningRate 0.0768 Epoch: 4 Global Step: 28080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:15:07,865-Speed 11436.54 samples/sec Loss 9.3140 LearningRate 0.0768 Epoch: 4 Global Step: 28090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:15:08,786-Speed 11122.28 samples/sec Loss 9.4158 LearningRate 0.0768 Epoch: 4 Global Step: 
28100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:15:09,691-Speed 11329.59 samples/sec Loss 9.4468 LearningRate 0.0768 Epoch: 4 Global Step: 28110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:15:10,588-Speed 11426.43 samples/sec Loss 9.4068 LearningRate 0.0768 Epoch: 4 Global Step: 28120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:15:11,474-Speed 11557.13 samples/sec Loss 9.4391 LearningRate 0.0768 Epoch: 4 Global Step: 28130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:15:12,401-Speed 11053.28 samples/sec Loss 9.4644 LearningRate 0.0768 Epoch: 4 Global Step: 28140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:15:13,325-Speed 11096.30 samples/sec Loss 9.2540 LearningRate 0.0768 Epoch: 4 Global Step: 28150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:15:14,273-Speed 10799.33 samples/sec Loss 9.1520 LearningRate 0.0768 Epoch: 4 Global Step: 28160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:15:15,173-Speed 11388.48 samples/sec Loss 9.1705 LearningRate 0.0768 Epoch: 4 Global Step: 28170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:15:16,096-Speed 11104.71 samples/sec Loss 9.4056 LearningRate 0.0768 Epoch: 4 Global Step: 28180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:15:17,008-Speed 11234.81 samples/sec Loss 9.2180 LearningRate 0.0767 Epoch: 4 Global Step: 28190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:15:17,907-Speed 11396.01 samples/sec Loss 9.2838 LearningRate 0.0767 Epoch: 4 Global Step: 28200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:15:18,810-Speed 11359.10 samples/sec Loss 9.4345 LearningRate 0.0767 Epoch: 4 Global Step: 28210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:15:19,751-Speed 10889.26 samples/sec Loss 9.2467 LearningRate 0.0767 Epoch: 4 Global Step: 28220 Fp16 Grad Scale: 131072 Required: 7 hours 
Training: 2022-04-11 00:15:20,648-Speed 11418.99 samples/sec Loss 9.4220 LearningRate 0.0767 Epoch: 4 Global Step: 28230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:15:21,531-Speed 11604.24 samples/sec Loss 9.4394 LearningRate 0.0767 Epoch: 4 Global Step: 28240 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:15:22,444-Speed 11227.40 samples/sec Loss 9.3559 LearningRate 0.0767 Epoch: 4 Global Step: 28250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:15:23,356-Speed 11239.40 samples/sec Loss 9.3139 LearningRate 0.0767 Epoch: 4 Global Step: 28260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:15:24,291-Speed 10949.26 samples/sec Loss 9.1787 LearningRate 0.0767 Epoch: 4 Global Step: 28270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:15:25,247-Speed 10728.86 samples/sec Loss 9.2865 LearningRate 0.0767 Epoch: 4 Global Step: 28280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:15:26,196-Speed 10793.56 samples/sec Loss 9.2670 LearningRate 0.0767 Epoch: 4 Global Step: 28290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:15:27,087-Speed 11495.95 samples/sec Loss 9.2517 LearningRate 0.0767 Epoch: 4 Global Step: 28300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:15:27,995-Speed 11284.89 samples/sec Loss 9.1917 LearningRate 0.0767 Epoch: 4 Global Step: 28310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:15:28,963-Speed 10595.76 samples/sec Loss 9.3871 LearningRate 0.0766 Epoch: 4 Global Step: 28320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:15:29,876-Speed 11217.38 samples/sec Loss 9.4205 LearningRate 0.0766 Epoch: 4 Global Step: 28330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:15:30,829-Speed 10754.60 samples/sec Loss 9.2632 LearningRate 0.0766 Epoch: 4 Global Step: 28340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:15:31,741-Speed 
11237.17 samples/sec Loss 9.3427 LearningRate 0.0766 Epoch: 4 Global Step: 28350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:15:32,664-Speed 11106.39 samples/sec Loss 9.3356 LearningRate 0.0766 Epoch: 4 Global Step: 28360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:15:33,560-Speed 11434.38 samples/sec Loss 9.3924 LearningRate 0.0766 Epoch: 4 Global Step: 28370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:15:34,440-Speed 11653.12 samples/sec Loss 9.2962 LearningRate 0.0766 Epoch: 4 Global Step: 28380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:15:35,313-Speed 11741.21 samples/sec Loss 9.3624 LearningRate 0.0766 Epoch: 4 Global Step: 28390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:15:36,219-Speed 11315.44 samples/sec Loss 9.2831 LearningRate 0.0766 Epoch: 4 Global Step: 28400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:15:37,127-Speed 11277.98 samples/sec Loss 9.2293 LearningRate 0.0766 Epoch: 4 Global Step: 28410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:15:38,112-Speed 10407.72 samples/sec Loss 9.3663 LearningRate 0.0766 Epoch: 4 Global Step: 28420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:15:39,005-Speed 11468.26 samples/sec Loss 9.3481 LearningRate 0.0766 Epoch: 4 Global Step: 28430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:15:49,434-Speed 981.97 samples/sec Loss 8.6594 LearningRate 0.0766 Epoch: 5 Global Step: 28440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:15:50,361-Speed 11055.44 samples/sec Loss 8.5981 LearningRate 0.0765 Epoch: 5 Global Step: 28450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:15:51,362-Speed 10245.69 samples/sec Loss 8.7108 LearningRate 0.0765 Epoch: 5 Global Step: 28460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:15:52,366-Speed 10203.92 samples/sec Loss 8.5711 LearningRate 0.0765 
Epoch: 5 Global Step: 28470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:15:53,359-Speed 10311.02 samples/sec Loss 8.6892 LearningRate 0.0765 Epoch: 5 Global Step: 28480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:15:54,364-Speed 10202.85 samples/sec Loss 8.5183 LearningRate 0.0765 Epoch: 5 Global Step: 28490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:15:55,315-Speed 10765.83 samples/sec Loss 8.6050 LearningRate 0.0765 Epoch: 5 Global Step: 28500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:15:56,236-Speed 11134.91 samples/sec Loss 8.6655 LearningRate 0.0765 Epoch: 5 Global Step: 28510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:15:57,127-Speed 11506.17 samples/sec Loss 8.5948 LearningRate 0.0765 Epoch: 5 Global Step: 28520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:15:58,031-Speed 11338.02 samples/sec Loss 8.6713 LearningRate 0.0765 Epoch: 5 Global Step: 28530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:15:58,960-Speed 11026.19 samples/sec Loss 8.7349 LearningRate 0.0765 Epoch: 5 Global Step: 28540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:15:59,870-Speed 11256.82 samples/sec Loss 8.6988 LearningRate 0.0765 Epoch: 5 Global Step: 28550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:16:00,775-Speed 11340.63 samples/sec Loss 8.6988 LearningRate 0.0765 Epoch: 5 Global Step: 28560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:16:01,683-Speed 11284.99 samples/sec Loss 8.7350 LearningRate 0.0765 Epoch: 5 Global Step: 28570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:16:02,565-Speed 11616.63 samples/sec Loss 8.6011 LearningRate 0.0764 Epoch: 5 Global Step: 28580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:16:03,459-Speed 11457.69 samples/sec Loss 8.6684 LearningRate 0.0764 Epoch: 5 Global Step: 28590 Fp16 Grad Scale: 65536 
Required: 7 hours Training: 2022-04-11 00:16:04,341-Speed 11620.14 samples/sec Loss 8.6943 LearningRate 0.0764 Epoch: 5 Global Step: 28600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:16:05,292-Speed 10778.16 samples/sec Loss 8.6837 LearningRate 0.0764 Epoch: 5 Global Step: 28610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:16:06,203-Speed 11258.33 samples/sec Loss 8.5399 LearningRate 0.0764 Epoch: 5 Global Step: 28620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:16:07,168-Speed 10609.06 samples/sec Loss 8.8533 LearningRate 0.0764 Epoch: 5 Global Step: 28630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:16:08,068-Speed 11395.25 samples/sec Loss 8.7882 LearningRate 0.0764 Epoch: 5 Global Step: 28640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:16:08,948-Speed 11634.00 samples/sec Loss 8.7708 LearningRate 0.0764 Epoch: 5 Global Step: 28650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:16:09,830-Speed 11629.35 samples/sec Loss 8.8238 LearningRate 0.0764 Epoch: 5 Global Step: 28660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:16:10,707-Speed 11677.62 samples/sec Loss 8.7556 LearningRate 0.0764 Epoch: 5 Global Step: 28670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:16:11,622-Speed 11196.75 samples/sec Loss 8.7285 LearningRate 0.0764 Epoch: 5 Global Step: 28680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:16:12,532-Speed 11260.25 samples/sec Loss 8.7843 LearningRate 0.0764 Epoch: 5 Global Step: 28690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:16:13,434-Speed 11368.12 samples/sec Loss 8.9146 LearningRate 0.0764 Epoch: 5 Global Step: 28700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:16:14,351-Speed 11167.37 samples/sec Loss 8.8308 LearningRate 0.0763 Epoch: 5 Global Step: 28710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 
00:16:15,261-Speed 11266.99 samples/sec Loss 8.8604 LearningRate 0.0763 Epoch: 5 Global Step: 28720 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:16:16,175-Speed 11206.51 samples/sec Loss 8.8183 LearningRate 0.0763 Epoch: 5 Global Step: 28730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:16:17,134-Speed 10684.21 samples/sec Loss 8.7265 LearningRate 0.0763 Epoch: 5 Global Step: 28740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:16:18,023-Speed 11536.33 samples/sec Loss 8.8350 LearningRate 0.0763 Epoch: 5 Global Step: 28750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:16:18,953-Speed 11017.61 samples/sec Loss 8.7407 LearningRate 0.0763 Epoch: 5 Global Step: 28760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:16:19,863-Speed 11253.35 samples/sec Loss 8.8160 LearningRate 0.0763 Epoch: 5 Global Step: 28770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:16:20,784-Speed 11127.13 samples/sec Loss 8.7315 LearningRate 0.0763 Epoch: 5 Global Step: 28780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:16:21,674-Speed 11512.87 samples/sec Loss 8.8371 LearningRate 0.0763 Epoch: 5 Global Step: 28790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:16:22,568-Speed 11467.80 samples/sec Loss 8.8033 LearningRate 0.0763 Epoch: 5 Global Step: 28800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:16:23,488-Speed 11137.56 samples/sec Loss 8.9211 LearningRate 0.0763 Epoch: 5 Global Step: 28810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:16:24,387-Speed 11399.39 samples/sec Loss 8.9962 LearningRate 0.0763 Epoch: 5 Global Step: 28820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:16:25,331-Speed 10864.14 samples/sec Loss 9.0439 LearningRate 0.0763 Epoch: 5 Global Step: 28830 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:16:26,217-Speed 11561.44 samples/sec 
Loss 8.7919 LearningRate 0.0762 Epoch: 5 Global Step: 28840 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:16:27,112-Speed 11453.10 samples/sec Loss 9.0422 LearningRate 0.0762 Epoch: 5 Global Step: 28850 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:16:28,038-Speed 11057.06 samples/sec Loss 8.9647 LearningRate 0.0762 Epoch: 5 Global Step: 28860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:16:28,933-Speed 11459.61 samples/sec Loss 8.8973 LearningRate 0.0762 Epoch: 5 Global Step: 28870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:16:29,870-Speed 10929.84 samples/sec Loss 8.8402 LearningRate 0.0762 Epoch: 5 Global Step: 28880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:16:30,790-Speed 11143.28 samples/sec Loss 8.8787 LearningRate 0.0762 Epoch: 5 Global Step: 28890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:16:31,703-Speed 11219.26 samples/sec Loss 8.8146 LearningRate 0.0762 Epoch: 5 Global Step: 28900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:16:32,605-Speed 11367.00 samples/sec Loss 8.8750 LearningRate 0.0762 Epoch: 5 Global Step: 28910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:16:33,554-Speed 10792.42 samples/sec Loss 8.8728 LearningRate 0.0762 Epoch: 5 Global Step: 28920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:16:34,458-Speed 11333.98 samples/sec Loss 8.9679 LearningRate 0.0762 Epoch: 5 Global Step: 28930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:16:35,357-Speed 11398.21 samples/sec Loss 9.0654 LearningRate 0.0762 Epoch: 5 Global Step: 28940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:16:36,286-Speed 11040.20 samples/sec Loss 9.0874 LearningRate 0.0762 Epoch: 5 Global Step: 28950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:16:37,180-Speed 11460.94 samples/sec Loss 8.8366 LearningRate 0.0762 Epoch: 5 Global 
Step: 28960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:16:38,154-Speed 10517.71 samples/sec Loss 8.9372 LearningRate 0.0761 Epoch: 5 Global Step: 28970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:16:39,105-Speed 10776.57 samples/sec Loss 8.9884 LearningRate 0.0761 Epoch: 5 Global Step: 28980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:16:40,050-Speed 10838.27 samples/sec Loss 8.9752 LearningRate 0.0761 Epoch: 5 Global Step: 28990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:16:41,087-Speed 9883.53 samples/sec Loss 8.9833 LearningRate 0.0761 Epoch: 5 Global Step: 29000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:16:42,044-Speed 10718.78 samples/sec Loss 8.9986 LearningRate 0.0761 Epoch: 5 Global Step: 29010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:16:42,938-Speed 11453.43 samples/sec Loss 8.8422 LearningRate 0.0761 Epoch: 5 Global Step: 29020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:16:43,893-Speed 10737.22 samples/sec Loss 9.0434 LearningRate 0.0761 Epoch: 5 Global Step: 29030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:16:45,601-Speed 5995.79 samples/sec Loss 8.9180 LearningRate 0.0761 Epoch: 5 Global Step: 29040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:16:46,485-Speed 11596.57 samples/sec Loss 9.1148 LearningRate 0.0761 Epoch: 5 Global Step: 29050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:16:48,315-Speed 5598.34 samples/sec Loss 8.7875 LearningRate 0.0761 Epoch: 5 Global Step: 29060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:16:49,495-Speed 8677.94 samples/sec Loss 9.0390 LearningRate 0.0761 Epoch: 5 Global Step: 29070 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:16:50,405-Speed 11267.09 samples/sec Loss 8.9594 LearningRate 0.0761 Epoch: 5 Global Step: 29080 Fp16 Grad Scale: 262144 Required: 7 
hours Training: 2022-04-11 00:16:51,310-Speed 11321.09 samples/sec Loss 8.8484 LearningRate 0.0761 Epoch: 5 Global Step: 29090 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:16:52,222-Speed 11234.26 samples/sec Loss 8.8527 LearningRate 0.0760 Epoch: 5 Global Step: 29100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:16:53,159-Speed 10933.47 samples/sec Loss 9.0906 LearningRate 0.0760 Epoch: 5 Global Step: 29110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:16:54,097-Speed 10929.11 samples/sec Loss 9.0095 LearningRate 0.0760 Epoch: 5 Global Step: 29120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:16:55,016-Speed 11150.55 samples/sec Loss 9.1110 LearningRate 0.0760 Epoch: 5 Global Step: 29130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:16:55,935-Speed 11139.99 samples/sec Loss 8.9378 LearningRate 0.0760 Epoch: 5 Global Step: 29140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:16:56,813-Speed 11671.69 samples/sec Loss 9.0109 LearningRate 0.0760 Epoch: 5 Global Step: 29150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:16:57,722-Speed 11277.61 samples/sec Loss 8.9270 LearningRate 0.0760 Epoch: 5 Global Step: 29160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:16:58,617-Speed 11450.34 samples/sec Loss 8.9733 LearningRate 0.0760 Epoch: 5 Global Step: 29170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:16:59,541-Speed 11091.87 samples/sec Loss 8.9667 LearningRate 0.0760 Epoch: 5 Global Step: 29180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:17:00,516-Speed 10502.86 samples/sec Loss 9.0459 LearningRate 0.0760 Epoch: 5 Global Step: 29190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:17:01,461-Speed 10844.77 samples/sec Loss 8.9155 LearningRate 0.0760 Epoch: 5 Global Step: 29200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:17:02,380-Speed 
11159.01 samples/sec Loss 9.0878 LearningRate 0.0760 Epoch: 5 Global Step: 29210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:17:03,271-Speed 11493.22 samples/sec Loss 9.0444 LearningRate 0.0760 Epoch: 5 Global Step: 29220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:17:04,186-Speed 11207.04 samples/sec Loss 8.9158 LearningRate 0.0759 Epoch: 5 Global Step: 29230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:17:05,100-Speed 11215.37 samples/sec Loss 8.9919 LearningRate 0.0759 Epoch: 5 Global Step: 29240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:17:06,041-Speed 10882.70 samples/sec Loss 8.9423 LearningRate 0.0759 Epoch: 5 Global Step: 29250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:17:06,914-Speed 11745.61 samples/sec Loss 8.9881 LearningRate 0.0759 Epoch: 5 Global Step: 29260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:17:07,827-Speed 11214.78 samples/sec Loss 8.9667 LearningRate 0.0759 Epoch: 5 Global Step: 29270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:17:08,727-Speed 11390.20 samples/sec Loss 9.1709 LearningRate 0.0759 Epoch: 5 Global Step: 29280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:17:09,618-Speed 11504.67 samples/sec Loss 8.9728 LearningRate 0.0759 Epoch: 5 Global Step: 29290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:17:10,506-Speed 11530.99 samples/sec Loss 9.0396 LearningRate 0.0759 Epoch: 5 Global Step: 29300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:17:11,407-Speed 11384.26 samples/sec Loss 8.8570 LearningRate 0.0759 Epoch: 5 Global Step: 29310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:17:12,286-Speed 11650.39 samples/sec Loss 9.1055 LearningRate 0.0759 Epoch: 5 Global Step: 29320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:17:13,188-Speed 11362.43 samples/sec Loss 8.9723 LearningRate 0.0759 
Epoch: 5 Global Step: 29330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:17:14,105-Speed 11179.10 samples/sec Loss 9.0166 LearningRate 0.0759 Epoch: 5 Global Step: 29340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:17:15,022-Speed 11168.11 samples/sec Loss 8.9743 LearningRate 0.0759 Epoch: 5 Global Step: 29350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:17:15,985-Speed 10637.66 samples/sec Loss 9.0982 LearningRate 0.0758 Epoch: 5 Global Step: 29360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:17:16,914-Speed 11037.32 samples/sec Loss 8.9945 LearningRate 0.0758 Epoch: 5 Global Step: 29370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:17:17,839-Speed 11078.07 samples/sec Loss 9.0917 LearningRate 0.0758 Epoch: 5 Global Step: 29380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:17:18,729-Speed 11512.74 samples/sec Loss 9.1808 LearningRate 0.0758 Epoch: 5 Global Step: 29390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:17:19,672-Speed 10860.29 samples/sec Loss 9.0936 LearningRate 0.0758 Epoch: 5 Global Step: 29400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:17:20,605-Speed 10984.23 samples/sec Loss 8.8614 LearningRate 0.0758 Epoch: 5 Global Step: 29410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:17:21,525-Speed 11142.50 samples/sec Loss 8.9547 LearningRate 0.0758 Epoch: 5 Global Step: 29420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:17:22,418-Speed 11466.57 samples/sec Loss 9.0285 LearningRate 0.0758 Epoch: 5 Global Step: 29430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:17:23,320-Speed 11367.57 samples/sec Loss 8.9689 LearningRate 0.0758 Epoch: 5 Global Step: 29440 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:17:24,229-Speed 11274.94 samples/sec Loss 9.1334 LearningRate 0.0758 Epoch: 5 Global Step: 29450 Fp16 Grad Scale: 
262144 Required: 7 hours Training: 2022-04-11 00:17:25,097-Speed 11803.04 samples/sec Loss 8.8899 LearningRate 0.0758 Epoch: 5 Global Step: 29460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:17:25,989-Speed 11486.65 samples/sec Loss 9.0555 LearningRate 0.0758 Epoch: 5 Global Step: 29470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:17:26,899-Speed 11271.41 samples/sec Loss 9.0958 LearningRate 0.0758 Epoch: 5 Global Step: 29480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:17:27,843-Speed 10853.17 samples/sec Loss 9.0629 LearningRate 0.0757 Epoch: 5 Global Step: 29490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:17:28,752-Speed 11265.44 samples/sec Loss 9.0955 LearningRate 0.0757 Epoch: 5 Global Step: 29500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:17:29,658-Speed 11306.50 samples/sec Loss 9.0267 LearningRate 0.0757 Epoch: 5 Global Step: 29510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:17:30,622-Speed 10632.53 samples/sec Loss 9.0569 LearningRate 0.0757 Epoch: 5 Global Step: 29520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:17:31,543-Speed 11123.48 samples/sec Loss 8.9994 LearningRate 0.0757 Epoch: 5 Global Step: 29530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:17:32,478-Speed 10964.71 samples/sec Loss 9.1613 LearningRate 0.0757 Epoch: 5 Global Step: 29540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:17:33,368-Speed 11511.27 samples/sec Loss 9.0062 LearningRate 0.0757 Epoch: 5 Global Step: 29550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:17:34,268-Speed 11385.39 samples/sec Loss 9.1358 LearningRate 0.0757 Epoch: 5 Global Step: 29560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:17:35,146-Speed 11670.03 samples/sec Loss 9.1655 LearningRate 0.0757 Epoch: 5 Global Step: 29570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 
00:17:36,047-Speed 11373.93 samples/sec Loss 9.2119 LearningRate 0.0757 Epoch: 5 Global Step: 29580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:17:36,972-Speed 11083.98 samples/sec Loss 8.9040 LearningRate 0.0757 Epoch: 5 Global Step: 29590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:17:37,884-Speed 11234.36 samples/sec Loss 9.1251 LearningRate 0.0757 Epoch: 5 Global Step: 29600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:17:38,768-Speed 11590.74 samples/sec Loss 9.3290 LearningRate 0.0757 Epoch: 5 Global Step: 29610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:17:39,686-Speed 11156.79 samples/sec Loss 9.0025 LearningRate 0.0756 Epoch: 5 Global Step: 29620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:17:40,562-Speed 11697.77 samples/sec Loss 8.9402 LearningRate 0.0756 Epoch: 5 Global Step: 29630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:17:41,476-Speed 11217.12 samples/sec Loss 9.1397 LearningRate 0.0756 Epoch: 5 Global Step: 29640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:17:42,390-Speed 11212.00 samples/sec Loss 9.1688 LearningRate 0.0756 Epoch: 5 Global Step: 29650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:17:43,295-Speed 11319.57 samples/sec Loss 9.1970 LearningRate 0.0756 Epoch: 5 Global Step: 29660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:17:44,185-Speed 11512.06 samples/sec Loss 9.0718 LearningRate 0.0756 Epoch: 5 Global Step: 29670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:17:45,057-Speed 11754.26 samples/sec Loss 9.0982 LearningRate 0.0756 Epoch: 5 Global Step: 29680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:17:45,938-Speed 11634.62 samples/sec Loss 9.1587 LearningRate 0.0756 Epoch: 5 Global Step: 29690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:17:46,843-Speed 11320.09 samples/sec Loss 
9.0326 LearningRate 0.0756 Epoch: 5 Global Step: 29700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:17:47,767-Speed 11090.98 samples/sec Loss 9.1294 LearningRate 0.0756 Epoch: 5 Global Step: 29710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:17:48,685-Speed 11158.82 samples/sec Loss 9.0082 LearningRate 0.0756 Epoch: 5 Global Step: 29720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:17:49,559-Speed 11729.26 samples/sec Loss 9.1297 LearningRate 0.0756 Epoch: 5 Global Step: 29730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:17:50,540-Speed 10450.80 samples/sec Loss 9.1161 LearningRate 0.0756 Epoch: 5 Global Step: 29740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:17:51,527-Speed 10380.47 samples/sec Loss 9.0471 LearningRate 0.0756 Epoch: 5 Global Step: 29750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:17:52,452-Speed 11072.65 samples/sec Loss 8.9610 LearningRate 0.0755 Epoch: 5 Global Step: 29760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:17:53,364-Speed 11235.04 samples/sec Loss 9.0694 LearningRate 0.0755 Epoch: 5 Global Step: 29770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:17:54,239-Speed 11715.83 samples/sec Loss 9.3125 LearningRate 0.0755 Epoch: 5 Global Step: 29780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:17:55,141-Speed 11356.46 samples/sec Loss 9.1258 LearningRate 0.0755 Epoch: 5 Global Step: 29790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:17:56,036-Speed 11453.61 samples/sec Loss 9.1270 LearningRate 0.0755 Epoch: 5 Global Step: 29800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:17:56,937-Speed 11366.62 samples/sec Loss 9.0384 LearningRate 0.0755 Epoch: 5 Global Step: 29810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:17:57,818-Speed 11643.09 samples/sec Loss 9.0403 LearningRate 0.0755 Epoch: 5 Global Step: 
29820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:17:58,690-Speed 11749.59 samples/sec Loss 9.1362 LearningRate 0.0755 Epoch: 5 Global Step: 29830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:17:59,579-Speed 11515.95 samples/sec Loss 9.1007 LearningRate 0.0755 Epoch: 5 Global Step: 29840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:18:00,479-Speed 11385.90 samples/sec Loss 9.0783 LearningRate 0.0755 Epoch: 5 Global Step: 29850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:18:01,371-Speed 11496.22 samples/sec Loss 9.1477 LearningRate 0.0755 Epoch: 5 Global Step: 29860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:18:02,305-Speed 10974.89 samples/sec Loss 9.0234 LearningRate 0.0755 Epoch: 5 Global Step: 29870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:18:03,220-Speed 11197.21 samples/sec Loss 9.1330 LearningRate 0.0755 Epoch: 5 Global Step: 29880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:18:04,097-Speed 11687.15 samples/sec Loss 9.1374 LearningRate 0.0754 Epoch: 5 Global Step: 29890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:18:05,011-Speed 11204.06 samples/sec Loss 8.9841 LearningRate 0.0754 Epoch: 5 Global Step: 29900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:18:05,887-Speed 11700.76 samples/sec Loss 9.2630 LearningRate 0.0754 Epoch: 5 Global Step: 29910 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:18:06,787-Speed 11392.53 samples/sec Loss 9.1315 LearningRate 0.0754 Epoch: 5 Global Step: 29920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:18:07,688-Speed 11375.32 samples/sec Loss 9.0037 LearningRate 0.0754 Epoch: 5 Global Step: 29930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:18:08,585-Speed 11421.86 samples/sec Loss 9.0373 LearningRate 0.0754 Epoch: 5 Global Step: 29940 Fp16 Grad Scale: 131072 Required: 7 
hours Training: 2022-04-11 00:18:09,486-Speed 11375.85 samples/sec Loss 8.9573 LearningRate 0.0754 Epoch: 5 Global Step: 29950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:18:10,394-Speed 11281.61 samples/sec Loss 9.1868 LearningRate 0.0754 Epoch: 5 Global Step: 29960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:18:11,297-Speed 11348.91 samples/sec Loss 9.0429 LearningRate 0.0754 Epoch: 5 Global Step: 29970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:18:12,191-Speed 11464.67 samples/sec Loss 9.0460 LearningRate 0.0754 Epoch: 5 Global Step: 29980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:18:13,121-Speed 11009.52 samples/sec Loss 8.9682 LearningRate 0.0754 Epoch: 5 Global Step: 29990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:18:14,012-Speed 11509.71 samples/sec Loss 9.1656 LearningRate 0.0754 Epoch: 5 Global Step: 30000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:18:35,283-[lfw][30000]XNorm: 14.224702 Training: 2022-04-11 00:18:35,283-[lfw][30000]Accuracy-Flip: 0.99350+-0.00444 Training: 2022-04-11 00:18:35,284-[lfw][30000]Accuracy-Highest: 0.99417 Training: 2022-04-11 00:18:59,851-[cfp_fp][30000]XNorm: 11.929152 Training: 2022-04-11 00:18:59,851-[cfp_fp][30000]Accuracy-Flip: 0.92271+-0.01506 Training: 2022-04-11 00:18:59,851-[cfp_fp][30000]Accuracy-Highest: 0.92271 Training: 2022-04-11 00:19:21,102-[agedb_30][30000]XNorm: 13.941234 Training: 2022-04-11 00:19:21,102-[agedb_30][30000]Accuracy-Flip: 0.95233+-0.01106 Training: 2022-04-11 00:19:21,103-[agedb_30][30000]Accuracy-Highest: 0.95233 Training: 2022-04-11 00:19:22,006-Speed 150.60 samples/sec Loss 9.0043 LearningRate 0.0754 Epoch: 5 Global Step: 30010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:19:22,894-Speed 11535.22 samples/sec Loss 9.1956 LearningRate 0.0753 Epoch: 5 Global Step: 30020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 
00:19:23,772-Speed 11667.73 samples/sec Loss 9.0548 LearningRate 0.0753 Epoch: 5 Global Step: 30030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:19:24,692-Speed 11140.10 samples/sec Loss 9.0215 LearningRate 0.0753 Epoch: 5 Global Step: 30040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:19:25,580-Speed 11544.70 samples/sec Loss 9.1389 LearningRate 0.0753 Epoch: 5 Global Step: 30050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:19:26,479-Speed 11400.22 samples/sec Loss 9.1909 LearningRate 0.0753 Epoch: 5 Global Step: 30060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:19:27,413-Speed 10959.48 samples/sec Loss 9.2124 LearningRate 0.0753 Epoch: 5 Global Step: 30070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:19:28,347-Speed 10977.63 samples/sec Loss 9.0678 LearningRate 0.0753 Epoch: 5 Global Step: 30080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:19:29,271-Speed 11085.31 samples/sec Loss 9.3007 LearningRate 0.0753 Epoch: 5 Global Step: 30090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:19:30,224-Speed 10761.80 samples/sec Loss 9.0902 LearningRate 0.0753 Epoch: 5 Global Step: 30100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:19:31,161-Speed 10932.27 samples/sec Loss 9.1960 LearningRate 0.0753 Epoch: 5 Global Step: 30110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:19:32,084-Speed 11100.76 samples/sec Loss 9.1593 LearningRate 0.0753 Epoch: 5 Global Step: 30120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:19:32,972-Speed 11536.31 samples/sec Loss 8.9674 LearningRate 0.0753 Epoch: 5 Global Step: 30130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:19:33,847-Speed 11710.86 samples/sec Loss 9.1076 LearningRate 0.0753 Epoch: 5 Global Step: 30140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:19:34,752-Speed 11330.77 samples/sec Loss 8.9414 
LearningRate 0.0752 Epoch: 5 Global Step: 30150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:19:35,670-Speed 11162.07 samples/sec Loss 9.1403 LearningRate 0.0752 Epoch: 5 Global Step: 30160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:19:36,583-Speed 11220.31 samples/sec Loss 9.1094 LearningRate 0.0752 Epoch: 5 Global Step: 30170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:19:37,482-Speed 11401.00 samples/sec Loss 9.2400 LearningRate 0.0752 Epoch: 5 Global Step: 30180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:19:38,405-Speed 11098.89 samples/sec Loss 9.1763 LearningRate 0.0752 Epoch: 5 Global Step: 30190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:19:39,305-Speed 11390.12 samples/sec Loss 9.1351 LearningRate 0.0752 Epoch: 5 Global Step: 30200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:19:40,213-Speed 11285.21 samples/sec Loss 8.9814 LearningRate 0.0752 Epoch: 5 Global Step: 30210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:19:41,110-Speed 11426.89 samples/sec Loss 9.1986 LearningRate 0.0752 Epoch: 5 Global Step: 30220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:19:42,030-Speed 11137.98 samples/sec Loss 9.0522 LearningRate 0.0752 Epoch: 5 Global Step: 30230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:19:42,931-Speed 11375.55 samples/sec Loss 9.0234 LearningRate 0.0752 Epoch: 5 Global Step: 30240 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:19:43,830-Speed 11389.90 samples/sec Loss 9.1275 LearningRate 0.0752 Epoch: 5 Global Step: 30250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:19:44,721-Speed 11503.39 samples/sec Loss 9.1716 LearningRate 0.0752 Epoch: 5 Global Step: 30260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:19:45,598-Speed 11681.70 samples/sec Loss 9.2824 LearningRate 0.0752 Epoch: 5 Global Step: 30270 
Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:19:46,468-Speed 11784.81 samples/sec Loss 9.0995 LearningRate 0.0751 Epoch: 5 Global Step: 30280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:19:47,381-Speed 11222.90 samples/sec Loss 9.1192 LearningRate 0.0751 Epoch: 5 Global Step: 30290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:19:48,281-Speed 11379.92 samples/sec Loss 8.9442 LearningRate 0.0751 Epoch: 5 Global Step: 30300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:19:49,204-Speed 11102.77 samples/sec Loss 9.2001 LearningRate 0.0751 Epoch: 5 Global Step: 30310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:19:50,097-Speed 11478.31 samples/sec Loss 9.2953 LearningRate 0.0751 Epoch: 5 Global Step: 30320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:19:50,980-Speed 11597.94 samples/sec Loss 9.0674 LearningRate 0.0751 Epoch: 5 Global Step: 30330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:19:51,924-Speed 10855.22 samples/sec Loss 9.3384 LearningRate 0.0751 Epoch: 5 Global Step: 30340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:19:52,854-Speed 11022.96 samples/sec Loss 9.1148 LearningRate 0.0751 Epoch: 5 Global Step: 30350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:19:53,814-Speed 10677.03 samples/sec Loss 9.1463 LearningRate 0.0751 Epoch: 5 Global Step: 30360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:19:54,744-Speed 11013.81 samples/sec Loss 9.0966 LearningRate 0.0751 Epoch: 5 Global Step: 30370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:19:55,641-Speed 11422.67 samples/sec Loss 9.1723 LearningRate 0.0751 Epoch: 5 Global Step: 30380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:19:56,555-Speed 11219.34 samples/sec Loss 8.9924 LearningRate 0.0751 Epoch: 5 Global Step: 30390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 
2022-04-11 00:19:57,448-Speed 11475.84 samples/sec Loss 9.0619 LearningRate 0.0751 Epoch: 5 Global Step: 30400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:19:58,339-Speed 11496.76 samples/sec Loss 9.1172 LearningRate 0.0750 Epoch: 5 Global Step: 30410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:19:59,235-Speed 11436.85 samples/sec Loss 9.1113 LearningRate 0.0750 Epoch: 5 Global Step: 30420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:20:00,135-Speed 11377.66 samples/sec Loss 9.2699 LearningRate 0.0750 Epoch: 5 Global Step: 30430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:20:01,028-Speed 11483.15 samples/sec Loss 9.2188 LearningRate 0.0750 Epoch: 5 Global Step: 30440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:20:01,954-Speed 11062.58 samples/sec Loss 8.9493 LearningRate 0.0750 Epoch: 5 Global Step: 30450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:20:02,851-Speed 11426.39 samples/sec Loss 8.9599 LearningRate 0.0750 Epoch: 5 Global Step: 30460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:20:03,730-Speed 11656.85 samples/sec Loss 9.2723 LearningRate 0.0750 Epoch: 5 Global Step: 30470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:20:04,597-Speed 11821.59 samples/sec Loss 9.0646 LearningRate 0.0750 Epoch: 5 Global Step: 30480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:20:05,496-Speed 11407.01 samples/sec Loss 9.0111 LearningRate 0.0750 Epoch: 5 Global Step: 30490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:20:06,413-Speed 11175.79 samples/sec Loss 9.0920 LearningRate 0.0750 Epoch: 5 Global Step: 30500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:20:07,304-Speed 11490.19 samples/sec Loss 9.1466 LearningRate 0.0750 Epoch: 5 Global Step: 30510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:20:08,189-Speed 11588.09 
samples/sec Loss 9.2047 LearningRate 0.0750 Epoch: 5 Global Step: 30520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:20:09,077-Speed 11540.17 samples/sec Loss 9.1215 LearningRate 0.0750 Epoch: 5 Global Step: 30530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:20:09,973-Speed 11430.67 samples/sec Loss 9.2282 LearningRate 0.0749 Epoch: 5 Global Step: 30540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:20:10,850-Speed 11681.39 samples/sec Loss 9.1456 LearningRate 0.0749 Epoch: 5 Global Step: 30550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:20:11,775-Speed 11079.67 samples/sec Loss 9.0539 LearningRate 0.0749 Epoch: 5 Global Step: 30560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:20:12,695-Speed 11142.35 samples/sec Loss 8.9464 LearningRate 0.0749 Epoch: 5 Global Step: 30570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:20:13,600-Speed 11314.91 samples/sec Loss 9.2860 LearningRate 0.0749 Epoch: 5 Global Step: 30580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:20:14,550-Speed 10792.33 samples/sec Loss 9.1336 LearningRate 0.0749 Epoch: 5 Global Step: 30590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:20:15,489-Speed 10916.54 samples/sec Loss 9.2007 LearningRate 0.0749 Epoch: 5 Global Step: 30600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:20:16,444-Speed 10728.02 samples/sec Loss 9.2222 LearningRate 0.0749 Epoch: 5 Global Step: 30610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:20:17,378-Speed 10983.09 samples/sec Loss 9.2070 LearningRate 0.0749 Epoch: 5 Global Step: 30620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:20:18,267-Speed 11524.33 samples/sec Loss 9.0236 LearningRate 0.0749 Epoch: 5 Global Step: 30630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:20:19,158-Speed 11493.51 samples/sec Loss 9.1804 LearningRate 0.0749 Epoch: 5 
Global Step: 30640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:20:20,046-Speed 11540.69 samples/sec Loss 9.0806 LearningRate 0.0749 Epoch: 5 Global Step: 30650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:20:20,936-Speed 11513.28 samples/sec Loss 8.9326 LearningRate 0.0749 Epoch: 5 Global Step: 30660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:20:21,857-Speed 11129.31 samples/sec Loss 9.2985 LearningRate 0.0748 Epoch: 5 Global Step: 30670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:20:22,738-Speed 11628.48 samples/sec Loss 9.0942 LearningRate 0.0748 Epoch: 5 Global Step: 30680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:20:23,643-Speed 11326.01 samples/sec Loss 9.1081 LearningRate 0.0748 Epoch: 5 Global Step: 30690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:20:24,583-Speed 10907.41 samples/sec Loss 9.1210 LearningRate 0.0748 Epoch: 5 Global Step: 30700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:20:25,495-Speed 11233.22 samples/sec Loss 9.0984 LearningRate 0.0748 Epoch: 5 Global Step: 30710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:20:26,412-Speed 11165.56 samples/sec Loss 8.9786 LearningRate 0.0748 Epoch: 5 Global Step: 30720 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:20:27,309-Speed 11433.74 samples/sec Loss 8.9775 LearningRate 0.0748 Epoch: 5 Global Step: 30730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:20:28,210-Speed 11360.97 samples/sec Loss 8.8718 LearningRate 0.0748 Epoch: 5 Global Step: 30740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:20:29,096-Speed 11569.12 samples/sec Loss 8.9744 LearningRate 0.0748 Epoch: 5 Global Step: 30750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:20:30,001-Speed 11324.88 samples/sec Loss 9.1203 LearningRate 0.0748 Epoch: 5 Global Step: 30760 Fp16 Grad Scale: 65536 
Required: 7 hours Training: 2022-04-11 00:20:30,910-Speed 11274.82 samples/sec Loss 9.2941 LearningRate 0.0748 Epoch: 5 Global Step: 30770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:20:31,816-Speed 11315.26 samples/sec Loss 9.1088 LearningRate 0.0748 Epoch: 5 Global Step: 30780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:20:32,728-Speed 11236.67 samples/sec Loss 9.0120 LearningRate 0.0748 Epoch: 5 Global Step: 30790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:20:33,626-Speed 11410.48 samples/sec Loss 9.0997 LearningRate 0.0747 Epoch: 5 Global Step: 30800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:20:34,498-Speed 11743.64 samples/sec Loss 8.9467 LearningRate 0.0747 Epoch: 5 Global Step: 30810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:20:35,398-Speed 11397.04 samples/sec Loss 9.1761 LearningRate 0.0747 Epoch: 5 Global Step: 30820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:20:36,275-Speed 11672.40 samples/sec Loss 9.1035 LearningRate 0.0747 Epoch: 5 Global Step: 30830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:20:37,199-Speed 11090.93 samples/sec Loss 9.0726 LearningRate 0.0747 Epoch: 5 Global Step: 30840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:20:38,083-Speed 11601.28 samples/sec Loss 9.1830 LearningRate 0.0747 Epoch: 5 Global Step: 30850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:20:38,960-Speed 11683.44 samples/sec Loss 9.2840 LearningRate 0.0747 Epoch: 5 Global Step: 30860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:20:39,849-Speed 11524.12 samples/sec Loss 9.0695 LearningRate 0.0747 Epoch: 5 Global Step: 30870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:20:40,716-Speed 11821.59 samples/sec Loss 9.1502 LearningRate 0.0747 Epoch: 5 Global Step: 30880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 
00:20:41,626-Speed 11258.75 samples/sec Loss 9.0006 LearningRate 0.0747 Epoch: 5 Global Step: 30890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:20:42,581-Speed 10730.87 samples/sec Loss 8.9786 LearningRate 0.0747 Epoch: 5 Global Step: 30900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:20:43,505-Speed 11091.92 samples/sec Loss 9.1239 LearningRate 0.0747 Epoch: 5 Global Step: 30910 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:20:44,440-Speed 10961.02 samples/sec Loss 9.0849 LearningRate 0.0747 Epoch: 5 Global Step: 30920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:20:45,335-Speed 11443.56 samples/sec Loss 8.9571 LearningRate 0.0747 Epoch: 5 Global Step: 30930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:20:46,261-Speed 11069.33 samples/sec Loss 9.0102 LearningRate 0.0746 Epoch: 5 Global Step: 30940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:20:47,131-Speed 11786.03 samples/sec Loss 9.1096 LearningRate 0.0746 Epoch: 5 Global Step: 30950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:20:48,011-Speed 11649.17 samples/sec Loss 9.2714 LearningRate 0.0746 Epoch: 5 Global Step: 30960 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:20:48,921-Speed 11255.41 samples/sec Loss 9.1294 LearningRate 0.0746 Epoch: 5 Global Step: 30970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:20:49,826-Speed 11323.07 samples/sec Loss 9.0814 LearningRate 0.0746 Epoch: 5 Global Step: 30980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:20:50,707-Speed 11639.03 samples/sec Loss 9.1509 LearningRate 0.0746 Epoch: 5 Global Step: 30990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:20:51,591-Speed 11590.46 samples/sec Loss 9.0576 LearningRate 0.0746 Epoch: 5 Global Step: 31000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:20:52,479-Speed 11548.94 samples/sec 
Loss 8.8634 LearningRate 0.0746 Epoch: 5 Global Step: 31010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:20:53,369-Speed 11522.18 samples/sec Loss 8.9423 LearningRate 0.0746 Epoch: 5 Global Step: 31020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:20:54,272-Speed 11344.16 samples/sec Loss 9.1956 LearningRate 0.0746 Epoch: 5 Global Step: 31030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:20:55,150-Speed 11675.33 samples/sec Loss 9.1406 LearningRate 0.0746 Epoch: 5 Global Step: 31040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:20:56,088-Speed 10924.84 samples/sec Loss 9.0923 LearningRate 0.0746 Epoch: 5 Global Step: 31050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:20:56,969-Speed 11623.32 samples/sec Loss 8.8765 LearningRate 0.0746 Epoch: 5 Global Step: 31060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:20:57,875-Speed 11320.45 samples/sec Loss 9.2133 LearningRate 0.0745 Epoch: 5 Global Step: 31070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:20:58,819-Speed 10853.98 samples/sec Loss 9.1167 LearningRate 0.0745 Epoch: 5 Global Step: 31080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:20:59,742-Speed 11093.90 samples/sec Loss 8.9260 LearningRate 0.0745 Epoch: 5 Global Step: 31090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:00,691-Speed 10800.18 samples/sec Loss 9.0084 LearningRate 0.0745 Epoch: 5 Global Step: 31100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:01,613-Speed 11108.13 samples/sec Loss 8.9997 LearningRate 0.0745 Epoch: 5 Global Step: 31110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:02,563-Speed 10793.38 samples/sec Loss 9.1326 LearningRate 0.0745 Epoch: 5 Global Step: 31120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:03,469-Speed 11305.62 samples/sec Loss 9.0099 LearningRate 0.0745 Epoch: 5 Global 
Step: 31130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:04,380-Speed 11254.51 samples/sec Loss 8.9141 LearningRate 0.0745 Epoch: 5 Global Step: 31140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:05,303-Speed 11096.68 samples/sec Loss 9.1276 LearningRate 0.0745 Epoch: 5 Global Step: 31150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:06,259-Speed 10720.53 samples/sec Loss 9.1504 LearningRate 0.0745 Epoch: 5 Global Step: 31160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:21:07,180-Speed 11126.09 samples/sec Loss 9.1102 LearningRate 0.0745 Epoch: 5 Global Step: 31170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:21:08,048-Speed 11802.80 samples/sec Loss 9.1374 LearningRate 0.0745 Epoch: 5 Global Step: 31180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:08,951-Speed 11354.33 samples/sec Loss 9.0671 LearningRate 0.0745 Epoch: 5 Global Step: 31190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:09,827-Speed 11694.38 samples/sec Loss 9.0583 LearningRate 0.0744 Epoch: 5 Global Step: 31200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:10,705-Speed 11677.09 samples/sec Loss 9.1944 LearningRate 0.0744 Epoch: 5 Global Step: 31210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:11,593-Speed 11539.43 samples/sec Loss 9.2074 LearningRate 0.0744 Epoch: 5 Global Step: 31220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:12,479-Speed 11560.09 samples/sec Loss 9.2270 LearningRate 0.0744 Epoch: 5 Global Step: 31230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:13,363-Speed 11596.66 samples/sec Loss 9.1012 LearningRate 0.0744 Epoch: 5 Global Step: 31240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:14,244-Speed 11632.32 samples/sec Loss 9.0982 LearningRate 0.0744 Epoch: 5 Global Step: 31250 Fp16 Grad Scale: 65536 Required: 7 hours 
Training: 2022-04-11 00:21:15,136-Speed 11488.56 samples/sec Loss 9.1719 LearningRate 0.0744 Epoch: 5 Global Step: 31260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:16,018-Speed 11620.18 samples/sec Loss 9.0324 LearningRate 0.0744 Epoch: 5 Global Step: 31270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:16,907-Speed 11538.65 samples/sec Loss 9.2751 LearningRate 0.0744 Epoch: 5 Global Step: 31280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:21:17,797-Speed 11509.01 samples/sec Loss 9.2431 LearningRate 0.0744 Epoch: 5 Global Step: 31290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:21:18,707-Speed 11260.48 samples/sec Loss 9.1236 LearningRate 0.0744 Epoch: 5 Global Step: 31300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:21:19,629-Speed 11117.72 samples/sec Loss 9.1775 LearningRate 0.0744 Epoch: 5 Global Step: 31310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:21:20,514-Speed 11580.08 samples/sec Loss 9.0850 LearningRate 0.0744 Epoch: 5 Global Step: 31320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:21:21,405-Speed 11496.88 samples/sec Loss 9.0237 LearningRate 0.0743 Epoch: 5 Global Step: 31330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:21:22,330-Speed 11071.07 samples/sec Loss 9.1037 LearningRate 0.0743 Epoch: 5 Global Step: 31340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:21:23,207-Speed 11682.33 samples/sec Loss 9.0334 LearningRate 0.0743 Epoch: 5 Global Step: 31350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:24,101-Speed 11461.02 samples/sec Loss 8.8698 LearningRate 0.0743 Epoch: 5 Global Step: 31360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:24,979-Speed 11671.49 samples/sec Loss 8.9545 LearningRate 0.0743 Epoch: 5 Global Step: 31370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:25,874-Speed 11459.93 
samples/sec Loss 9.0863 LearningRate 0.0743 Epoch: 5 Global Step: 31380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:26,811-Speed 10932.98 samples/sec Loss 9.1640 LearningRate 0.0743 Epoch: 5 Global Step: 31390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:27,750-Speed 10915.26 samples/sec Loss 8.9676 LearningRate 0.0743 Epoch: 5 Global Step: 31400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:28,684-Speed 10972.17 samples/sec Loss 8.9090 LearningRate 0.0743 Epoch: 5 Global Step: 31410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:29,637-Speed 10757.87 samples/sec Loss 8.9916 LearningRate 0.0743 Epoch: 5 Global Step: 31420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:30,538-Speed 11367.73 samples/sec Loss 9.1060 LearningRate 0.0743 Epoch: 5 Global Step: 31430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:31,444-Speed 11311.20 samples/sec Loss 9.0504 LearningRate 0.0743 Epoch: 5 Global Step: 31440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:32,372-Speed 11039.62 samples/sec Loss 9.0053 LearningRate 0.0743 Epoch: 5 Global Step: 31450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:21:33,263-Speed 11506.96 samples/sec Loss 9.0449 LearningRate 0.0742 Epoch: 5 Global Step: 31460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:21:34,161-Speed 11403.74 samples/sec Loss 9.0051 LearningRate 0.0742 Epoch: 5 Global Step: 31470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:21:35,025-Speed 11871.24 samples/sec Loss 9.1091 LearningRate 0.0742 Epoch: 5 Global Step: 31480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:21:35,895-Speed 11772.06 samples/sec Loss 9.0406 LearningRate 0.0742 Epoch: 5 Global Step: 31490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:21:36,762-Speed 11812.49 samples/sec Loss 9.1599 LearningRate 0.0742 Epoch: 
5 Global Step: 31500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:37,662-Speed 11392.56 samples/sec Loss 8.9509 LearningRate 0.0742 Epoch: 5 Global Step: 31510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:38,558-Speed 11436.65 samples/sec Loss 9.0837 LearningRate 0.0742 Epoch: 5 Global Step: 31520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:39,523-Speed 10611.62 samples/sec Loss 8.9666 LearningRate 0.0742 Epoch: 5 Global Step: 31530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:40,460-Speed 10947.12 samples/sec Loss 9.2762 LearningRate 0.0742 Epoch: 5 Global Step: 31540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:41,409-Speed 10796.47 samples/sec Loss 9.0440 LearningRate 0.0742 Epoch: 5 Global Step: 31550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:42,337-Speed 11033.55 samples/sec Loss 9.0040 LearningRate 0.0742 Epoch: 5 Global Step: 31560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:43,210-Speed 11747.61 samples/sec Loss 9.0331 LearningRate 0.0742 Epoch: 5 Global Step: 31570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:44,105-Speed 11439.04 samples/sec Loss 9.0115 LearningRate 0.0742 Epoch: 5 Global Step: 31580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:44,983-Speed 11670.01 samples/sec Loss 9.0467 LearningRate 0.0742 Epoch: 5 Global Step: 31590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:45,855-Speed 11763.32 samples/sec Loss 9.0888 LearningRate 0.0741 Epoch: 5 Global Step: 31600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:46,773-Speed 11166.93 samples/sec Loss 8.9650 LearningRate 0.0741 Epoch: 5 Global Step: 31610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:47,684-Speed 11258.50 samples/sec Loss 9.0900 LearningRate 0.0741 Epoch: 5 Global Step: 31620 Fp16 Grad Scale: 65536 Required: 7 
hours Training: 2022-04-11 00:21:48,608-Speed 11084.14 samples/sec Loss 9.1346 LearningRate 0.0741 Epoch: 5 Global Step: 31630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:49,540-Speed 10997.99 samples/sec Loss 8.9016 LearningRate 0.0741 Epoch: 5 Global Step: 31640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:50,467-Speed 11052.60 samples/sec Loss 9.0446 LearningRate 0.0741 Epoch: 5 Global Step: 31650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:51,362-Speed 11451.26 samples/sec Loss 9.0438 LearningRate 0.0741 Epoch: 5 Global Step: 31660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:52,256-Speed 11465.80 samples/sec Loss 9.0292 LearningRate 0.0741 Epoch: 5 Global Step: 31670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:53,177-Speed 11121.57 samples/sec Loss 9.1742 LearningRate 0.0741 Epoch: 5 Global Step: 31680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:54,086-Speed 11275.52 samples/sec Loss 9.0010 LearningRate 0.0741 Epoch: 5 Global Step: 31690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:54,965-Speed 11665.04 samples/sec Loss 8.9035 LearningRate 0.0741 Epoch: 5 Global Step: 31700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:21:55,865-Speed 11376.20 samples/sec Loss 9.2196 LearningRate 0.0741 Epoch: 5 Global Step: 31710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:21:56,750-Speed 11585.28 samples/sec Loss 9.1672 LearningRate 0.0741 Epoch: 5 Global Step: 31720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:21:57,616-Speed 11822.59 samples/sec Loss 9.1150 LearningRate 0.0740 Epoch: 5 Global Step: 31730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:21:58,542-Speed 11069.77 samples/sec Loss 9.0074 LearningRate 0.0740 Epoch: 5 Global Step: 31740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:21:59,448-Speed 
11309.70 samples/sec Loss 9.1451 LearningRate 0.0740 Epoch: 5 Global Step: 31750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:22:00,348-Speed 11392.87 samples/sec Loss 9.0661 LearningRate 0.0740 Epoch: 5 Global Step: 31760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:22:01,274-Speed 11060.22 samples/sec Loss 9.0730 LearningRate 0.0740 Epoch: 5 Global Step: 31770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:22:02,174-Speed 11391.57 samples/sec Loss 9.0670 LearningRate 0.0740 Epoch: 5 Global Step: 31780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:22:03,080-Speed 11305.07 samples/sec Loss 9.0000 LearningRate 0.0740 Epoch: 5 Global Step: 31790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:22:03,979-Speed 11401.83 samples/sec Loss 9.0578 LearningRate 0.0740 Epoch: 5 Global Step: 31800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:22:04,893-Speed 11206.38 samples/sec Loss 9.0668 LearningRate 0.0740 Epoch: 5 Global Step: 31810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:22:05,816-Speed 11106.76 samples/sec Loss 9.0372 LearningRate 0.0740 Epoch: 5 Global Step: 31820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:22:06,715-Speed 11400.99 samples/sec Loss 9.2222 LearningRate 0.0740 Epoch: 5 Global Step: 31830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:22:07,605-Speed 11512.91 samples/sec Loss 8.9936 LearningRate 0.0740 Epoch: 5 Global Step: 31840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:22:08,531-Speed 11066.85 samples/sec Loss 9.0684 LearningRate 0.0740 Epoch: 5 Global Step: 31850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:22:09,418-Speed 11571.09 samples/sec Loss 8.8143 LearningRate 0.0739 Epoch: 5 Global Step: 31860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:22:10,301-Speed 11603.49 samples/sec Loss 9.0392 LearningRate 0.0739 
Epoch: 5 Global Step: 31870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:22:11,191-Speed 11537.03 samples/sec Loss 9.0274 LearningRate 0.0739 Epoch: 5 Global Step: 31880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:22:12,054-Speed 11878.83 samples/sec Loss 9.0859 LearningRate 0.0739 Epoch: 5 Global Step: 31890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:22:13,008-Speed 10730.74 samples/sec Loss 9.0871 LearningRate 0.0739 Epoch: 5 Global Step: 31900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:22:13,905-Speed 11423.36 samples/sec Loss 9.0887 LearningRate 0.0739 Epoch: 5 Global Step: 31910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:22:14,802-Speed 11435.26 samples/sec Loss 9.1608 LearningRate 0.0739 Epoch: 5 Global Step: 31920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:22:15,712-Speed 11251.41 samples/sec Loss 9.2233 LearningRate 0.0739 Epoch: 5 Global Step: 31930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:22:16,645-Speed 10992.85 samples/sec Loss 9.0708 LearningRate 0.0739 Epoch: 5 Global Step: 31940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:22:17,527-Speed 11609.08 samples/sec Loss 9.1102 LearningRate 0.0739 Epoch: 5 Global Step: 31950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:22:18,412-Speed 11578.82 samples/sec Loss 9.0669 LearningRate 0.0739 Epoch: 5 Global Step: 31960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:22:19,349-Speed 10942.06 samples/sec Loss 9.0163 LearningRate 0.0739 Epoch: 5 Global Step: 31970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:22:20,259-Speed 11251.23 samples/sec Loss 9.0693 LearningRate 0.0739 Epoch: 5 Global Step: 31980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:22:21,128-Speed 11795.13 samples/sec Loss 9.0928 LearningRate 0.0738 Epoch: 5 Global Step: 31990 Fp16 Grad Scale: 65536 
Required: 7 hours Training: 2022-04-11 00:22:22,014-Speed 11563.37 samples/sec Loss 9.1974 LearningRate 0.0738 Epoch: 5 Global Step: 32000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:22:43,166-[lfw][32000]XNorm: 14.040582 Training: 2022-04-11 00:22:43,167-[lfw][32000]Accuracy-Flip: 0.99267+-0.00448 Training: 2022-04-11 00:22:43,167-[lfw][32000]Accuracy-Highest: 0.99417 Training: 2022-04-11 00:23:07,712-[cfp_fp][32000]XNorm: 11.822396 Training: 2022-04-11 00:23:07,712-[cfp_fp][32000]Accuracy-Flip: 0.92086+-0.01330 Training: 2022-04-11 00:23:07,712-[cfp_fp][32000]Accuracy-Highest: 0.92271 Training: 2022-04-11 00:23:28,896-[agedb_30][32000]XNorm: 13.702907 Training: 2022-04-11 00:23:28,896-[agedb_30][32000]Accuracy-Flip: 0.95183+-0.00953 Training: 2022-04-11 00:23:28,896-[agedb_30][32000]Accuracy-Highest: 0.95233 Training: 2022-04-11 00:23:29,829-Speed 151.00 samples/sec Loss 9.0341 LearningRate 0.0738 Epoch: 5 Global Step: 32010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:23:30,728-Speed 11399.56 samples/sec Loss 8.8801 LearningRate 0.0738 Epoch: 5 Global Step: 32020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:23:31,652-Speed 11088.01 samples/sec Loss 9.1682 LearningRate 0.0738 Epoch: 5 Global Step: 32030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:23:32,540-Speed 11538.95 samples/sec Loss 9.0382 LearningRate 0.0738 Epoch: 5 Global Step: 32040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:23:33,427-Speed 11561.06 samples/sec Loss 8.9113 LearningRate 0.0738 Epoch: 5 Global Step: 32050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:23:34,359-Speed 10992.10 samples/sec Loss 9.0757 LearningRate 0.0738 Epoch: 5 Global Step: 32060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:23:35,267-Speed 11275.81 samples/sec Loss 9.1105 LearningRate 0.0738 Epoch: 5 Global Step: 32070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 
00:23:36,197-Speed 11028.05 samples/sec Loss 8.9639 LearningRate 0.0738 Epoch: 5 Global Step: 32080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:23:37,115-Speed 11169.63 samples/sec Loss 9.0849 LearningRate 0.0738 Epoch: 5 Global Step: 32090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:23:38,043-Speed 11038.83 samples/sec Loss 9.1262 LearningRate 0.0738 Epoch: 5 Global Step: 32100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:23:39,006-Speed 10642.38 samples/sec Loss 8.9397 LearningRate 0.0738 Epoch: 5 Global Step: 32110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:23:39,938-Speed 10991.75 samples/sec Loss 8.9831 LearningRate 0.0737 Epoch: 5 Global Step: 32120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:23:40,838-Speed 11392.50 samples/sec Loss 8.9441 LearningRate 0.0737 Epoch: 5 Global Step: 32130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:23:41,741-Speed 11341.88 samples/sec Loss 9.1847 LearningRate 0.0737 Epoch: 5 Global Step: 32140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:23:42,618-Speed 11685.01 samples/sec Loss 8.9634 LearningRate 0.0737 Epoch: 5 Global Step: 32150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:23:43,536-Speed 11163.38 samples/sec Loss 9.1538 LearningRate 0.0737 Epoch: 5 Global Step: 32160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:23:44,437-Speed 11376.54 samples/sec Loss 9.2605 LearningRate 0.0737 Epoch: 5 Global Step: 32170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:23:45,315-Speed 11670.77 samples/sec Loss 9.0373 LearningRate 0.0737 Epoch: 5 Global Step: 32180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:23:46,183-Speed 11796.80 samples/sec Loss 9.1368 LearningRate 0.0737 Epoch: 5 Global Step: 32190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:23:47,067-Speed 11600.33 samples/sec 
Loss 9.1358 LearningRate 0.0737 Epoch: 5 Global Step: 32200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:23:47,986-Speed 11141.74 samples/sec Loss 9.2512 LearningRate 0.0737 Epoch: 5 Global Step: 32210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:23:48,866-Speed 11655.15 samples/sec Loss 9.0623 LearningRate 0.0737 Epoch: 5 Global Step: 32220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:23:49,754-Speed 11535.23 samples/sec Loss 9.1092 LearningRate 0.0737 Epoch: 5 Global Step: 32230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:23:50,626-Speed 11753.96 samples/sec Loss 8.9660 LearningRate 0.0737 Epoch: 5 Global Step: 32240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:23:51,543-Speed 11173.29 samples/sec Loss 9.1326 LearningRate 0.0737 Epoch: 5 Global Step: 32250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:23:52,480-Speed 10940.52 samples/sec Loss 9.0835 LearningRate 0.0736 Epoch: 5 Global Step: 32260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:23:53,391-Speed 11265.56 samples/sec Loss 9.0823 LearningRate 0.0736 Epoch: 5 Global Step: 32270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:23:54,298-Speed 11294.29 samples/sec Loss 9.0322 LearningRate 0.0736 Epoch: 5 Global Step: 32280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:23:55,187-Speed 11530.11 samples/sec Loss 9.0303 LearningRate 0.0736 Epoch: 5 Global Step: 32290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:23:56,099-Speed 11231.50 samples/sec Loss 9.0176 LearningRate 0.0736 Epoch: 5 Global Step: 32300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:23:56,989-Speed 11520.81 samples/sec Loss 9.0178 LearningRate 0.0736 Epoch: 5 Global Step: 32310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:23:57,902-Speed 11226.25 samples/sec Loss 9.1451 LearningRate 0.0736 Epoch: 5 Global 
Step: 32320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:23:58,836-Speed 10962.14 samples/sec Loss 9.1315 LearningRate 0.0736 Epoch: 5 Global Step: 32330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:23:59,749-Speed 11233.51 samples/sec Loss 9.1022 LearningRate 0.0736 Epoch: 5 Global Step: 32340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:00,659-Speed 11258.23 samples/sec Loss 9.0852 LearningRate 0.0736 Epoch: 5 Global Step: 32350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:01,541-Speed 11623.77 samples/sec Loss 8.9800 LearningRate 0.0736 Epoch: 5 Global Step: 32360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:02,472-Speed 11000.35 samples/sec Loss 9.1064 LearningRate 0.0736 Epoch: 5 Global Step: 32370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:03,397-Speed 11075.43 samples/sec Loss 8.8980 LearningRate 0.0736 Epoch: 5 Global Step: 32380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:04,334-Speed 10942.22 samples/sec Loss 8.9965 LearningRate 0.0735 Epoch: 5 Global Step: 32390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:05,245-Speed 11246.36 samples/sec Loss 9.1017 LearningRate 0.0735 Epoch: 5 Global Step: 32400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:06,139-Speed 11466.42 samples/sec Loss 9.0274 LearningRate 0.0735 Epoch: 5 Global Step: 32410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:24:07,046-Speed 11286.91 samples/sec Loss 9.1873 LearningRate 0.0735 Epoch: 5 Global Step: 32420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:24:07,945-Speed 11400.00 samples/sec Loss 9.0779 LearningRate 0.0735 Epoch: 5 Global Step: 32430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:24:08,831-Speed 11574.56 samples/sec Loss 8.9336 LearningRate 0.0735 Epoch: 5 Global Step: 32440 Fp16 Grad Scale: 65536 Required: 7 
hours Training: 2022-04-11 00:24:09,736-Speed 11323.86 samples/sec Loss 8.8935 LearningRate 0.0735 Epoch: 5 Global Step: 32450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:24:10,618-Speed 11623.82 samples/sec Loss 9.0005 LearningRate 0.0735 Epoch: 5 Global Step: 32460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:24:11,547-Speed 11026.57 samples/sec Loss 8.9084 LearningRate 0.0735 Epoch: 5 Global Step: 32470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:24:12,459-Speed 11230.77 samples/sec Loss 9.0367 LearningRate 0.0735 Epoch: 5 Global Step: 32480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:24:13,355-Speed 11444.17 samples/sec Loss 9.0066 LearningRate 0.0735 Epoch: 5 Global Step: 32490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:24:14,299-Speed 10852.75 samples/sec Loss 9.0218 LearningRate 0.0735 Epoch: 5 Global Step: 32500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:24:15,214-Speed 11204.06 samples/sec Loss 9.1863 LearningRate 0.0735 Epoch: 5 Global Step: 32510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:16,144-Speed 11007.23 samples/sec Loss 9.2079 LearningRate 0.0734 Epoch: 5 Global Step: 32520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:17,053-Speed 11284.67 samples/sec Loss 8.9695 LearningRate 0.0734 Epoch: 5 Global Step: 32530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:17,933-Speed 11637.02 samples/sec Loss 8.9516 LearningRate 0.0734 Epoch: 5 Global Step: 32540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:18,824-Speed 11503.72 samples/sec Loss 8.8077 LearningRate 0.0734 Epoch: 5 Global Step: 32550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:19,716-Speed 11491.74 samples/sec Loss 9.0543 LearningRate 0.0734 Epoch: 5 Global Step: 32560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:20,615-Speed 
11395.37 samples/sec Loss 8.9154 LearningRate 0.0734 Epoch: 5 Global Step: 32570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:21,502-Speed 11549.14 samples/sec Loss 9.0982 LearningRate 0.0734 Epoch: 5 Global Step: 32580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:22,403-Speed 11373.05 samples/sec Loss 9.0409 LearningRate 0.0734 Epoch: 5 Global Step: 32590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:23,307-Speed 11343.03 samples/sec Loss 8.8652 LearningRate 0.0734 Epoch: 5 Global Step: 32600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:24,167-Speed 11910.14 samples/sec Loss 9.0422 LearningRate 0.0734 Epoch: 5 Global Step: 32610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:25,091-Speed 11092.47 samples/sec Loss 9.1657 LearningRate 0.0734 Epoch: 5 Global Step: 32620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:25,990-Speed 11388.16 samples/sec Loss 9.0656 LearningRate 0.0734 Epoch: 5 Global Step: 32630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:26,894-Speed 11342.74 samples/sec Loss 8.9578 LearningRate 0.0734 Epoch: 5 Global Step: 32640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:27,797-Speed 11345.77 samples/sec Loss 9.0674 LearningRate 0.0733 Epoch: 5 Global Step: 32650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:28,674-Speed 11682.11 samples/sec Loss 9.0019 LearningRate 0.0733 Epoch: 5 Global Step: 32660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:29,587-Speed 11220.31 samples/sec Loss 8.9577 LearningRate 0.0733 Epoch: 5 Global Step: 32670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:30,479-Speed 11495.93 samples/sec Loss 9.0736 LearningRate 0.0733 Epoch: 5 Global Step: 32680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:31,375-Speed 11433.87 samples/sec Loss 8.9114 
LearningRate 0.0733 Epoch: 5 Global Step: 32690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:32,243-Speed 11804.85 samples/sec Loss 9.1364 LearningRate 0.0733 Epoch: 5 Global Step: 32700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:33,126-Speed 11609.36 samples/sec Loss 9.1614 LearningRate 0.0733 Epoch: 5 Global Step: 32710 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:24:33,994-Speed 11809.67 samples/sec Loss 8.8643 LearningRate 0.0733 Epoch: 5 Global Step: 32720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:34,891-Speed 11416.11 samples/sec Loss 9.0221 LearningRate 0.0733 Epoch: 5 Global Step: 32730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:35,792-Speed 11395.67 samples/sec Loss 9.0297 LearningRate 0.0733 Epoch: 5 Global Step: 32740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:36,721-Speed 11023.73 samples/sec Loss 9.1459 LearningRate 0.0733 Epoch: 5 Global Step: 32750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:37,596-Speed 11709.39 samples/sec Loss 9.0057 LearningRate 0.0733 Epoch: 5 Global Step: 32760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:38,503-Speed 11304.31 samples/sec Loss 9.0477 LearningRate 0.0733 Epoch: 5 Global Step: 32770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:39,415-Speed 11239.28 samples/sec Loss 9.1615 LearningRate 0.0733 Epoch: 5 Global Step: 32780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:40,354-Speed 10912.32 samples/sec Loss 8.9103 LearningRate 0.0732 Epoch: 5 Global Step: 32790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:41,284-Speed 11010.67 samples/sec Loss 9.0723 LearningRate 0.0732 Epoch: 5 Global Step: 32800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:42,155-Speed 11775.17 samples/sec Loss 9.0213 LearningRate 0.0732 Epoch: 5 Global Step: 
32810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:24:43,047-Speed 11483.27 samples/sec Loss 9.0378 LearningRate 0.0732 Epoch: 5 Global Step: 32820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:24:43,935-Speed 11543.20 samples/sec Loss 8.9714 LearningRate 0.0732 Epoch: 5 Global Step: 32830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:24:44,857-Speed 11110.65 samples/sec Loss 9.0131 LearningRate 0.0732 Epoch: 5 Global Step: 32840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:24:45,734-Speed 11684.34 samples/sec Loss 9.0480 LearningRate 0.0732 Epoch: 5 Global Step: 32850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:24:46,609-Speed 11717.46 samples/sec Loss 9.0946 LearningRate 0.0732 Epoch: 5 Global Step: 32860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:24:47,487-Speed 11666.22 samples/sec Loss 8.9075 LearningRate 0.0732 Epoch: 5 Global Step: 32870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:24:48,405-Speed 11158.88 samples/sec Loss 8.8692 LearningRate 0.0732 Epoch: 5 Global Step: 32880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:24:49,316-Speed 11253.12 samples/sec Loss 9.1863 LearningRate 0.0732 Epoch: 5 Global Step: 32890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:24:50,200-Speed 11585.66 samples/sec Loss 9.0603 LearningRate 0.0732 Epoch: 5 Global Step: 32900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:24:51,078-Speed 11673.62 samples/sec Loss 8.9893 LearningRate 0.0732 Epoch: 5 Global Step: 32910 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:51,987-Speed 11274.40 samples/sec Loss 8.9944 LearningRate 0.0731 Epoch: 5 Global Step: 32920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:52,882-Speed 11451.36 samples/sec Loss 8.9518 LearningRate 0.0731 Epoch: 5 Global Step: 32930 Fp16 Grad Scale: 131072 Required: 7 hours 
Training: 2022-04-11 00:24:53,792-Speed 11257.71 samples/sec Loss 9.0186 LearningRate 0.0731 Epoch: 5 Global Step: 32940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:54,695-Speed 11346.73 samples/sec Loss 8.8277 LearningRate 0.0731 Epoch: 5 Global Step: 32950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:55,611-Speed 11197.79 samples/sec Loss 9.1290 LearningRate 0.0731 Epoch: 5 Global Step: 32960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:56,503-Speed 11476.51 samples/sec Loss 9.0799 LearningRate 0.0731 Epoch: 5 Global Step: 32970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:57,379-Speed 11701.66 samples/sec Loss 9.0383 LearningRate 0.0731 Epoch: 5 Global Step: 32980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:58,254-Speed 11717.09 samples/sec Loss 9.2099 LearningRate 0.0731 Epoch: 5 Global Step: 32990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:24:59,126-Speed 11752.44 samples/sec Loss 8.8120 LearningRate 0.0731 Epoch: 5 Global Step: 33000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:25:00,037-Speed 11241.93 samples/sec Loss 8.8425 LearningRate 0.0731 Epoch: 5 Global Step: 33010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:25:00,959-Speed 11115.38 samples/sec Loss 8.9538 LearningRate 0.0731 Epoch: 5 Global Step: 33020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:25:01,859-Speed 11396.33 samples/sec Loss 9.0422 LearningRate 0.0731 Epoch: 5 Global Step: 33030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:25:02,757-Speed 11403.16 samples/sec Loss 9.0111 LearningRate 0.0731 Epoch: 5 Global Step: 33040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:25:03,683-Speed 11064.82 samples/sec Loss 8.9604 LearningRate 0.0730 Epoch: 5 Global Step: 33050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:25:04,567-Speed 
11601.27 samples/sec Loss 9.0291 LearningRate 0.0730 Epoch: 5 Global Step: 33060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:25:05,474-Speed 11305.86 samples/sec Loss 9.0368 LearningRate 0.0730 Epoch: 5 Global Step: 33070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:25:06,398-Speed 11094.39 samples/sec Loss 9.0049 LearningRate 0.0730 Epoch: 5 Global Step: 33080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:25:07,299-Speed 11380.40 samples/sec Loss 9.1620 LearningRate 0.0730 Epoch: 5 Global Step: 33090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:25:08,231-Speed 10991.96 samples/sec Loss 8.9018 LearningRate 0.0730 Epoch: 5 Global Step: 33100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:25:09,125-Speed 11459.03 samples/sec Loss 9.1254 LearningRate 0.0730 Epoch: 5 Global Step: 33110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:25:10,021-Speed 11429.25 samples/sec Loss 8.8373 LearningRate 0.0730 Epoch: 5 Global Step: 33120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:25:10,900-Speed 11662.39 samples/sec Loss 9.0275 LearningRate 0.0730 Epoch: 5 Global Step: 33130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:25:11,776-Speed 11698.26 samples/sec Loss 8.9099 LearningRate 0.0730 Epoch: 5 Global Step: 33140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:25:12,669-Speed 11478.33 samples/sec Loss 9.0393 LearningRate 0.0730 Epoch: 5 Global Step: 33150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:25:13,578-Speed 11271.14 samples/sec Loss 9.0242 LearningRate 0.0730 Epoch: 5 Global Step: 33160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:25:14,474-Speed 11435.84 samples/sec Loss 8.9898 LearningRate 0.0730 Epoch: 5 Global Step: 33170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:25:15,396-Speed 11117.81 samples/sec Loss 8.7936 LearningRate 
0.0730 Epoch: 5 Global Step: 33180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:25:16,317-Speed 11123.65 samples/sec Loss 9.0222 LearningRate 0.0729 Epoch: 5 Global Step: 33190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:25:17,238-Speed 11130.36 samples/sec Loss 9.1196 LearningRate 0.0729 Epoch: 5 Global Step: 33200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:25:18,175-Speed 10938.08 samples/sec Loss 8.9782 LearningRate 0.0729 Epoch: 5 Global Step: 33210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:25:19,100-Speed 11078.15 samples/sec Loss 9.1063 LearningRate 0.0729 Epoch: 5 Global Step: 33220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:25:20,001-Speed 11370.26 samples/sec Loss 8.9589 LearningRate 0.0729 Epoch: 5 Global Step: 33230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:25:20,870-Speed 11794.94 samples/sec Loss 8.9292 LearningRate 0.0729 Epoch: 5 Global Step: 33240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:25:21,739-Speed 11780.91 samples/sec Loss 8.8421 LearningRate 0.0729 Epoch: 5 Global Step: 33250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:25:22,612-Speed 11742.81 samples/sec Loss 9.1447 LearningRate 0.0729 Epoch: 5 Global Step: 33260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:25:23,510-Speed 11415.10 samples/sec Loss 8.9285 LearningRate 0.0729 Epoch: 5 Global Step: 33270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:25:24,413-Speed 11337.56 samples/sec Loss 9.0664 LearningRate 0.0729 Epoch: 5 Global Step: 33280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:25:25,328-Speed 11200.31 samples/sec Loss 8.8630 LearningRate 0.0729 Epoch: 5 Global Step: 33290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:25:26,234-Speed 11315.16 samples/sec Loss 8.9957 LearningRate 0.0729 Epoch: 5 Global Step: 33300 Fp16 Grad 
Scale: 65536 Required: 7 hours Training: 2022-04-11 00:25:27,150-Speed 11183.88 samples/sec Loss 8.9575 LearningRate 0.0729 Epoch: 5 Global Step: 33310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:25:28,057-Speed 11299.53 samples/sec Loss 9.2102 LearningRate 0.0728 Epoch: 5 Global Step: 33320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:25:28,957-Speed 11385.51 samples/sec Loss 8.8735 LearningRate 0.0728 Epoch: 5 Global Step: 33330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:25:29,877-Speed 11134.22 samples/sec Loss 8.9709 LearningRate 0.0728 Epoch: 5 Global Step: 33340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:25:30,814-Speed 10941.99 samples/sec Loss 8.9667 LearningRate 0.0728 Epoch: 5 Global Step: 33350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:25:31,734-Speed 11131.39 samples/sec Loss 8.8681 LearningRate 0.0728 Epoch: 5 Global Step: 33360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:25:32,644-Speed 11265.71 samples/sec Loss 8.9541 LearningRate 0.0728 Epoch: 5 Global Step: 33370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:25:33,535-Speed 11494.45 samples/sec Loss 9.0528 LearningRate 0.0728 Epoch: 5 Global Step: 33380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:25:34,420-Speed 11588.34 samples/sec Loss 8.9694 LearningRate 0.0728 Epoch: 5 Global Step: 33390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:25:35,327-Speed 11291.74 samples/sec Loss 8.9662 LearningRate 0.0728 Epoch: 5 Global Step: 33400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:25:36,230-Speed 11355.89 samples/sec Loss 8.9353 LearningRate 0.0728 Epoch: 5 Global Step: 33410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:25:37,132-Speed 11353.15 samples/sec Loss 8.9638 LearningRate 0.0728 Epoch: 5 Global Step: 33420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 
00:25:38,015-Speed 11609.37 samples/sec Loss 9.0174 LearningRate 0.0728 Epoch: 5 Global Step: 33430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:25:38,936-Speed 11129.25 samples/sec Loss 8.9366 LearningRate 0.0728 Epoch: 5 Global Step: 33440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:25:39,808-Speed 11761.03 samples/sec Loss 9.0840 LearningRate 0.0727 Epoch: 5 Global Step: 33450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:25:40,719-Speed 11250.45 samples/sec Loss 9.0202 LearningRate 0.0727 Epoch: 5 Global Step: 33460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:25:41,629-Speed 11248.39 samples/sec Loss 9.0026 LearningRate 0.0727 Epoch: 5 Global Step: 33470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:25:42,517-Speed 11549.53 samples/sec Loss 9.0226 LearningRate 0.0727 Epoch: 5 Global Step: 33480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:25:43,406-Speed 11525.75 samples/sec Loss 8.9558 LearningRate 0.0727 Epoch: 5 Global Step: 33490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:25:44,340-Speed 10969.14 samples/sec Loss 9.1399 LearningRate 0.0727 Epoch: 5 Global Step: 33500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:25:45,235-Speed 11453.05 samples/sec Loss 9.0338 LearningRate 0.0727 Epoch: 5 Global Step: 33510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:25:46,137-Speed 11349.99 samples/sec Loss 8.9885 LearningRate 0.0727 Epoch: 5 Global Step: 33520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:25:47,047-Speed 11267.83 samples/sec Loss 8.9432 LearningRate 0.0727 Epoch: 5 Global Step: 33530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:25:47,964-Speed 11172.39 samples/sec Loss 9.0822 LearningRate 0.0727 Epoch: 5 Global Step: 33540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:25:48,857-Speed 11481.31 samples/sec Loss 8.8798 
LearningRate 0.0727 Epoch: 5 Global Step: 33550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:25:49,761-Speed 11329.29 samples/sec Loss 9.1030 LearningRate 0.0727 Epoch: 5 Global Step: 33560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:25:50,693-Speed 10997.72 samples/sec Loss 8.8773 LearningRate 0.0727 Epoch: 5 Global Step: 33570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:25:51,629-Speed 10942.97 samples/sec Loss 9.0577 LearningRate 0.0727 Epoch: 5 Global Step: 33580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:25:52,575-Speed 10832.31 samples/sec Loss 9.0795 LearningRate 0.0726 Epoch: 5 Global Step: 33590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:25:53,495-Speed 11141.50 samples/sec Loss 8.9040 LearningRate 0.0726 Epoch: 5 Global Step: 33600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:25:54,414-Speed 11144.12 samples/sec Loss 8.8854 LearningRate 0.0726 Epoch: 5 Global Step: 33610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:25:55,361-Speed 10827.68 samples/sec Loss 8.9235 LearningRate 0.0726 Epoch: 5 Global Step: 33620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:25:56,292-Speed 10998.79 samples/sec Loss 8.9497 LearningRate 0.0726 Epoch: 5 Global Step: 33630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:25:57,192-Speed 11383.12 samples/sec Loss 9.0495 LearningRate 0.0726 Epoch: 5 Global Step: 33640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:25:58,071-Speed 11665.51 samples/sec Loss 9.1006 LearningRate 0.0726 Epoch: 5 Global Step: 33650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:25:58,951-Speed 11645.11 samples/sec Loss 9.0116 LearningRate 0.0726 Epoch: 5 Global Step: 33660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:25:59,830-Speed 11670.31 samples/sec Loss 9.0066 LearningRate 0.0726 Epoch: 5 Global Step: 33670 
Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:26:00,758-Speed 11042.23 samples/sec Loss 9.0315 LearningRate 0.0726 Epoch: 5 Global Step: 33680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:26:01,641-Speed 11597.85 samples/sec Loss 8.9505 LearningRate 0.0726 Epoch: 5 Global Step: 33690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:26:02,528-Speed 11556.08 samples/sec Loss 9.0643 LearningRate 0.0726 Epoch: 5 Global Step: 33700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:26:03,413-Speed 11582.29 samples/sec Loss 8.9259 LearningRate 0.0726 Epoch: 5 Global Step: 33710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:26:04,332-Speed 11144.89 samples/sec Loss 8.8681 LearningRate 0.0725 Epoch: 5 Global Step: 33720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:26:05,282-Speed 10788.53 samples/sec Loss 8.9740 LearningRate 0.0725 Epoch: 5 Global Step: 33730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:26:06,193-Speed 11249.41 samples/sec Loss 8.9674 LearningRate 0.0725 Epoch: 5 Global Step: 33740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:26:07,061-Speed 11803.78 samples/sec Loss 8.8052 LearningRate 0.0725 Epoch: 5 Global Step: 33750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:26:07,959-Speed 11409.85 samples/sec Loss 8.8674 LearningRate 0.0725 Epoch: 5 Global Step: 33760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:26:08,836-Speed 11683.20 samples/sec Loss 8.8295 LearningRate 0.0725 Epoch: 5 Global Step: 33770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:26:09,701-Speed 11845.11 samples/sec Loss 8.8795 LearningRate 0.0725 Epoch: 5 Global Step: 33780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:26:10,593-Speed 11486.69 samples/sec Loss 8.9627 LearningRate 0.0725 Epoch: 5 Global Step: 33790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 
2022-04-11 00:26:11,462-Speed 11796.88 samples/sec Loss 9.1075 LearningRate 0.0725 Epoch: 5 Global Step: 33800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:26:12,388-Speed 11058.83 samples/sec Loss 9.0104 LearningRate 0.0725 Epoch: 5 Global Step: 33810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:26:13,292-Speed 11343.54 samples/sec Loss 8.8659 LearningRate 0.0725 Epoch: 5 Global Step: 33820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:26:14,185-Speed 11468.91 samples/sec Loss 9.0077 LearningRate 0.0725 Epoch: 5 Global Step: 33830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:26:15,104-Speed 11148.60 samples/sec Loss 8.8405 LearningRate 0.0725 Epoch: 5 Global Step: 33840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:26:16,056-Speed 10765.93 samples/sec Loss 8.9566 LearningRate 0.0724 Epoch: 5 Global Step: 33850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:26:16,974-Speed 11161.52 samples/sec Loss 8.8775 LearningRate 0.0724 Epoch: 5 Global Step: 33860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:26:17,900-Speed 11074.82 samples/sec Loss 8.8901 LearningRate 0.0724 Epoch: 5 Global Step: 33870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:26:18,854-Speed 10735.94 samples/sec Loss 9.0274 LearningRate 0.0724 Epoch: 5 Global Step: 33880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:26:19,781-Speed 11057.41 samples/sec Loss 9.1343 LearningRate 0.0724 Epoch: 5 Global Step: 33890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:26:20,682-Speed 11374.67 samples/sec Loss 8.9677 LearningRate 0.0724 Epoch: 5 Global Step: 33900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:26:21,578-Speed 11441.09 samples/sec Loss 8.9660 LearningRate 0.0724 Epoch: 5 Global Step: 33910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:26:22,477-Speed 11399.62 
samples/sec Loss 9.1689 LearningRate 0.0724 Epoch: 5 Global Step: 33920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:26:23,348-Speed 11765.61 samples/sec Loss 8.9727 LearningRate 0.0724 Epoch: 5 Global Step: 33930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:26:24,278-Speed 11006.70 samples/sec Loss 8.9849 LearningRate 0.0724 Epoch: 5 Global Step: 33940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:26:25,191-Speed 11225.74 samples/sec Loss 9.0049 LearningRate 0.0724 Epoch: 5 Global Step: 33950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:26:26,143-Speed 10770.48 samples/sec Loss 8.9608 LearningRate 0.0724 Epoch: 5 Global Step: 33960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:26:27,089-Speed 10829.65 samples/sec Loss 8.9161 LearningRate 0.0724 Epoch: 5 Global Step: 33970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:26:28,027-Speed 10919.82 samples/sec Loss 9.0276 LearningRate 0.0724 Epoch: 5 Global Step: 33980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:26:28,904-Speed 11686.19 samples/sec Loss 9.1001 LearningRate 0.0723 Epoch: 5 Global Step: 33990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:26:29,801-Speed 11424.80 samples/sec Loss 8.9951 LearningRate 0.0723 Epoch: 5 Global Step: 34000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:26:51,161-[lfw][34000]XNorm: 14.135527 Training: 2022-04-11 00:26:51,162-[lfw][34000]Accuracy-Flip: 0.99400+-0.00429 Training: 2022-04-11 00:26:51,162-[lfw][34000]Accuracy-Highest: 0.99417 Training: 2022-04-11 00:27:16,077-[cfp_fp][34000]XNorm: 12.004922 Training: 2022-04-11 00:27:16,078-[cfp_fp][34000]Accuracy-Flip: 0.92471+-0.01297 Training: 2022-04-11 00:27:16,078-[cfp_fp][34000]Accuracy-Highest: 0.92471 Training: 2022-04-11 00:27:37,448-[agedb_30][34000]XNorm: 13.882606 Training: 2022-04-11 00:27:37,448-[agedb_30][34000]Accuracy-Flip: 0.95000+-0.01108 
Training: 2022-04-11 00:27:37,448-[agedb_30][34000]Accuracy-Highest: 0.95233 Training: 2022-04-11 00:27:38,342-Speed 149.40 samples/sec Loss 8.8728 LearningRate 0.0723 Epoch: 5 Global Step: 34010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:27:39,219-Speed 11687.11 samples/sec Loss 9.0034 LearningRate 0.0723 Epoch: 5 Global Step: 34020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:27:40,125-Speed 11305.59 samples/sec Loss 8.8822 LearningRate 0.0723 Epoch: 5 Global Step: 34030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:27:41,022-Speed 11437.63 samples/sec Loss 8.8740 LearningRate 0.0723 Epoch: 5 Global Step: 34040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:27:41,913-Speed 11503.07 samples/sec Loss 8.9620 LearningRate 0.0723 Epoch: 5 Global Step: 34050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:27:42,810-Speed 11421.03 samples/sec Loss 8.9148 LearningRate 0.0723 Epoch: 5 Global Step: 34060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:27:43,718-Speed 11285.78 samples/sec Loss 9.0606 LearningRate 0.0723 Epoch: 5 Global Step: 34070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:27:44,624-Speed 11316.27 samples/sec Loss 9.0566 LearningRate 0.0723 Epoch: 5 Global Step: 34080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:27:45,514-Speed 11517.94 samples/sec Loss 8.9459 LearningRate 0.0723 Epoch: 5 Global Step: 34090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:27:46,391-Speed 11677.93 samples/sec Loss 8.9151 LearningRate 0.0723 Epoch: 5 Global Step: 34100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:27:47,449-Speed 9683.33 samples/sec Loss 9.1087 LearningRate 0.0723 Epoch: 5 Global Step: 34110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:27:57,648-Speed 1004.20 samples/sec Loss 8.6947 LearningRate 0.0722 Epoch: 6 Global Step: 34120 Fp16 Grad Scale: 
65536 Required: 7 hours Training: 2022-04-11 00:27:58,620-Speed 10541.48 samples/sec Loss 8.2364 LearningRate 0.0722 Epoch: 6 Global Step: 34130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:27:59,597-Speed 10484.65 samples/sec Loss 8.0941 LearningRate 0.0722 Epoch: 6 Global Step: 34140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:28:00,595-Speed 10271.57 samples/sec Loss 8.1552 LearningRate 0.0722 Epoch: 6 Global Step: 34150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:28:01,617-Speed 10025.78 samples/sec Loss 8.0209 LearningRate 0.0722 Epoch: 6 Global Step: 34160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:28:02,508-Speed 11500.14 samples/sec Loss 8.1634 LearningRate 0.0722 Epoch: 6 Global Step: 34170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:28:03,458-Speed 10790.07 samples/sec Loss 8.1475 LearningRate 0.0722 Epoch: 6 Global Step: 34180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:28:04,352-Speed 11453.95 samples/sec Loss 7.9925 LearningRate 0.0722 Epoch: 6 Global Step: 34190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:28:05,235-Speed 11607.88 samples/sec Loss 8.2691 LearningRate 0.0722 Epoch: 6 Global Step: 34200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:28:06,145-Speed 11272.27 samples/sec Loss 8.1706 LearningRate 0.0722 Epoch: 6 Global Step: 34210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:28:07,030-Speed 11571.00 samples/sec Loss 8.3269 LearningRate 0.0722 Epoch: 6 Global Step: 34220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:28:07,935-Speed 11325.68 samples/sec Loss 8.3238 LearningRate 0.0722 Epoch: 6 Global Step: 34230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:28:09,111-Speed 8716.58 samples/sec Loss 8.2373 LearningRate 0.0722 Epoch: 6 Global Step: 34240 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 
00:28:10,038-Speed 11044.79 samples/sec Loss 8.2352 LearningRate 0.0721 Epoch: 6 Global Step: 34250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:28:10,963-Speed 11079.97 samples/sec Loss 8.3705 LearningRate 0.0721 Epoch: 6 Global Step: 34260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:28:11,908-Speed 10842.54 samples/sec Loss 8.2403 LearningRate 0.0721 Epoch: 6 Global Step: 34270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:28:12,842-Speed 10970.22 samples/sec Loss 8.3130 LearningRate 0.0721 Epoch: 6 Global Step: 34280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:28:13,747-Speed 11330.42 samples/sec Loss 8.2566 LearningRate 0.0721 Epoch: 6 Global Step: 34290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:28:14,687-Speed 10904.65 samples/sec Loss 8.3716 LearningRate 0.0721 Epoch: 6 Global Step: 34300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:28:15,651-Speed 10625.51 samples/sec Loss 8.3688 LearningRate 0.0721 Epoch: 6 Global Step: 34310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:28:16,584-Speed 10987.72 samples/sec Loss 8.3598 LearningRate 0.0721 Epoch: 6 Global Step: 34320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:28:17,565-Speed 10444.91 samples/sec Loss 8.5278 LearningRate 0.0721 Epoch: 6 Global Step: 34330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:28:18,530-Speed 10617.80 samples/sec Loss 8.1733 LearningRate 0.0721 Epoch: 6 Global Step: 34340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:28:19,522-Speed 10323.02 samples/sec Loss 8.2644 LearningRate 0.0721 Epoch: 6 Global Step: 34350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:28:20,469-Speed 10820.70 samples/sec Loss 8.3235 LearningRate 0.0721 Epoch: 6 Global Step: 34360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:28:21,395-Speed 11074.68 samples/sec Loss 8.4758 
LearningRate 0.0721 Epoch: 6 Global Step: 34370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:28:22,385-Speed 10351.05 samples/sec Loss 8.4684 LearningRate 0.0721 Epoch: 6 Global Step: 34380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:28:23,307-Speed 11107.95 samples/sec Loss 8.3899 LearningRate 0.0720 Epoch: 6 Global Step: 34390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:28:24,239-Speed 10989.25 samples/sec Loss 8.3943 LearningRate 0.0720 Epoch: 6 Global Step: 34400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:28:25,219-Speed 10455.48 samples/sec Loss 8.3830 LearningRate 0.0720 Epoch: 6 Global Step: 34410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:28:26,147-Speed 11049.53 samples/sec Loss 8.1921 LearningRate 0.0720 Epoch: 6 Global Step: 34420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:28:27,084-Speed 10934.98 samples/sec Loss 8.2522 LearningRate 0.0720 Epoch: 6 Global Step: 34430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:28:28,014-Speed 11010.34 samples/sec Loss 8.3207 LearningRate 0.0720 Epoch: 6 Global Step: 34440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:28:28,907-Speed 11475.33 samples/sec Loss 8.5165 LearningRate 0.0720 Epoch: 6 Global Step: 34450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:28:29,832-Speed 11076.53 samples/sec Loss 8.5516 LearningRate 0.0720 Epoch: 6 Global Step: 34460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:28:30,723-Speed 11499.52 samples/sec Loss 8.3617 LearningRate 0.0720 Epoch: 6 Global Step: 34470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:28:31,608-Speed 11584.14 samples/sec Loss 8.4836 LearningRate 0.0720 Epoch: 6 Global Step: 34480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:28:32,527-Speed 11156.70 samples/sec Loss 8.5485 LearningRate 0.0720 Epoch: 6 Global Step: 34490 Fp16 
Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:28:33,409-Speed 11612.31 samples/sec Loss 8.2978 LearningRate 0.0720 Epoch: 6 Global Step: 34500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:28:34,318-Speed 11280.37 samples/sec Loss 8.5575 LearningRate 0.0720 Epoch: 6 Global Step: 34510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:28:35,225-Speed 11304.39 samples/sec Loss 8.4120 LearningRate 0.0719 Epoch: 6 Global Step: 34520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:28:36,158-Speed 10977.17 samples/sec Loss 8.3903 LearningRate 0.0719 Epoch: 6 Global Step: 34530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:28:37,055-Speed 11424.96 samples/sec Loss 8.4040 LearningRate 0.0719 Epoch: 6 Global Step: 34540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:28:37,940-Speed 11574.85 samples/sec Loss 8.6028 LearningRate 0.0719 Epoch: 6 Global Step: 34550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:28:38,849-Speed 11282.78 samples/sec Loss 8.5800 LearningRate 0.0719 Epoch: 6 Global Step: 34560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:28:39,757-Speed 11283.41 samples/sec Loss 8.5583 LearningRate 0.0719 Epoch: 6 Global Step: 34570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:28:40,659-Speed 11358.40 samples/sec Loss 8.5794 LearningRate 0.0719 Epoch: 6 Global Step: 34580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:28:41,608-Speed 10794.64 samples/sec Loss 8.4644 LearningRate 0.0719 Epoch: 6 Global Step: 34590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:28:42,543-Speed 10955.25 samples/sec Loss 8.4718 LearningRate 0.0719 Epoch: 6 Global Step: 34600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:28:43,460-Speed 11177.89 samples/sec Loss 8.5756 LearningRate 0.0719 Epoch: 6 Global Step: 34610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 
2022-04-11 00:28:44,367-Speed 11298.41 samples/sec Loss 8.5679 LearningRate 0.0719 Epoch: 6 Global Step: 34620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:28:45,260-Speed 11466.69 samples/sec Loss 8.5062 LearningRate 0.0719 Epoch: 6 Global Step: 34630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:28:46,163-Speed 11352.87 samples/sec Loss 8.4942 LearningRate 0.0719 Epoch: 6 Global Step: 34640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:28:47,100-Speed 10935.31 samples/sec Loss 8.5725 LearningRate 0.0719 Epoch: 6 Global Step: 34650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:28:48,024-Speed 11100.35 samples/sec Loss 8.4812 LearningRate 0.0718 Epoch: 6 Global Step: 34660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:28:48,933-Speed 11267.34 samples/sec Loss 8.4208 LearningRate 0.0718 Epoch: 6 Global Step: 34670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:28:49,856-Speed 11105.54 samples/sec Loss 8.5366 LearningRate 0.0718 Epoch: 6 Global Step: 34680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:28:50,763-Speed 11297.29 samples/sec Loss 8.4982 LearningRate 0.0718 Epoch: 6 Global Step: 34690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:28:51,684-Speed 11126.16 samples/sec Loss 8.6434 LearningRate 0.0718 Epoch: 6 Global Step: 34700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:28:52,589-Speed 11331.64 samples/sec Loss 8.6221 LearningRate 0.0718 Epoch: 6 Global Step: 34710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:28:53,496-Speed 11287.08 samples/sec Loss 8.5189 LearningRate 0.0718 Epoch: 6 Global Step: 34720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:28:54,396-Speed 11394.42 samples/sec Loss 8.5565 LearningRate 0.0718 Epoch: 6 Global Step: 34730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:28:55,291-Speed 11444.95 
samples/sec Loss 8.5859 LearningRate 0.0718 Epoch: 6 Global Step: 34740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:28:56,160-Speed 11789.26 samples/sec Loss 8.7290 LearningRate 0.0718 Epoch: 6 Global Step: 34750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:28:57,038-Speed 11679.77 samples/sec Loss 8.7297 LearningRate 0.0718 Epoch: 6 Global Step: 34760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:28:57,943-Speed 11318.84 samples/sec Loss 8.5828 LearningRate 0.0718 Epoch: 6 Global Step: 34770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:28:58,875-Speed 10993.62 samples/sec Loss 8.5831 LearningRate 0.0718 Epoch: 6 Global Step: 34780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:28:59,797-Speed 11112.94 samples/sec Loss 8.6780 LearningRate 0.0717 Epoch: 6 Global Step: 34790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:29:00,742-Speed 10838.48 samples/sec Loss 8.6793 LearningRate 0.0717 Epoch: 6 Global Step: 34800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:29:01,670-Speed 11051.33 samples/sec Loss 8.4292 LearningRate 0.0717 Epoch: 6 Global Step: 34810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:29:02,613-Speed 10861.82 samples/sec Loss 8.5434 LearningRate 0.0717 Epoch: 6 Global Step: 34820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:29:03,549-Speed 10954.38 samples/sec Loss 8.5567 LearningRate 0.0717 Epoch: 6 Global Step: 34830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:29:04,515-Speed 10601.11 samples/sec Loss 8.5399 LearningRate 0.0717 Epoch: 6 Global Step: 34840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:29:05,452-Speed 10939.53 samples/sec Loss 8.7525 LearningRate 0.0717 Epoch: 6 Global Step: 34850 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:29:06,384-Speed 10996.67 samples/sec Loss 8.5997 LearningRate 0.0717 
Epoch: 6 Global Step: 34860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:29:07,278-Speed 11456.22 samples/sec Loss 8.5700 LearningRate 0.0717 Epoch: 6 Global Step: 34870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:29:08,209-Speed 11005.95 samples/sec Loss 8.4657 LearningRate 0.0717 Epoch: 6 Global Step: 34880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:29:09,098-Speed 11528.19 samples/sec Loss 8.5681 LearningRate 0.0717 Epoch: 6 Global Step: 34890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:29:10,017-Speed 11155.54 samples/sec Loss 8.6980 LearningRate 0.0717 Epoch: 6 Global Step: 34900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:29:10,917-Speed 11383.05 samples/sec Loss 8.5377 LearningRate 0.0717 Epoch: 6 Global Step: 34910 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:29:11,821-Speed 11331.15 samples/sec Loss 8.5596 LearningRate 0.0717 Epoch: 6 Global Step: 34920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:29:12,750-Speed 11037.19 samples/sec Loss 8.6283 LearningRate 0.0716 Epoch: 6 Global Step: 34930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:29:13,702-Speed 10765.00 samples/sec Loss 8.6483 LearningRate 0.0716 Epoch: 6 Global Step: 34940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:29:14,587-Speed 11577.24 samples/sec Loss 8.6926 LearningRate 0.0716 Epoch: 6 Global Step: 34950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:29:15,484-Speed 11418.02 samples/sec Loss 8.6866 LearningRate 0.0716 Epoch: 6 Global Step: 34960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:29:16,381-Speed 11429.88 samples/sec Loss 8.5912 LearningRate 0.0716 Epoch: 6 Global Step: 34970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:29:17,314-Speed 10977.45 samples/sec Loss 8.6840 LearningRate 0.0716 Epoch: 6 Global Step: 34980 Fp16 Grad Scale: 
65536 Required: 7 hours Training: 2022-04-11 00:29:18,258-Speed 10862.82 samples/sec Loss 8.5939 LearningRate 0.0716 Epoch: 6 Global Step: 34990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:29:19,199-Speed 10884.06 samples/sec Loss 8.8702 LearningRate 0.0716 Epoch: 6 Global Step: 35000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:29:20,132-Speed 10981.31 samples/sec Loss 8.5683 LearningRate 0.0716 Epoch: 6 Global Step: 35010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:29:21,053-Speed 11120.38 samples/sec Loss 8.6484 LearningRate 0.0716 Epoch: 6 Global Step: 35020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:29:21,947-Speed 11474.40 samples/sec Loss 8.6906 LearningRate 0.0716 Epoch: 6 Global Step: 35030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:29:22,838-Speed 11496.06 samples/sec Loss 8.6218 LearningRate 0.0716 Epoch: 6 Global Step: 35040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:29:23,748-Speed 11262.22 samples/sec Loss 8.5871 LearningRate 0.0716 Epoch: 6 Global Step: 35050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:29:24,630-Speed 11611.55 samples/sec Loss 8.6092 LearningRate 0.0715 Epoch: 6 Global Step: 35060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:29:25,566-Speed 10952.15 samples/sec Loss 8.5985 LearningRate 0.0715 Epoch: 6 Global Step: 35070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:29:26,514-Speed 10812.06 samples/sec Loss 8.6448 LearningRate 0.0715 Epoch: 6 Global Step: 35080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:29:27,464-Speed 10789.10 samples/sec Loss 8.7421 LearningRate 0.0715 Epoch: 6 Global Step: 35090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:29:28,407-Speed 10865.12 samples/sec Loss 8.6448 LearningRate 0.0715 Epoch: 6 Global Step: 35100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 
00:29:29,336-Speed 11021.25 samples/sec Loss 8.6956 LearningRate 0.0715 Epoch: 6 Global Step: 35110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:29:30,289-Speed 10759.71 samples/sec Loss 8.8063 LearningRate 0.0715 Epoch: 6 Global Step: 35120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:29:31,210-Speed 11123.99 samples/sec Loss 8.8018 LearningRate 0.0715 Epoch: 6 Global Step: 35130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:29:32,130-Speed 11149.70 samples/sec Loss 8.8762 LearningRate 0.0715 Epoch: 6 Global Step: 35140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:29:33,022-Speed 11479.26 samples/sec Loss 8.7566 LearningRate 0.0715 Epoch: 6 Global Step: 35150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:29:33,915-Speed 11482.95 samples/sec Loss 8.6368 LearningRate 0.0715 Epoch: 6 Global Step: 35160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:29:34,819-Speed 11327.71 samples/sec Loss 8.6486 LearningRate 0.0715 Epoch: 6 Global Step: 35170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:29:35,750-Speed 11014.39 samples/sec Loss 8.5767 LearningRate 0.0715 Epoch: 6 Global Step: 35180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:29:36,667-Speed 11178.67 samples/sec Loss 8.6747 LearningRate 0.0714 Epoch: 6 Global Step: 35190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:29:37,619-Speed 10763.14 samples/sec Loss 8.7220 LearningRate 0.0714 Epoch: 6 Global Step: 35200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:29:38,500-Speed 11624.72 samples/sec Loss 8.7210 LearningRate 0.0714 Epoch: 6 Global Step: 35210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:29:39,403-Speed 11352.89 samples/sec Loss 8.6969 LearningRate 0.0714 Epoch: 6 Global Step: 35220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:29:40,287-Speed 11585.24 samples/sec Loss 
8.6429 LearningRate 0.0714 Epoch: 6 Global Step: 35230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:29:41,181-Speed 11463.40 samples/sec Loss 8.7116 LearningRate 0.0714 Epoch: 6 Global Step: 35240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:29:42,083-Speed 11355.73 samples/sec Loss 8.6430 LearningRate 0.0714 Epoch: 6 Global Step: 35250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:29:42,996-Speed 11231.09 samples/sec Loss 8.7974 LearningRate 0.0714 Epoch: 6 Global Step: 35260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:29:43,903-Speed 11297.97 samples/sec Loss 8.7967 LearningRate 0.0714 Epoch: 6 Global Step: 35270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:29:44,817-Speed 11206.29 samples/sec Loss 8.7482 LearningRate 0.0714 Epoch: 6 Global Step: 35280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:29:45,721-Speed 11341.34 samples/sec Loss 8.6995 LearningRate 0.0714 Epoch: 6 Global Step: 35290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:29:46,616-Speed 11444.50 samples/sec Loss 8.7018 LearningRate 0.0714 Epoch: 6 Global Step: 35300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:29:47,574-Speed 10695.04 samples/sec Loss 8.7383 LearningRate 0.0714 Epoch: 6 Global Step: 35310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:29:48,508-Speed 10971.47 samples/sec Loss 8.7722 LearningRate 0.0714 Epoch: 6 Global Step: 35320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:29:49,410-Speed 11368.50 samples/sec Loss 8.7184 LearningRate 0.0713 Epoch: 6 Global Step: 35330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:29:50,283-Speed 11729.28 samples/sec Loss 8.6395 LearningRate 0.0713 Epoch: 6 Global Step: 35340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:29:51,221-Speed 10927.57 samples/sec Loss 8.8099 LearningRate 0.0713 Epoch: 6 Global Step: 35350 
Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:29:52,159-Speed 10962.74 samples/sec Loss 8.7754 LearningRate 0.0713 Epoch: 6 Global Step: 35360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:29:53,068-Speed 11278.19 samples/sec Loss 8.5877 LearningRate 0.0713 Epoch: 6 Global Step: 35370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:29:54,026-Speed 10694.99 samples/sec Loss 8.5615 LearningRate 0.0713 Epoch: 6 Global Step: 35380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:29:54,915-Speed 11531.20 samples/sec Loss 8.7757 LearningRate 0.0713 Epoch: 6 Global Step: 35390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:29:55,823-Speed 11280.07 samples/sec Loss 8.6960 LearningRate 0.0713 Epoch: 6 Global Step: 35400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:29:56,746-Speed 11103.66 samples/sec Loss 8.6535 LearningRate 0.0713 Epoch: 6 Global Step: 35410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:29:57,646-Speed 11392.30 samples/sec Loss 8.8003 LearningRate 0.0713 Epoch: 6 Global Step: 35420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:29:58,537-Speed 11490.76 samples/sec Loss 8.7999 LearningRate 0.0713 Epoch: 6 Global Step: 35430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:29:59,426-Speed 11525.61 samples/sec Loss 8.5702 LearningRate 0.0713 Epoch: 6 Global Step: 35440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:30:00,345-Speed 11151.11 samples/sec Loss 8.6728 LearningRate 0.0713 Epoch: 6 Global Step: 35450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:30:01,241-Speed 11441.16 samples/sec Loss 8.6666 LearningRate 0.0712 Epoch: 6 Global Step: 35460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:30:02,136-Speed 11446.64 samples/sec Loss 8.6567 LearningRate 0.0712 Epoch: 6 Global Step: 35470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 
2022-04-11 00:30:03,064-Speed 11045.47 samples/sec Loss 8.8668 LearningRate 0.0712 Epoch: 6 Global Step: 35480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:30:03,993-Speed 11023.14 samples/sec Loss 8.9688 LearningRate 0.0712 Epoch: 6 Global Step: 35490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:30:04,912-Speed 11157.07 samples/sec Loss 8.6821 LearningRate 0.0712 Epoch: 6 Global Step: 35500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:30:05,815-Speed 11341.27 samples/sec Loss 8.7428 LearningRate 0.0712 Epoch: 6 Global Step: 35510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:30:06,721-Speed 11310.20 samples/sec Loss 8.5819 LearningRate 0.0712 Epoch: 6 Global Step: 35520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:30:07,614-Speed 11479.41 samples/sec Loss 8.6514 LearningRate 0.0712 Epoch: 6 Global Step: 35530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:30:08,559-Speed 10849.17 samples/sec Loss 8.5737 LearningRate 0.0712 Epoch: 6 Global Step: 35540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:30:09,508-Speed 10798.00 samples/sec Loss 8.6286 LearningRate 0.0712 Epoch: 6 Global Step: 35550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:30:10,415-Speed 11292.11 samples/sec Loss 8.7111 LearningRate 0.0712 Epoch: 6 Global Step: 35560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:30:11,327-Speed 11235.10 samples/sec Loss 8.6938 LearningRate 0.0712 Epoch: 6 Global Step: 35570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:30:12,262-Speed 10966.42 samples/sec Loss 8.7345 LearningRate 0.0712 Epoch: 6 Global Step: 35580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:30:13,205-Speed 10859.25 samples/sec Loss 8.8338 LearningRate 0.0712 Epoch: 6 Global Step: 35590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:30:14,147-Speed 10876.17 samples/sec 
Loss 8.7351 LearningRate 0.0711 Epoch: 6 Global Step: 35600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:30:15,113-Speed 10607.20 samples/sec Loss 8.6694 LearningRate 0.0711 Epoch: 6 Global Step: 35610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:30:16,009-Speed 11448.00 samples/sec Loss 8.8961 LearningRate 0.0711 Epoch: 6 Global Step: 35620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:30:16,902-Speed 11467.51 samples/sec Loss 8.7368 LearningRate 0.0711 Epoch: 6 Global Step: 35630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:30:17,866-Speed 10634.02 samples/sec Loss 8.8326 LearningRate 0.0711 Epoch: 6 Global Step: 35640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:30:18,775-Speed 11275.16 samples/sec Loss 8.7564 LearningRate 0.0711 Epoch: 6 Global Step: 35650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:30:19,723-Speed 10798.77 samples/sec Loss 8.7328 LearningRate 0.0711 Epoch: 6 Global Step: 35660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:30:20,648-Speed 11076.11 samples/sec Loss 8.6231 LearningRate 0.0711 Epoch: 6 Global Step: 35670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:30:21,536-Speed 11545.16 samples/sec Loss 8.8124 LearningRate 0.0711 Epoch: 6 Global Step: 35680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:30:22,451-Speed 11206.90 samples/sec Loss 8.7505 LearningRate 0.0711 Epoch: 6 Global Step: 35690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:30:23,360-Speed 11280.15 samples/sec Loss 8.8482 LearningRate 0.0711 Epoch: 6 Global Step: 35700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:30:24,255-Speed 11441.61 samples/sec Loss 8.6442 LearningRate 0.0711 Epoch: 6 Global Step: 35710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:30:25,147-Speed 11493.13 samples/sec Loss 8.6904 LearningRate 0.0711 Epoch: 6 Global 
Step: 35720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:30:26,066-Speed 11149.68 samples/sec Loss 8.9292 LearningRate 0.0710 Epoch: 6 Global Step: 35730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:30:26,991-Speed 11073.36 samples/sec Loss 8.8171 LearningRate 0.0710 Epoch: 6 Global Step: 35740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:30:27,914-Speed 11103.56 samples/sec Loss 8.6802 LearningRate 0.0710 Epoch: 6 Global Step: 35750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:30:28,827-Speed 11228.45 samples/sec Loss 8.6751 LearningRate 0.0710 Epoch: 6 Global Step: 35760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:30:29,707-Speed 11635.86 samples/sec Loss 8.8626 LearningRate 0.0710 Epoch: 6 Global Step: 35770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:30:30,624-Speed 11173.76 samples/sec Loss 8.7021 LearningRate 0.0710 Epoch: 6 Global Step: 35780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:30:31,542-Speed 11159.99 samples/sec Loss 8.7354 LearningRate 0.0710 Epoch: 6 Global Step: 35790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:30:32,457-Speed 11202.81 samples/sec Loss 8.7732 LearningRate 0.0710 Epoch: 6 Global Step: 35800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:30:33,352-Speed 11447.82 samples/sec Loss 8.6251 LearningRate 0.0710 Epoch: 6 Global Step: 35810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:30:34,274-Speed 11114.81 samples/sec Loss 8.7518 LearningRate 0.0710 Epoch: 6 Global Step: 35820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:30:35,167-Speed 11478.86 samples/sec Loss 8.8510 LearningRate 0.0710 Epoch: 6 Global Step: 35830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:30:36,095-Speed 11038.65 samples/sec Loss 8.7412 LearningRate 0.0710 Epoch: 6 Global Step: 35840 Fp16 Grad Scale: 65536 Required: 7 hours 
Training: 2022-04-11 00:30:37,036-Speed 10888.89 samples/sec Loss 8.8051 LearningRate 0.0710 Epoch: 6 Global Step: 35850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:30:37,976-Speed 10903.70 samples/sec Loss 8.8025 LearningRate 0.0710 Epoch: 6 Global Step: 35860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:30:38,914-Speed 10927.58 samples/sec Loss 8.8345 LearningRate 0.0709 Epoch: 6 Global Step: 35870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:30:39,863-Speed 10797.92 samples/sec Loss 8.7572 LearningRate 0.0709 Epoch: 6 Global Step: 35880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:30:40,821-Speed 10694.44 samples/sec Loss 8.7463 LearningRate 0.0709 Epoch: 6 Global Step: 35890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:30:41,757-Speed 10950.27 samples/sec Loss 8.7721 LearningRate 0.0709 Epoch: 6 Global Step: 35900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:30:42,677-Speed 11142.03 samples/sec Loss 8.8095 LearningRate 0.0709 Epoch: 6 Global Step: 35910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:30:43,569-Speed 11480.67 samples/sec Loss 8.6751 LearningRate 0.0709 Epoch: 6 Global Step: 35920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:30:44,477-Speed 11286.87 samples/sec Loss 8.6572 LearningRate 0.0709 Epoch: 6 Global Step: 35930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:30:45,407-Speed 11025.26 samples/sec Loss 8.6667 LearningRate 0.0709 Epoch: 6 Global Step: 35940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:30:46,320-Speed 11223.66 samples/sec Loss 8.6778 LearningRate 0.0709 Epoch: 6 Global Step: 35950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:30:47,264-Speed 10848.81 samples/sec Loss 8.8774 LearningRate 0.0709 Epoch: 6 Global Step: 35960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:30:48,214-Speed 10782.68 
samples/sec Loss 8.5807 LearningRate 0.0709 Epoch: 6 Global Step: 35970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:30:49,097-Speed 11613.41 samples/sec Loss 8.8042 LearningRate 0.0709 Epoch: 6 Global Step: 35980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:30:50,033-Speed 10947.73 samples/sec Loss 8.7666 LearningRate 0.0709 Epoch: 6 Global Step: 35990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:30:50,948-Speed 11202.39 samples/sec Loss 8.7002 LearningRate 0.0708 Epoch: 6 Global Step: 36000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:31:12,142-[lfw][36000]XNorm: 13.867602 Training: 2022-04-11 00:31:12,142-[lfw][36000]Accuracy-Flip: 0.99283+-0.00448 Training: 2022-04-11 00:31:12,142-[lfw][36000]Accuracy-Highest: 0.99417 Training: 2022-04-11 00:31:36,739-[cfp_fp][36000]XNorm: 11.544284 Training: 2022-04-11 00:31:36,740-[cfp_fp][36000]Accuracy-Flip: 0.92414+-0.01223 Training: 2022-04-11 00:31:36,740-[cfp_fp][36000]Accuracy-Highest: 0.92471 Training: 2022-04-11 00:31:57,964-[agedb_30][36000]XNorm: 13.538747 Training: 2022-04-11 00:31:57,965-[agedb_30][36000]Accuracy-Flip: 0.95517+-0.00990 Training: 2022-04-11 00:31:57,965-[agedb_30][36000]Accuracy-Highest: 0.95517 Training: 2022-04-11 00:31:58,910-Speed 150.67 samples/sec Loss 8.8332 LearningRate 0.0708 Epoch: 6 Global Step: 36010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:31:59,824-Speed 11207.80 samples/sec Loss 8.7199 LearningRate 0.0708 Epoch: 6 Global Step: 36020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:32:00,749-Speed 11085.58 samples/sec Loss 8.8420 LearningRate 0.0708 Epoch: 6 Global Step: 36030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:32:01,699-Speed 10783.01 samples/sec Loss 8.7136 LearningRate 0.0708 Epoch: 6 Global Step: 36040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:32:02,671-Speed 10542.28 samples/sec Loss 8.6390 LearningRate 
0.0708 Epoch: 6 Global Step: 36050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:32:03,559-Speed 11538.38 samples/sec Loss 8.8066 LearningRate 0.0708 Epoch: 6 Global Step: 36060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:32:04,458-Speed 11404.14 samples/sec Loss 8.8037 LearningRate 0.0708 Epoch: 6 Global Step: 36070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:32:05,374-Speed 11191.77 samples/sec Loss 8.7696 LearningRate 0.0708 Epoch: 6 Global Step: 36080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:32:06,300-Speed 11068.35 samples/sec Loss 8.7708 LearningRate 0.0708 Epoch: 6 Global Step: 36090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:32:07,217-Speed 11166.24 samples/sec Loss 8.8396 LearningRate 0.0708 Epoch: 6 Global Step: 36100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:32:08,135-Speed 11163.16 samples/sec Loss 8.9095 LearningRate 0.0708 Epoch: 6 Global Step: 36110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:32:09,022-Speed 11560.50 samples/sec Loss 8.7407 LearningRate 0.0708 Epoch: 6 Global Step: 36120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:32:09,942-Speed 11135.21 samples/sec Loss 8.6517 LearningRate 0.0708 Epoch: 6 Global Step: 36130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:32:10,852-Speed 11254.99 samples/sec Loss 8.8008 LearningRate 0.0707 Epoch: 6 Global Step: 36140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:32:11,767-Speed 11200.80 samples/sec Loss 8.7511 LearningRate 0.0707 Epoch: 6 Global Step: 36150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:32:12,711-Speed 10859.91 samples/sec Loss 8.7109 LearningRate 0.0707 Epoch: 6 Global Step: 36160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:32:13,645-Speed 10967.79 samples/sec Loss 8.7484 LearningRate 0.0707 Epoch: 6 Global Step: 36170 Fp16 Grad Scale: 
65536 Required: 7 hours Training: 2022-04-11 00:32:14,583-Speed 10919.82 samples/sec Loss 8.7503 LearningRate 0.0707 Epoch: 6 Global Step: 36180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:32:15,541-Speed 10697.28 samples/sec Loss 8.8157 LearningRate 0.0707 Epoch: 6 Global Step: 36190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:32:16,446-Speed 11324.09 samples/sec Loss 8.6598 LearningRate 0.0707 Epoch: 6 Global Step: 36200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:32:17,373-Speed 11061.57 samples/sec Loss 8.8600 LearningRate 0.0707 Epoch: 6 Global Step: 36210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:32:18,321-Speed 10807.46 samples/sec Loss 8.8303 LearningRate 0.0707 Epoch: 6 Global Step: 36220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:32:19,239-Speed 11158.38 samples/sec Loss 8.6954 LearningRate 0.0707 Epoch: 6 Global Step: 36230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:32:20,159-Speed 11140.34 samples/sec Loss 8.7080 LearningRate 0.0707 Epoch: 6 Global Step: 36240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:32:21,036-Speed 11683.56 samples/sec Loss 8.8107 LearningRate 0.0707 Epoch: 6 Global Step: 36250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:32:21,960-Speed 11094.10 samples/sec Loss 8.7773 LearningRate 0.0707 Epoch: 6 Global Step: 36260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:32:22,874-Speed 11206.26 samples/sec Loss 8.7515 LearningRate 0.0706 Epoch: 6 Global Step: 36270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:32:23,753-Speed 11663.13 samples/sec Loss 8.8967 LearningRate 0.0706 Epoch: 6 Global Step: 36280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:32:24,679-Speed 11059.21 samples/sec Loss 8.8517 LearningRate 0.0706 Epoch: 6 Global Step: 36290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 
00:32:25,606-Speed 11054.99 samples/sec Loss 8.7203 LearningRate 0.0706 Epoch: 6 Global Step: 36300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:32:26,515-Speed 11273.90 samples/sec Loss 8.6868 LearningRate 0.0706 Epoch: 6 Global Step: 36310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:32:27,468-Speed 10756.83 samples/sec Loss 8.6527 LearningRate 0.0706 Epoch: 6 Global Step: 36320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:32:28,379-Speed 11252.79 samples/sec Loss 8.5617 LearningRate 0.0706 Epoch: 6 Global Step: 36330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:32:29,310-Speed 11005.64 samples/sec Loss 8.8003 LearningRate 0.0706 Epoch: 6 Global Step: 36340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:32:30,257-Speed 10820.14 samples/sec Loss 8.5849 LearningRate 0.0706 Epoch: 6 Global Step: 36350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:32:31,201-Speed 10855.35 samples/sec Loss 8.7762 LearningRate 0.0706 Epoch: 6 Global Step: 36360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:32:32,119-Speed 11156.11 samples/sec Loss 8.6211 LearningRate 0.0706 Epoch: 6 Global Step: 36370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:32:33,051-Speed 10999.62 samples/sec Loss 8.8031 LearningRate 0.0706 Epoch: 6 Global Step: 36380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:32:33,966-Speed 11202.22 samples/sec Loss 8.7766 LearningRate 0.0706 Epoch: 6 Global Step: 36390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:32:34,851-Speed 11567.79 samples/sec Loss 8.7382 LearningRate 0.0706 Epoch: 6 Global Step: 36400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:32:35,782-Speed 11010.66 samples/sec Loss 8.6969 LearningRate 0.0705 Epoch: 6 Global Step: 36410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:32:36,715-Speed 10982.96 samples/sec Loss 8.8246 
LearningRate 0.0705 Epoch: 6 Global Step: 36420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:32:37,665-Speed 10783.82 samples/sec Loss 8.7044 LearningRate 0.0705 Epoch: 6 Global Step: 36430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:32:38,592-Speed 11065.04 samples/sec Loss 8.7330 LearningRate 0.0705 Epoch: 6 Global Step: 36440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:32:39,521-Speed 11021.78 samples/sec Loss 8.7552 LearningRate 0.0705 Epoch: 6 Global Step: 36450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:32:40,477-Speed 10724.17 samples/sec Loss 8.7765 LearningRate 0.0705 Epoch: 6 Global Step: 36460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:32:41,404-Speed 11052.81 samples/sec Loss 8.6795 LearningRate 0.0705 Epoch: 6 Global Step: 36470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:32:42,367-Speed 10647.11 samples/sec Loss 8.7686 LearningRate 0.0705 Epoch: 6 Global Step: 36480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:32:43,314-Speed 10811.18 samples/sec Loss 8.6182 LearningRate 0.0705 Epoch: 6 Global Step: 36490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:32:44,229-Speed 11200.47 samples/sec Loss 8.6454 LearningRate 0.0705 Epoch: 6 Global Step: 36500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:32:45,101-Speed 11752.37 samples/sec Loss 8.6044 LearningRate 0.0705 Epoch: 6 Global Step: 36510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:32:45,986-Speed 11587.51 samples/sec Loss 8.7396 LearningRate 0.0705 Epoch: 6 Global Step: 36520 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:32:46,883-Speed 11414.23 samples/sec Loss 8.6239 LearningRate 0.0705 Epoch: 6 Global Step: 36530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:32:47,762-Speed 11658.20 samples/sec Loss 8.6942 LearningRate 0.0704 Epoch: 6 Global Step: 
36540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:32:48,685-Speed 11110.56 samples/sec Loss 8.8346 LearningRate 0.0704 Epoch: 6 Global Step: 36550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:32:49,635-Speed 10779.63 samples/sec Loss 8.8480 LearningRate 0.0704 Epoch: 6 Global Step: 36560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:32:50,548-Speed 11222.08 samples/sec Loss 8.7170 LearningRate 0.0704 Epoch: 6 Global Step: 36570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:32:51,461-Speed 11237.28 samples/sec Loss 8.7260 LearningRate 0.0704 Epoch: 6 Global Step: 36580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:32:52,355-Speed 11462.99 samples/sec Loss 8.6921 LearningRate 0.0704 Epoch: 6 Global Step: 36590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:32:53,255-Speed 11392.76 samples/sec Loss 8.5254 LearningRate 0.0704 Epoch: 6 Global Step: 36600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:32:54,220-Speed 10617.61 samples/sec Loss 8.7280 LearningRate 0.0704 Epoch: 6 Global Step: 36610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:32:55,124-Speed 11330.42 samples/sec Loss 8.6014 LearningRate 0.0704 Epoch: 6 Global Step: 36620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:32:56,013-Speed 11530.87 samples/sec Loss 8.8855 LearningRate 0.0704 Epoch: 6 Global Step: 36630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:32:56,944-Speed 10996.49 samples/sec Loss 8.8157 LearningRate 0.0704 Epoch: 6 Global Step: 36640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:32:57,853-Speed 11276.98 samples/sec Loss 8.7955 LearningRate 0.0704 Epoch: 6 Global Step: 36650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:32:58,761-Speed 11286.00 samples/sec Loss 8.7713 LearningRate 0.0704 Epoch: 6 Global Step: 36660 Fp16 Grad Scale: 65536 Required: 7 hours 
Training: 2022-04-11 00:32:59,677-Speed 11191.83 samples/sec Loss 8.8030 LearningRate 0.0704 Epoch: 6 Global Step: 36670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:33:00,577-Speed 11383.13 samples/sec Loss 8.7850 LearningRate 0.0703 Epoch: 6 Global Step: 36680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:33:01,553-Speed 10500.48 samples/sec Loss 8.7128 LearningRate 0.0703 Epoch: 6 Global Step: 36690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:33:02,475-Speed 11115.68 samples/sec Loss 8.6006 LearningRate 0.0703 Epoch: 6 Global Step: 36700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:33:03,382-Speed 11301.25 samples/sec Loss 8.7782 LearningRate 0.0703 Epoch: 6 Global Step: 36710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:33:04,280-Speed 11403.36 samples/sec Loss 8.7826 LearningRate 0.0703 Epoch: 6 Global Step: 36720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:33:05,204-Speed 11090.22 samples/sec Loss 8.9163 LearningRate 0.0703 Epoch: 6 Global Step: 36730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:33:06,099-Speed 11459.75 samples/sec Loss 8.5954 LearningRate 0.0703 Epoch: 6 Global Step: 36740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:33:07,047-Speed 10802.78 samples/sec Loss 8.6089 LearningRate 0.0703 Epoch: 6 Global Step: 36750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:33:07,931-Speed 11596.01 samples/sec Loss 8.6520 LearningRate 0.0703 Epoch: 6 Global Step: 36760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:33:08,837-Speed 11313.95 samples/sec Loss 8.7948 LearningRate 0.0703 Epoch: 6 Global Step: 36770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:33:09,741-Speed 11332.85 samples/sec Loss 8.8584 LearningRate 0.0703 Epoch: 6 Global Step: 36780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:33:10,637-Speed 
11429.48 samples/sec Loss 8.6917 LearningRate 0.0703 Epoch: 6 Global Step: 36790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:33:11,537-Speed 11387.29 samples/sec Loss 8.7025 LearningRate 0.0703 Epoch: 6 Global Step: 36800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:33:12,428-Speed 11502.43 samples/sec Loss 8.6957 LearningRate 0.0703 Epoch: 6 Global Step: 36810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:33:13,336-Speed 11283.73 samples/sec Loss 8.9000 LearningRate 0.0702 Epoch: 6 Global Step: 36820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:33:14,251-Speed 11197.24 samples/sec Loss 8.6939 LearningRate 0.0702 Epoch: 6 Global Step: 36830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:33:15,167-Speed 11193.86 samples/sec Loss 8.6153 LearningRate 0.0702 Epoch: 6 Global Step: 36840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:33:16,081-Speed 11212.54 samples/sec Loss 8.8076 LearningRate 0.0702 Epoch: 6 Global Step: 36850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:33:16,987-Speed 11302.95 samples/sec Loss 8.7101 LearningRate 0.0702 Epoch: 6 Global Step: 36860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:33:17,886-Speed 11396.41 samples/sec Loss 8.8023 LearningRate 0.0702 Epoch: 6 Global Step: 36870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:33:18,799-Speed 11224.77 samples/sec Loss 8.9449 LearningRate 0.0702 Epoch: 6 Global Step: 36880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:33:19,705-Speed 11314.65 samples/sec Loss 8.7768 LearningRate 0.0702 Epoch: 6 Global Step: 36890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:33:20,595-Speed 11517.10 samples/sec Loss 8.7524 LearningRate 0.0702 Epoch: 6 Global Step: 36900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:33:21,504-Speed 11274.62 samples/sec Loss 8.8963 LearningRate 0.0702 
Epoch: 6 Global Step: 36910 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:33:22,407-Speed 11346.13 samples/sec Loss 8.6945 LearningRate 0.0702 Epoch: 6 Global Step: 36920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:33:23,309-Speed 11362.80 samples/sec Loss 8.7501 LearningRate 0.0702 Epoch: 6 Global Step: 36930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:33:24,211-Speed 11357.24 samples/sec Loss 8.8702 LearningRate 0.0702 Epoch: 6 Global Step: 36940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:33:25,149-Speed 10924.90 samples/sec Loss 8.8820 LearningRate 0.0701 Epoch: 6 Global Step: 36950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:33:26,033-Speed 11595.61 samples/sec Loss 8.8732 LearningRate 0.0701 Epoch: 6 Global Step: 36960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:33:26,982-Speed 10797.55 samples/sec Loss 8.7129 LearningRate 0.0701 Epoch: 6 Global Step: 36970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:33:27,913-Speed 11007.98 samples/sec Loss 8.7166 LearningRate 0.0701 Epoch: 6 Global Step: 36980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:33:28,857-Speed 10856.67 samples/sec Loss 8.7521 LearningRate 0.0701 Epoch: 6 Global Step: 36990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:33:29,744-Speed 11542.91 samples/sec Loss 8.7344 LearningRate 0.0701 Epoch: 6 Global Step: 37000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:33:30,638-Speed 11466.87 samples/sec Loss 8.7223 LearningRate 0.0701 Epoch: 6 Global Step: 37010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:33:31,571-Speed 10984.47 samples/sec Loss 8.7803 LearningRate 0.0701 Epoch: 6 Global Step: 37020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:33:32,487-Speed 11180.20 samples/sec Loss 8.6957 LearningRate 0.0701 Epoch: 6 Global Step: 37030 Fp16 Grad Scale: 65536 
Required: 7 hours Training: 2022-04-11 00:33:33,416-Speed 11028.41 samples/sec Loss 8.6559 LearningRate 0.0701 Epoch: 6 Global Step: 37040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:33:34,317-Speed 11382.02 samples/sec Loss 8.8993 LearningRate 0.0701 Epoch: 6 Global Step: 37050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:33:35,227-Speed 11253.74 samples/sec Loss 8.6701 LearningRate 0.0701 Epoch: 6 Global Step: 37060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:33:36,130-Speed 11349.47 samples/sec Loss 8.5979 LearningRate 0.0701 Epoch: 6 Global Step: 37070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:33:37,043-Speed 11217.98 samples/sec Loss 8.7221 LearningRate 0.0701 Epoch: 6 Global Step: 37080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:33:37,946-Speed 11348.85 samples/sec Loss 8.6977 LearningRate 0.0700 Epoch: 6 Global Step: 37090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:33:38,882-Speed 10957.29 samples/sec Loss 8.7656 LearningRate 0.0700 Epoch: 6 Global Step: 37100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:33:39,839-Speed 10714.77 samples/sec Loss 8.7550 LearningRate 0.0700 Epoch: 6 Global Step: 37110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:33:40,745-Speed 11306.82 samples/sec Loss 8.7380 LearningRate 0.0700 Epoch: 6 Global Step: 37120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:33:41,637-Speed 11489.38 samples/sec Loss 8.7711 LearningRate 0.0700 Epoch: 6 Global Step: 37130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:33:42,544-Speed 11300.23 samples/sec Loss 8.7797 LearningRate 0.0700 Epoch: 6 Global Step: 37140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:33:43,457-Speed 11219.87 samples/sec Loss 8.6378 LearningRate 0.0700 Epoch: 6 Global Step: 37150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 
00:33:44,361-Speed 11343.59 samples/sec Loss 8.4626 LearningRate 0.0700 Epoch: 6 Global Step: 37160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:33:45,303-Speed 10872.65 samples/sec Loss 8.7181 LearningRate 0.0700 Epoch: 6 Global Step: 37170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:33:46,199-Speed 11433.07 samples/sec Loss 8.7785 LearningRate 0.0700 Epoch: 6 Global Step: 37180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:33:47,152-Speed 10753.67 samples/sec Loss 8.6947 LearningRate 0.0700 Epoch: 6 Global Step: 37190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:33:48,095-Speed 10868.49 samples/sec Loss 8.7645 LearningRate 0.0700 Epoch: 6 Global Step: 37200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:33:48,996-Speed 11372.66 samples/sec Loss 8.8369 LearningRate 0.0700 Epoch: 6 Global Step: 37210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:33:49,953-Speed 10711.31 samples/sec Loss 8.6417 LearningRate 0.0699 Epoch: 6 Global Step: 37220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:33:50,848-Speed 11450.75 samples/sec Loss 8.7684 LearningRate 0.0699 Epoch: 6 Global Step: 37230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:33:51,774-Speed 11061.22 samples/sec Loss 8.6995 LearningRate 0.0699 Epoch: 6 Global Step: 37240 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:33:52,742-Speed 10590.19 samples/sec Loss 8.5423 LearningRate 0.0699 Epoch: 6 Global Step: 37250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:33:53,673-Speed 10996.33 samples/sec Loss 8.8055 LearningRate 0.0699 Epoch: 6 Global Step: 37260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:33:54,619-Speed 10842.13 samples/sec Loss 8.7754 LearningRate 0.0699 Epoch: 6 Global Step: 37270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:33:55,571-Speed 10758.48 samples/sec Loss 
8.6009 LearningRate 0.0699 Epoch: 6 Global Step: 37280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:33:56,474-Speed 11347.87 samples/sec Loss 8.7371 LearningRate 0.0699 Epoch: 6 Global Step: 37290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:33:57,386-Speed 11230.42 samples/sec Loss 8.5835 LearningRate 0.0699 Epoch: 6 Global Step: 37300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:33:58,289-Speed 11349.69 samples/sec Loss 8.6932 LearningRate 0.0699 Epoch: 6 Global Step: 37310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:33:59,248-Speed 10692.99 samples/sec Loss 8.6068 LearningRate 0.0699 Epoch: 6 Global Step: 37320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:34:00,144-Speed 11439.90 samples/sec Loss 8.7943 LearningRate 0.0699 Epoch: 6 Global Step: 37330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:34:01,066-Speed 11106.02 samples/sec Loss 8.5821 LearningRate 0.0699 Epoch: 6 Global Step: 37340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:34:01,979-Speed 11228.28 samples/sec Loss 8.7667 LearningRate 0.0699 Epoch: 6 Global Step: 37350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:34:02,897-Speed 11158.81 samples/sec Loss 8.7271 LearningRate 0.0698 Epoch: 6 Global Step: 37360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:34:03,791-Speed 11467.82 samples/sec Loss 8.8375 LearningRate 0.0698 Epoch: 6 Global Step: 37370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:34:04,714-Speed 11098.59 samples/sec Loss 8.7604 LearningRate 0.0698 Epoch: 6 Global Step: 37380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:34:05,619-Speed 11318.05 samples/sec Loss 8.8471 LearningRate 0.0698 Epoch: 6 Global Step: 37390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:34:06,505-Speed 11567.43 samples/sec Loss 8.8195 LearningRate 0.0698 Epoch: 6 Global 
Step: 37400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:34:07,387-Speed 11621.78 samples/sec Loss 8.6854 LearningRate 0.0698 Epoch: 6 Global Step: 37410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:34:08,298-Speed 11246.36 samples/sec Loss 8.5436 LearningRate 0.0698 Epoch: 6 Global Step: 37420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:34:09,193-Speed 11455.14 samples/sec Loss 8.7537 LearningRate 0.0698 Epoch: 6 Global Step: 37430 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:34:10,100-Speed 11291.84 samples/sec Loss 8.6225 LearningRate 0.0698 Epoch: 6 Global Step: 37440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:34:10,988-Speed 11551.24 samples/sec Loss 8.8729 LearningRate 0.0698 Epoch: 6 Global Step: 37450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:34:11,908-Speed 11131.61 samples/sec Loss 8.7734 LearningRate 0.0698 Epoch: 6 Global Step: 37460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:34:12,836-Speed 11038.51 samples/sec Loss 8.8852 LearningRate 0.0698 Epoch: 6 Global Step: 37470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:34:13,772-Speed 10948.31 samples/sec Loss 8.7525 LearningRate 0.0698 Epoch: 6 Global Step: 37480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:34:14,674-Speed 11362.15 samples/sec Loss 8.7630 LearningRate 0.0698 Epoch: 6 Global Step: 37490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:34:15,609-Speed 10961.07 samples/sec Loss 8.5885 LearningRate 0.0697 Epoch: 6 Global Step: 37500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:34:16,570-Speed 10668.40 samples/sec Loss 8.7389 LearningRate 0.0697 Epoch: 6 Global Step: 37510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:34:17,530-Speed 10667.87 samples/sec Loss 8.5862 LearningRate 0.0697 Epoch: 6 Global Step: 37520 Fp16 Grad Scale: 65536 Required: 7 
hours Training: 2022-04-11 00:34:18,459-Speed 11034.01 samples/sec Loss 8.6031 LearningRate 0.0697 Epoch: 6 Global Step: 37530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:34:19,373-Speed 11212.33 samples/sec Loss 8.8296 LearningRate 0.0697 Epoch: 6 Global Step: 37540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:34:20,267-Speed 11451.77 samples/sec Loss 8.6717 LearningRate 0.0697 Epoch: 6 Global Step: 37550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:34:21,138-Speed 11771.86 samples/sec Loss 8.8317 LearningRate 0.0697 Epoch: 6 Global Step: 37560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:34:22,046-Speed 11295.35 samples/sec Loss 8.6261 LearningRate 0.0697 Epoch: 6 Global Step: 37570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:34:22,956-Speed 11258.91 samples/sec Loss 8.6030 LearningRate 0.0697 Epoch: 6 Global Step: 37580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:34:23,854-Speed 11410.81 samples/sec Loss 8.8722 LearningRate 0.0697 Epoch: 6 Global Step: 37590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:34:24,775-Speed 11119.75 samples/sec Loss 8.7236 LearningRate 0.0697 Epoch: 6 Global Step: 37600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:34:25,741-Speed 10615.71 samples/sec Loss 8.8173 LearningRate 0.0697 Epoch: 6 Global Step: 37610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:34:26,654-Speed 11220.58 samples/sec Loss 8.7114 LearningRate 0.0697 Epoch: 6 Global Step: 37620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:34:27,561-Speed 11296.78 samples/sec Loss 8.8078 LearningRate 0.0696 Epoch: 6 Global Step: 37630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:34:28,475-Speed 11212.48 samples/sec Loss 8.7406 LearningRate 0.0696 Epoch: 6 Global Step: 37640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:34:29,453-Speed 10468.57 
samples/sec Loss 8.6968 LearningRate 0.0696 Epoch: 6 Global Step: 37650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:34:30,394-Speed 10899.84 samples/sec Loss 8.6259 LearningRate 0.0696 Epoch: 6 Global Step: 37660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:34:31,333-Speed 10909.33 samples/sec Loss 8.7932 LearningRate 0.0696 Epoch: 6 Global Step: 37670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:34:32,274-Speed 10891.88 samples/sec Loss 8.6864 LearningRate 0.0696 Epoch: 6 Global Step: 37680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:34:33,147-Speed 11739.81 samples/sec Loss 8.6935 LearningRate 0.0696 Epoch: 6 Global Step: 37690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:34:34,028-Speed 11628.28 samples/sec Loss 8.7735 LearningRate 0.0696 Epoch: 6 Global Step: 37700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:34:34,924-Speed 11435.25 samples/sec Loss 8.7243 LearningRate 0.0696 Epoch: 6 Global Step: 37710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:34:35,894-Speed 10568.17 samples/sec Loss 8.8819 LearningRate 0.0696 Epoch: 6 Global Step: 37720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:34:36,809-Speed 11188.39 samples/sec Loss 8.8209 LearningRate 0.0696 Epoch: 6 Global Step: 37730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:34:37,728-Speed 11151.04 samples/sec Loss 8.6402 LearningRate 0.0696 Epoch: 6 Global Step: 37740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:34:38,617-Speed 11530.00 samples/sec Loss 8.8732 LearningRate 0.0696 Epoch: 6 Global Step: 37750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:34:39,572-Speed 10726.48 samples/sec Loss 8.7891 LearningRate 0.0696 Epoch: 6 Global Step: 37760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:34:40,538-Speed 10611.74 samples/sec Loss 8.8518 LearningRate 0.0695 Epoch: 
6 Global Step: 37770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:34:41,491-Speed 10749.81 samples/sec Loss 8.7862 LearningRate 0.0695 Epoch: 6 Global Step: 37780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:34:42,465-Speed 10528.58 samples/sec Loss 8.6106 LearningRate 0.0695 Epoch: 6 Global Step: 37790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:34:43,391-Speed 11058.97 samples/sec Loss 8.6422 LearningRate 0.0695 Epoch: 6 Global Step: 37800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:34:44,261-Speed 11787.49 samples/sec Loss 8.6680 LearningRate 0.0695 Epoch: 6 Global Step: 37810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:34:45,178-Speed 11169.03 samples/sec Loss 8.8672 LearningRate 0.0695 Epoch: 6 Global Step: 37820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:34:46,077-Speed 11398.77 samples/sec Loss 8.8349 LearningRate 0.0695 Epoch: 6 Global Step: 37830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:34:46,977-Speed 11390.63 samples/sec Loss 8.7866 LearningRate 0.0695 Epoch: 6 Global Step: 37840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:34:47,883-Speed 11310.27 samples/sec Loss 8.8447 LearningRate 0.0695 Epoch: 6 Global Step: 37850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:34:48,829-Speed 10824.22 samples/sec Loss 8.8968 LearningRate 0.0695 Epoch: 6 Global Step: 37860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:34:49,755-Speed 11073.99 samples/sec Loss 8.6188 LearningRate 0.0695 Epoch: 6 Global Step: 37870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:34:50,656-Speed 11372.91 samples/sec Loss 8.7444 LearningRate 0.0695 Epoch: 6 Global Step: 37880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:34:51,543-Speed 11553.11 samples/sec Loss 8.7196 LearningRate 0.0695 Epoch: 6 Global Step: 37890 Fp16 Grad Scale: 65536 
Required: 7 hours Training: 2022-04-11 00:34:52,431-Speed 11542.73 samples/sec Loss 8.7717 LearningRate 0.0694 Epoch: 6 Global Step: 37900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:34:53,320-Speed 11517.45 samples/sec Loss 8.8523 LearningRate 0.0694 Epoch: 6 Global Step: 37910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:34:54,210-Speed 11516.27 samples/sec Loss 8.6336 LearningRate 0.0694 Epoch: 6 Global Step: 37920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:34:55,096-Speed 11564.69 samples/sec Loss 8.8201 LearningRate 0.0694 Epoch: 6 Global Step: 37930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:34:56,011-Speed 11194.01 samples/sec Loss 8.7441 LearningRate 0.0694 Epoch: 6 Global Step: 37940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:34:56,932-Speed 11131.61 samples/sec Loss 8.6801 LearningRate 0.0694 Epoch: 6 Global Step: 37950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:34:57,821-Speed 11526.29 samples/sec Loss 8.6659 LearningRate 0.0694 Epoch: 6 Global Step: 37960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:34:58,744-Speed 11097.98 samples/sec Loss 8.7279 LearningRate 0.0694 Epoch: 6 Global Step: 37970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:34:59,636-Speed 11486.15 samples/sec Loss 8.6547 LearningRate 0.0694 Epoch: 6 Global Step: 37980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:35:00,533-Speed 11424.69 samples/sec Loss 8.7668 LearningRate 0.0694 Epoch: 6 Global Step: 37990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:35:01,477-Speed 10855.71 samples/sec Loss 8.8510 LearningRate 0.0694 Epoch: 6 Global Step: 38000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:35:22,987-[lfw][38000]XNorm: 13.921568 Training: 2022-04-11 00:35:22,988-[lfw][38000]Accuracy-Flip: 0.99433+-0.00351 Training: 2022-04-11 
00:35:22,988-[lfw][38000]Accuracy-Highest: 0.99433 Training: 2022-04-11 00:35:47,753-[cfp_fp][38000]XNorm: 11.670803 Training: 2022-04-11 00:35:47,759-[cfp_fp][38000]Accuracy-Flip: 0.92700+-0.01375 Training: 2022-04-11 00:35:47,759-[cfp_fp][38000]Accuracy-Highest: 0.92700 Training: 2022-04-11 00:36:09,009-[agedb_30][38000]XNorm: 13.688974 Training: 2022-04-11 00:36:09,010-[agedb_30][38000]Accuracy-Flip: 0.94967+-0.00980 Training: 2022-04-11 00:36:09,010-[agedb_30][38000]Accuracy-Highest: 0.95517 Training: 2022-04-11 00:36:09,899-Speed 149.66 samples/sec Loss 8.7741 LearningRate 0.0694 Epoch: 6 Global Step: 38010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:36:10,806-Speed 11294.82 samples/sec Loss 8.7963 LearningRate 0.0694 Epoch: 6 Global Step: 38020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:36:11,720-Speed 11223.79 samples/sec Loss 8.7621 LearningRate 0.0694 Epoch: 6 Global Step: 38030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:36:12,638-Speed 11161.03 samples/sec Loss 8.7783 LearningRate 0.0693 Epoch: 6 Global Step: 38040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:36:13,584-Speed 10836.60 samples/sec Loss 8.5995 LearningRate 0.0693 Epoch: 6 Global Step: 38050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:36:14,522-Speed 10927.10 samples/sec Loss 8.7135 LearningRate 0.0693 Epoch: 6 Global Step: 38060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:36:15,423-Speed 11376.21 samples/sec Loss 8.7123 LearningRate 0.0693 Epoch: 6 Global Step: 38070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:36:16,352-Speed 11031.18 samples/sec Loss 8.8233 LearningRate 0.0693 Epoch: 6 Global Step: 38080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:36:17,298-Speed 10834.47 samples/sec Loss 8.7574 LearningRate 0.0693 Epoch: 6 Global Step: 38090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:36:18,232-Speed 
10973.51 samples/sec Loss 8.6529 LearningRate 0.0693 Epoch: 6 Global Step: 38100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:36:19,137-Speed 11325.30 samples/sec Loss 8.6619 LearningRate 0.0693 Epoch: 6 Global Step: 38110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:36:20,051-Speed 11200.22 samples/sec Loss 8.7197 LearningRate 0.0693 Epoch: 6 Global Step: 38120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:36:20,952-Speed 11374.92 samples/sec Loss 8.7491 LearningRate 0.0693 Epoch: 6 Global Step: 38130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:36:21,877-Speed 11084.15 samples/sec Loss 8.7243 LearningRate 0.0693 Epoch: 6 Global Step: 38140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:36:22,769-Speed 11486.50 samples/sec Loss 8.7408 LearningRate 0.0693 Epoch: 6 Global Step: 38150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:36:23,661-Speed 11490.67 samples/sec Loss 8.7280 LearningRate 0.0693 Epoch: 6 Global Step: 38160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:36:24,549-Speed 11538.59 samples/sec Loss 8.7280 LearningRate 0.0693 Epoch: 6 Global Step: 38170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:36:25,495-Speed 10831.03 samples/sec Loss 8.7699 LearningRate 0.0692 Epoch: 6 Global Step: 38180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:36:26,411-Speed 11190.73 samples/sec Loss 8.6932 LearningRate 0.0692 Epoch: 6 Global Step: 38190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:36:27,316-Speed 11321.74 samples/sec Loss 8.7815 LearningRate 0.0692 Epoch: 6 Global Step: 38200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:36:28,265-Speed 10805.45 samples/sec Loss 8.7780 LearningRate 0.0692 Epoch: 6 Global Step: 38210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:36:29,155-Speed 11515.65 samples/sec Loss 8.8306 
LearningRate 0.0692 Epoch: 6 Global Step: 38220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:36:30,046-Speed 11493.63 samples/sec Loss 8.5945 LearningRate 0.0692 Epoch: 6 Global Step: 38230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:36:30,968-Speed 11113.74 samples/sec Loss 8.5670 LearningRate 0.0692 Epoch: 6 Global Step: 38240 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:36:31,908-Speed 10902.11 samples/sec Loss 8.8358 LearningRate 0.0692 Epoch: 6 Global Step: 38250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:36:32,814-Speed 11310.52 samples/sec Loss 8.6240 LearningRate 0.0692 Epoch: 6 Global Step: 38260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:36:33,720-Speed 11320.21 samples/sec Loss 8.7106 LearningRate 0.0692 Epoch: 6 Global Step: 38270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:36:34,663-Speed 10867.00 samples/sec Loss 8.6334 LearningRate 0.0692 Epoch: 6 Global Step: 38280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:36:35,596-Speed 10975.09 samples/sec Loss 8.8001 LearningRate 0.0692 Epoch: 6 Global Step: 38290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:36:36,532-Speed 10950.29 samples/sec Loss 8.6857 LearningRate 0.0692 Epoch: 6 Global Step: 38300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:36:37,424-Speed 11486.30 samples/sec Loss 8.5516 LearningRate 0.0691 Epoch: 6 Global Step: 38310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:36:38,362-Speed 10927.57 samples/sec Loss 8.4933 LearningRate 0.0691 Epoch: 6 Global Step: 38320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:36:39,318-Speed 10719.58 samples/sec Loss 8.5772 LearningRate 0.0691 Epoch: 6 Global Step: 38330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:36:40,246-Speed 11040.49 samples/sec Loss 8.6271 LearningRate 0.0691 Epoch: 6 Global Step: 
38340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:36:41,197-Speed 10784.52 samples/sec Loss 8.6787 LearningRate 0.0691 Epoch: 6 Global Step: 38350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:36:42,144-Speed 10816.01 samples/sec Loss 8.6894 LearningRate 0.0691 Epoch: 6 Global Step: 38360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:36:43,044-Speed 11391.79 samples/sec Loss 8.6475 LearningRate 0.0691 Epoch: 6 Global Step: 38370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:36:43,948-Speed 11330.04 samples/sec Loss 8.7509 LearningRate 0.0691 Epoch: 6 Global Step: 38380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:36:44,882-Speed 10971.69 samples/sec Loss 8.6253 LearningRate 0.0691 Epoch: 6 Global Step: 38390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:36:45,781-Speed 11397.44 samples/sec Loss 8.5684 LearningRate 0.0691 Epoch: 6 Global Step: 38400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:36:46,702-Speed 11137.04 samples/sec Loss 8.5935 LearningRate 0.0691 Epoch: 6 Global Step: 38410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:36:47,607-Speed 11313.38 samples/sec Loss 8.7197 LearningRate 0.0691 Epoch: 6 Global Step: 38420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:36:48,493-Speed 11586.13 samples/sec Loss 8.5743 LearningRate 0.0691 Epoch: 6 Global Step: 38430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:36:49,408-Speed 11201.95 samples/sec Loss 8.6935 LearningRate 0.0691 Epoch: 6 Global Step: 38440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:36:50,338-Speed 11018.35 samples/sec Loss 8.6922 LearningRate 0.0690 Epoch: 6 Global Step: 38450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:36:51,249-Speed 11238.84 samples/sec Loss 8.8003 LearningRate 0.0690 Epoch: 6 Global Step: 38460 Fp16 Grad Scale: 65536 Required: 7 hours 
Training: 2022-04-11 00:36:52,148-Speed 11405.75 samples/sec Loss 8.6496 LearningRate 0.0690 Epoch: 6 Global Step: 38470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:36:53,045-Speed 11421.94 samples/sec Loss 8.7540 LearningRate 0.0690 Epoch: 6 Global Step: 38480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:36:53,920-Speed 11714.14 samples/sec Loss 8.6780 LearningRate 0.0690 Epoch: 6 Global Step: 38490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:36:54,807-Speed 11546.44 samples/sec Loss 8.6233 LearningRate 0.0690 Epoch: 6 Global Step: 38500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:36:55,694-Speed 11552.22 samples/sec Loss 8.7704 LearningRate 0.0690 Epoch: 6 Global Step: 38510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:36:56,612-Speed 11162.85 samples/sec Loss 8.6788 LearningRate 0.0690 Epoch: 6 Global Step: 38520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:36:57,516-Speed 11340.84 samples/sec Loss 8.6499 LearningRate 0.0690 Epoch: 6 Global Step: 38530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:36:58,398-Speed 11618.07 samples/sec Loss 8.7282 LearningRate 0.0690 Epoch: 6 Global Step: 38540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:36:59,346-Speed 10812.81 samples/sec Loss 8.5982 LearningRate 0.0690 Epoch: 6 Global Step: 38550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:00,276-Speed 11016.46 samples/sec Loss 8.6376 LearningRate 0.0690 Epoch: 6 Global Step: 38560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:01,170-Speed 11462.47 samples/sec Loss 8.5741 LearningRate 0.0690 Epoch: 6 Global Step: 38570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:02,070-Speed 11392.45 samples/sec Loss 8.9359 LearningRate 0.0690 Epoch: 6 Global Step: 38580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:02,980-Speed 11251.81 
samples/sec Loss 8.5912 LearningRate 0.0689 Epoch: 6 Global Step: 38590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:03,923-Speed 10874.91 samples/sec Loss 8.6409 LearningRate 0.0689 Epoch: 6 Global Step: 38600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:04,848-Speed 11077.07 samples/sec Loss 8.6904 LearningRate 0.0689 Epoch: 6 Global Step: 38610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:05,746-Speed 11414.22 samples/sec Loss 8.7375 LearningRate 0.0689 Epoch: 6 Global Step: 38620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:06,658-Speed 11227.96 samples/sec Loss 8.5937 LearningRate 0.0689 Epoch: 6 Global Step: 38630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:07,632-Speed 10523.09 samples/sec Loss 8.6642 LearningRate 0.0689 Epoch: 6 Global Step: 38640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:37:08,541-Speed 11269.54 samples/sec Loss 8.7127 LearningRate 0.0689 Epoch: 6 Global Step: 38650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:37:09,424-Speed 11617.13 samples/sec Loss 8.5465 LearningRate 0.0689 Epoch: 6 Global Step: 38660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:10,310-Speed 11562.08 samples/sec Loss 8.6774 LearningRate 0.0689 Epoch: 6 Global Step: 38670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:11,227-Speed 11173.50 samples/sec Loss 8.6015 LearningRate 0.0689 Epoch: 6 Global Step: 38680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:12,126-Speed 11391.24 samples/sec Loss 8.7506 LearningRate 0.0689 Epoch: 6 Global Step: 38690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:13,022-Speed 11436.98 samples/sec Loss 8.5916 LearningRate 0.0689 Epoch: 6 Global Step: 38700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:13,982-Speed 10683.51 samples/sec Loss 8.6614 LearningRate 0.0689 Epoch: 6 
Global Step: 38710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:14,898-Speed 11192.97 samples/sec Loss 8.6055 LearningRate 0.0688 Epoch: 6 Global Step: 38720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:15,849-Speed 10773.43 samples/sec Loss 8.6743 LearningRate 0.0688 Epoch: 6 Global Step: 38730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:16,803-Speed 10736.21 samples/sec Loss 8.7715 LearningRate 0.0688 Epoch: 6 Global Step: 38740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:17,712-Speed 11278.05 samples/sec Loss 8.6000 LearningRate 0.0688 Epoch: 6 Global Step: 38750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:18,638-Speed 11063.07 samples/sec Loss 8.5116 LearningRate 0.0688 Epoch: 6 Global Step: 38760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:37:19,570-Speed 10992.82 samples/sec Loss 8.6935 LearningRate 0.0688 Epoch: 6 Global Step: 38770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:20,495-Speed 11073.26 samples/sec Loss 8.8329 LearningRate 0.0688 Epoch: 6 Global Step: 38780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:21,425-Speed 11019.93 samples/sec Loss 8.7847 LearningRate 0.0688 Epoch: 6 Global Step: 38790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:22,335-Speed 11260.42 samples/sec Loss 8.7362 LearningRate 0.0688 Epoch: 6 Global Step: 38800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:23,246-Speed 11260.43 samples/sec Loss 8.7652 LearningRate 0.0688 Epoch: 6 Global Step: 38810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:24,187-Speed 10881.80 samples/sec Loss 8.7038 LearningRate 0.0688 Epoch: 6 Global Step: 38820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:25,151-Speed 10629.02 samples/sec Loss 8.6791 LearningRate 0.0688 Epoch: 6 Global Step: 38830 Fp16 Grad Scale: 65536 Required: 7 
hours Training: 2022-04-11 00:37:26,058-Speed 11303.17 samples/sec Loss 8.6345 LearningRate 0.0688 Epoch: 6 Global Step: 38840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:27,015-Speed 10711.69 samples/sec Loss 8.5551 LearningRate 0.0688 Epoch: 6 Global Step: 38850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:27,979-Speed 10628.98 samples/sec Loss 8.7214 LearningRate 0.0687 Epoch: 6 Global Step: 38860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:28,948-Speed 10566.60 samples/sec Loss 8.6145 LearningRate 0.0687 Epoch: 6 Global Step: 38870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:29,856-Speed 11291.29 samples/sec Loss 8.6700 LearningRate 0.0687 Epoch: 6 Global Step: 38880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:30,770-Speed 11217.31 samples/sec Loss 8.7411 LearningRate 0.0687 Epoch: 6 Global Step: 38890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:31,687-Speed 11178.29 samples/sec Loss 8.7215 LearningRate 0.0687 Epoch: 6 Global Step: 38900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:32,601-Speed 11213.46 samples/sec Loss 8.4973 LearningRate 0.0687 Epoch: 6 Global Step: 38910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:33,504-Speed 11346.42 samples/sec Loss 8.5562 LearningRate 0.0687 Epoch: 6 Global Step: 38920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:34,385-Speed 11635.03 samples/sec Loss 8.6903 LearningRate 0.0687 Epoch: 6 Global Step: 38930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:35,306-Speed 11123.37 samples/sec Loss 8.5516 LearningRate 0.0687 Epoch: 6 Global Step: 38940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:36,214-Speed 11288.21 samples/sec Loss 8.5078 LearningRate 0.0687 Epoch: 6 Global Step: 38950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:37,128-Speed 11201.31 
samples/sec Loss 8.8202 LearningRate 0.0687 Epoch: 6 Global Step: 38960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:38,058-Speed 11019.43 samples/sec Loss 8.7397 LearningRate 0.0687 Epoch: 6 Global Step: 38970 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:37:39,000-Speed 10887.00 samples/sec Loss 8.5170 LearningRate 0.0687 Epoch: 6 Global Step: 38980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:37:39,895-Speed 11438.14 samples/sec Loss 8.6486 LearningRate 0.0687 Epoch: 6 Global Step: 38990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:40,830-Speed 10969.01 samples/sec Loss 8.5743 LearningRate 0.0686 Epoch: 6 Global Step: 39000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:41,776-Speed 10821.30 samples/sec Loss 8.6217 LearningRate 0.0686 Epoch: 6 Global Step: 39010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:42,731-Speed 10733.77 samples/sec Loss 8.5410 LearningRate 0.0686 Epoch: 6 Global Step: 39020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:43,646-Speed 11205.06 samples/sec Loss 8.6811 LearningRate 0.0686 Epoch: 6 Global Step: 39030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:44,587-Speed 10888.55 samples/sec Loss 8.5967 LearningRate 0.0686 Epoch: 6 Global Step: 39040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:45,527-Speed 10897.86 samples/sec Loss 8.6720 LearningRate 0.0686 Epoch: 6 Global Step: 39050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:46,417-Speed 11513.78 samples/sec Loss 8.6088 LearningRate 0.0686 Epoch: 6 Global Step: 39060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:47,322-Speed 11324.00 samples/sec Loss 8.5555 LearningRate 0.0686 Epoch: 6 Global Step: 39070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:48,218-Speed 11430.98 samples/sec Loss 8.7198 LearningRate 0.0686 Epoch: 6 
Global Step: 39080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:49,130-Speed 11236.57 samples/sec Loss 8.7562 LearningRate 0.0686 Epoch: 6 Global Step: 39090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:37:50,065-Speed 10959.75 samples/sec Loss 8.7439 LearningRate 0.0686 Epoch: 6 Global Step: 39100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:37:51,019-Speed 10748.91 samples/sec Loss 8.5993 LearningRate 0.0686 Epoch: 6 Global Step: 39110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:37:51,925-Speed 11308.08 samples/sec Loss 8.9026 LearningRate 0.0686 Epoch: 6 Global Step: 39120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:37:52,877-Speed 10764.60 samples/sec Loss 8.6369 LearningRate 0.0686 Epoch: 6 Global Step: 39130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:37:53,783-Speed 11308.87 samples/sec Loss 8.6524 LearningRate 0.0685 Epoch: 6 Global Step: 39140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:37:54,728-Speed 10848.79 samples/sec Loss 8.5804 LearningRate 0.0685 Epoch: 6 Global Step: 39150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:37:55,654-Speed 11078.21 samples/sec Loss 8.7250 LearningRate 0.0685 Epoch: 6 Global Step: 39160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:56,568-Speed 11210.15 samples/sec Loss 8.6211 LearningRate 0.0685 Epoch: 6 Global Step: 39170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:57,521-Speed 10756.28 samples/sec Loss 8.6018 LearningRate 0.0685 Epoch: 6 Global Step: 39180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:58,457-Speed 10951.31 samples/sec Loss 8.6515 LearningRate 0.0685 Epoch: 6 Global Step: 39190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:37:59,350-Speed 11463.53 samples/sec Loss 8.7645 LearningRate 0.0685 Epoch: 6 Global Step: 39200 Fp16 Grad Scale: 65536 
Required: 7 hours Training: 2022-04-11 00:38:00,247-Speed 11431.11 samples/sec Loss 8.5578 LearningRate 0.0685 Epoch: 6 Global Step: 39210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:38:01,140-Speed 11467.67 samples/sec Loss 8.5994 LearningRate 0.0685 Epoch: 6 Global Step: 39220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:38:02,049-Speed 11280.66 samples/sec Loss 8.7828 LearningRate 0.0685 Epoch: 6 Global Step: 39230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:38:02,943-Speed 11463.00 samples/sec Loss 8.7021 LearningRate 0.0685 Epoch: 6 Global Step: 39240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:38:03,864-Speed 11122.16 samples/sec Loss 8.6351 LearningRate 0.0685 Epoch: 6 Global Step: 39250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:38:04,773-Speed 11272.06 samples/sec Loss 8.6443 LearningRate 0.0685 Epoch: 6 Global Step: 39260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:38:05,708-Speed 10953.91 samples/sec Loss 8.6060 LearningRate 0.0684 Epoch: 6 Global Step: 39270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:38:06,673-Speed 10621.59 samples/sec Loss 8.6431 LearningRate 0.0684 Epoch: 6 Global Step: 39280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:38:07,593-Speed 11139.98 samples/sec Loss 8.6916 LearningRate 0.0684 Epoch: 6 Global Step: 39290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:38:08,520-Speed 11051.26 samples/sec Loss 8.6106 LearningRate 0.0684 Epoch: 6 Global Step: 39300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:38:09,466-Speed 10828.55 samples/sec Loss 8.5666 LearningRate 0.0684 Epoch: 6 Global Step: 39310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:38:10,417-Speed 10780.36 samples/sec Loss 8.6580 LearningRate 0.0684 Epoch: 6 Global Step: 39320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 
00:38:11,351-Speed 10963.78 samples/sec Loss 8.7006 LearningRate 0.0684 Epoch: 6 Global Step: 39330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:38:12,308-Speed 10718.47 samples/sec Loss 8.5533 LearningRate 0.0684 Epoch: 6 Global Step: 39340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:38:13,202-Speed 11464.42 samples/sec Loss 8.6904 LearningRate 0.0684 Epoch: 6 Global Step: 39350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:38:14,090-Speed 11544.20 samples/sec Loss 8.7728 LearningRate 0.0684 Epoch: 6 Global Step: 39360 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:38:15,021-Speed 11004.83 samples/sec Loss 8.5927 LearningRate 0.0684 Epoch: 6 Global Step: 39370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:38:15,963-Speed 10881.84 samples/sec Loss 8.5576 LearningRate 0.0684 Epoch: 6 Global Step: 39380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:38:16,916-Speed 10756.35 samples/sec Loss 8.7479 LearningRate 0.0684 Epoch: 6 Global Step: 39390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:38:17,815-Speed 11396.07 samples/sec Loss 8.6570 LearningRate 0.0684 Epoch: 6 Global Step: 39400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:38:18,737-Speed 11120.81 samples/sec Loss 8.6296 LearningRate 0.0683 Epoch: 6 Global Step: 39410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:38:19,681-Speed 10853.87 samples/sec Loss 8.6165 LearningRate 0.0683 Epoch: 6 Global Step: 39420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:38:20,601-Speed 11135.61 samples/sec Loss 8.6058 LearningRate 0.0683 Epoch: 6 Global Step: 39430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:38:21,524-Speed 11099.12 samples/sec Loss 8.5079 LearningRate 0.0683 Epoch: 6 Global Step: 39440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:38:22,483-Speed 10692.74 samples/sec 
Loss 8.5176 LearningRate 0.0683 Epoch: 6 Global Step: 39450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:38:23,423-Speed 10890.47 samples/sec Loss 8.6630 LearningRate 0.0683 Epoch: 6 Global Step: 39460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:38:24,352-Speed 11031.45 samples/sec Loss 8.5932 LearningRate 0.0683 Epoch: 6 Global Step: 39470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:38:25,263-Speed 11246.52 samples/sec Loss 8.7070 LearningRate 0.0683 Epoch: 6 Global Step: 39480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:38:26,182-Speed 11155.32 samples/sec Loss 8.6584 LearningRate 0.0683 Epoch: 6 Global Step: 39490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:38:27,119-Speed 10945.15 samples/sec Loss 8.6491 LearningRate 0.0683 Epoch: 6 Global Step: 39500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:38:28,055-Speed 10939.91 samples/sec Loss 8.6641 LearningRate 0.0683 Epoch: 6 Global Step: 39510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:38:28,958-Speed 11354.22 samples/sec Loss 8.6995 LearningRate 0.0683 Epoch: 6 Global Step: 39520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:38:29,883-Speed 11081.26 samples/sec Loss 8.7497 LearningRate 0.0683 Epoch: 6 Global Step: 39530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:38:30,835-Speed 10779.19 samples/sec Loss 8.6171 LearningRate 0.0683 Epoch: 6 Global Step: 39540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:38:31,785-Speed 10787.66 samples/sec Loss 8.6365 LearningRate 0.0682 Epoch: 6 Global Step: 39550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:38:32,692-Speed 11290.14 samples/sec Loss 8.6725 LearningRate 0.0682 Epoch: 6 Global Step: 39560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:38:33,585-Speed 11480.63 samples/sec Loss 8.5910 LearningRate 0.0682 Epoch: 6 Global Step: 
39570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:38:34,484-Speed 11394.68 samples/sec Loss 8.5805 LearningRate 0.0682 Epoch: 6 Global Step: 39580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:38:35,365-Speed 11626.82 samples/sec Loss 8.5917 LearningRate 0.0682 Epoch: 6 Global Step: 39590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:38:36,246-Speed 11640.12 samples/sec Loss 8.6926 LearningRate 0.0682 Epoch: 6 Global Step: 39600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:38:37,164-Speed 11161.44 samples/sec Loss 8.5787 LearningRate 0.0682 Epoch: 6 Global Step: 39610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:38:38,112-Speed 10806.63 samples/sec Loss 8.5529 LearningRate 0.0682 Epoch: 6 Global Step: 39620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:38:39,088-Speed 10505.88 samples/sec Loss 8.5235 LearningRate 0.0682 Epoch: 6 Global Step: 39630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:38:40,058-Speed 10562.03 samples/sec Loss 8.5744 LearningRate 0.0682 Epoch: 6 Global Step: 39640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:38:40,958-Speed 11377.71 samples/sec Loss 8.6261 LearningRate 0.0682 Epoch: 6 Global Step: 39650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:38:41,877-Speed 11151.10 samples/sec Loss 8.6846 LearningRate 0.0682 Epoch: 6 Global Step: 39660 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:38:42,814-Speed 10943.89 samples/sec Loss 8.5958 LearningRate 0.0682 Epoch: 6 Global Step: 39670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:38:43,799-Speed 10406.50 samples/sec Loss 8.5793 LearningRate 0.0682 Epoch: 6 Global Step: 39680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:38:44,738-Speed 10911.18 samples/sec Loss 8.7777 LearningRate 0.0681 Epoch: 6 Global Step: 39690 Fp16 Grad Scale: 65536 Required: 7 
hours Training: 2022-04-11 00:38:45,612-Speed 11728.13 samples/sec Loss 8.5621 LearningRate 0.0681 Epoch: 6 Global Step: 39700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:38:46,510-Speed 11406.69 samples/sec Loss 8.6877 LearningRate 0.0681 Epoch: 6 Global Step: 39710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:38:47,423-Speed 11220.08 samples/sec Loss 8.5359 LearningRate 0.0681 Epoch: 6 Global Step: 39720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:38:48,321-Speed 11408.42 samples/sec Loss 8.7465 LearningRate 0.0681 Epoch: 6 Global Step: 39730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:38:49,249-Speed 11043.41 samples/sec Loss 8.6749 LearningRate 0.0681 Epoch: 6 Global Step: 39740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:38:50,158-Speed 11279.81 samples/sec Loss 8.6453 LearningRate 0.0681 Epoch: 6 Global Step: 39750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:38:51,063-Speed 11320.65 samples/sec Loss 8.7129 LearningRate 0.0681 Epoch: 6 Global Step: 39760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:38:51,953-Speed 11518.42 samples/sec Loss 8.5473 LearningRate 0.0681 Epoch: 6 Global Step: 39770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:38:52,875-Speed 11107.19 samples/sec Loss 8.4503 LearningRate 0.0681 Epoch: 6 Global Step: 39780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:38:53,807-Speed 10995.25 samples/sec Loss 8.6386 LearningRate 0.0681 Epoch: 6 Global Step: 39790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:38:54,839-Speed 9933.45 samples/sec Loss 8.6281 LearningRate 0.0681 Epoch: 6 Global Step: 39800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:39:05,094-Speed 998.60 samples/sec Loss 7.9877 LearningRate 0.0681 Epoch: 7 Global Step: 39810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:39:06,078-Speed 10415.04 
samples/sec Loss 7.8686 LearningRate 0.0680 Epoch: 7 Global Step: 39820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:39:07,024-Speed 10842.63 samples/sec Loss 7.8467 LearningRate 0.0680 Epoch: 7 Global Step: 39830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:39:08,011-Speed 10383.76 samples/sec Loss 7.7938 LearningRate 0.0680 Epoch: 7 Global Step: 39840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:39:08,999-Speed 10373.93 samples/sec Loss 7.8911 LearningRate 0.0680 Epoch: 7 Global Step: 39850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:39:10,060-Speed 9662.18 samples/sec Loss 8.0559 LearningRate 0.0680 Epoch: 7 Global Step: 39860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:39:10,972-Speed 11232.61 samples/sec Loss 8.0558 LearningRate 0.0680 Epoch: 7 Global Step: 39870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:39:11,908-Speed 10949.58 samples/sec Loss 7.8990 LearningRate 0.0680 Epoch: 7 Global Step: 39880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:39:12,817-Speed 11268.40 samples/sec Loss 8.1029 LearningRate 0.0680 Epoch: 7 Global Step: 39890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:39:13,761-Speed 10857.92 samples/sec Loss 7.9777 LearningRate 0.0680 Epoch: 7 Global Step: 39900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:39:14,662-Speed 11378.89 samples/sec Loss 7.9072 LearningRate 0.0680 Epoch: 7 Global Step: 39910 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:39:15,565-Speed 11351.69 samples/sec Loss 7.9011 LearningRate 0.0680 Epoch: 7 Global Step: 39920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:39:16,492-Speed 11044.95 samples/sec Loss 7.9278 LearningRate 0.0680 Epoch: 7 Global Step: 39930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:39:17,439-Speed 10827.54 samples/sec Loss 8.0205 LearningRate 0.0680 
Epoch: 7 Global Step: 39940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:39:18,404-Speed 10609.87 samples/sec Loss 7.7433 LearningRate 0.0680 Epoch: 7 Global Step: 39950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:39:19,348-Speed 10856.77 samples/sec Loss 7.9291 LearningRate 0.0679 Epoch: 7 Global Step: 39960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:39:20,306-Speed 10699.40 samples/sec Loss 7.8917 LearningRate 0.0679 Epoch: 7 Global Step: 39970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:39:21,219-Speed 11223.12 samples/sec Loss 7.9966 LearningRate 0.0679 Epoch: 7 Global Step: 39980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:39:22,175-Speed 10713.87 samples/sec Loss 8.2546 LearningRate 0.0679 Epoch: 7 Global Step: 39990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:39:23,081-Speed 11310.34 samples/sec Loss 7.9831 LearningRate 0.0679 Epoch: 7 Global Step: 40000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:39:44,439-[lfw][40000]XNorm: 13.607932 Training: 2022-04-11 00:39:44,439-[lfw][40000]Accuracy-Flip: 0.99233+-0.00389 Training: 2022-04-11 00:39:44,440-[lfw][40000]Accuracy-Highest: 0.99433 Training: 2022-04-11 00:40:08,994-[cfp_fp][40000]XNorm: 11.396159 Training: 2022-04-11 00:40:08,995-[cfp_fp][40000]Accuracy-Flip: 0.92643+-0.01311 Training: 2022-04-11 00:40:08,995-[cfp_fp][40000]Accuracy-Highest: 0.92700 Training: 2022-04-11 00:40:30,090-[agedb_30][40000]XNorm: 13.260059 Training: 2022-04-11 00:40:30,090-[agedb_30][40000]Accuracy-Flip: 0.95183+-0.00790 Training: 2022-04-11 00:40:30,091-[agedb_30][40000]Accuracy-Highest: 0.95517 Training: 2022-04-11 00:40:31,004-Speed 150.76 samples/sec Loss 8.0028 LearningRate 0.0679 Epoch: 7 Global Step: 40010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:40:31,891-Speed 11562.29 samples/sec Loss 8.0307 LearningRate 0.0679 Epoch: 7 Global Step: 40020 Fp16 Grad 
Scale: 65536 Required: 7 hours Training: 2022-04-11 00:40:32,831-Speed 10897.86 samples/sec Loss 8.1902 LearningRate 0.0679 Epoch: 7 Global Step: 40030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:40:33,727-Speed 11429.94 samples/sec Loss 8.1477 LearningRate 0.0679 Epoch: 7 Global Step: 40040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:40:34,626-Speed 11402.55 samples/sec Loss 8.1031 LearningRate 0.0679 Epoch: 7 Global Step: 40050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:40:35,550-Speed 11095.13 samples/sec Loss 8.0768 LearningRate 0.0679 Epoch: 7 Global Step: 40060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:40:36,488-Speed 10918.42 samples/sec Loss 7.8449 LearningRate 0.0679 Epoch: 7 Global Step: 40070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:40:37,422-Speed 10974.63 samples/sec Loss 7.9463 LearningRate 0.0679 Epoch: 7 Global Step: 40080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:40:38,386-Speed 10636.76 samples/sec Loss 8.3055 LearningRate 0.0679 Epoch: 7 Global Step: 40090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:40:39,351-Speed 10618.49 samples/sec Loss 8.0508 LearningRate 0.0678 Epoch: 7 Global Step: 40100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:40:40,255-Speed 11334.53 samples/sec Loss 7.9772 LearningRate 0.0678 Epoch: 7 Global Step: 40110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:40:41,166-Speed 11249.99 samples/sec Loss 8.2241 LearningRate 0.0678 Epoch: 7 Global Step: 40120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:40:42,103-Speed 10935.08 samples/sec Loss 8.0917 LearningRate 0.0678 Epoch: 7 Global Step: 40130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:40:43,041-Speed 10926.55 samples/sec Loss 8.1983 LearningRate 0.0678 Epoch: 7 Global Step: 40140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 
2022-04-11 00:40:43,964-Speed 11092.94 samples/sec Loss 8.1923 LearningRate 0.0678 Epoch: 7 Global Step: 40150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:40:44,868-Speed 11340.85 samples/sec Loss 8.0733 LearningRate 0.0678 Epoch: 7 Global Step: 40160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:40:45,752-Speed 11594.73 samples/sec Loss 8.2314 LearningRate 0.0678 Epoch: 7 Global Step: 40170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:40:46,649-Speed 11418.39 samples/sec Loss 8.1735 LearningRate 0.0678 Epoch: 7 Global Step: 40180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:40:47,569-Speed 11132.64 samples/sec Loss 8.1516 LearningRate 0.0678 Epoch: 7 Global Step: 40190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:40:48,497-Speed 11042.04 samples/sec Loss 8.1163 LearningRate 0.0678 Epoch: 7 Global Step: 40200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:40:49,405-Speed 11293.33 samples/sec Loss 8.2259 LearningRate 0.0678 Epoch: 7 Global Step: 40210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:40:50,281-Speed 11691.56 samples/sec Loss 8.1785 LearningRate 0.0678 Epoch: 7 Global Step: 40220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:40:51,170-Speed 11530.92 samples/sec Loss 8.0766 LearningRate 0.0678 Epoch: 7 Global Step: 40230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:40:52,121-Speed 10774.18 samples/sec Loss 8.0644 LearningRate 0.0677 Epoch: 7 Global Step: 40240 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:40:53,024-Speed 11341.96 samples/sec Loss 8.1034 LearningRate 0.0677 Epoch: 7 Global Step: 40250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:40:53,918-Speed 11464.54 samples/sec Loss 8.0804 LearningRate 0.0677 Epoch: 7 Global Step: 40260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:40:54,843-Speed 11081.69 
samples/sec Loss 8.1491 LearningRate 0.0677 Epoch: 7 Global Step: 40270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:40:55,766-Speed 11091.90 samples/sec Loss 8.2392 LearningRate 0.0677 Epoch: 7 Global Step: 40280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:40:56,671-Speed 11322.60 samples/sec Loss 8.1335 LearningRate 0.0677 Epoch: 7 Global Step: 40290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:40:57,592-Speed 11136.16 samples/sec Loss 8.1265 LearningRate 0.0677 Epoch: 7 Global Step: 40300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:40:58,490-Speed 11403.80 samples/sec Loss 8.1814 LearningRate 0.0677 Epoch: 7 Global Step: 40310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:40:59,374-Speed 11589.40 samples/sec Loss 8.2816 LearningRate 0.0677 Epoch: 7 Global Step: 40320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:41:00,290-Speed 11185.20 samples/sec Loss 8.2078 LearningRate 0.0677 Epoch: 7 Global Step: 40330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:41:01,183-Speed 11479.34 samples/sec Loss 8.3153 LearningRate 0.0677 Epoch: 7 Global Step: 40340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:41:02,101-Speed 11160.06 samples/sec Loss 8.1113 LearningRate 0.0677 Epoch: 7 Global Step: 40350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:41:03,040-Speed 10913.74 samples/sec Loss 8.1843 LearningRate 0.0677 Epoch: 7 Global Step: 40360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:41:04,009-Speed 10574.43 samples/sec Loss 8.2115 LearningRate 0.0677 Epoch: 7 Global Step: 40370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:41:04,937-Speed 11049.20 samples/sec Loss 8.4546 LearningRate 0.0676 Epoch: 7 Global Step: 40380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:41:05,857-Speed 11135.66 samples/sec Loss 8.1716 LearningRate 0.0676 Epoch: 7 
Global Step: 40390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:41:06,757-Speed 11382.88 samples/sec Loss 8.1744 LearningRate 0.0676 Epoch: 7 Global Step: 40400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:41:07,668-Speed 11258.42 samples/sec Loss 8.2644 LearningRate 0.0676 Epoch: 7 Global Step: 40410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:41:08,563-Speed 11447.62 samples/sec Loss 8.3274 LearningRate 0.0676 Epoch: 7 Global Step: 40420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:41:09,456-Speed 11477.33 samples/sec Loss 8.0390 LearningRate 0.0676 Epoch: 7 Global Step: 40430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:41:10,392-Speed 10949.14 samples/sec Loss 8.3562 LearningRate 0.0676 Epoch: 7 Global Step: 40440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:41:11,292-Speed 11376.71 samples/sec Loss 8.2960 LearningRate 0.0676 Epoch: 7 Global Step: 40450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:41:12,223-Speed 11011.15 samples/sec Loss 8.2304 LearningRate 0.0676 Epoch: 7 Global Step: 40460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:41:13,157-Speed 10966.96 samples/sec Loss 8.3399 LearningRate 0.0676 Epoch: 7 Global Step: 40470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:41:14,118-Speed 10670.45 samples/sec Loss 8.2417 LearningRate 0.0676 Epoch: 7 Global Step: 40480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:41:15,109-Speed 10330.90 samples/sec Loss 8.2842 LearningRate 0.0676 Epoch: 7 Global Step: 40490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:41:16,043-Speed 10979.39 samples/sec Loss 8.1755 LearningRate 0.0676 Epoch: 7 Global Step: 40500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:41:17,158-Speed 9184.80 samples/sec Loss 8.1441 LearningRate 0.0675 Epoch: 7 Global Step: 40510 Fp16 Grad Scale: 65536 
Required: 7 hours Training: 2022-04-11 00:41:18,146-Speed 10367.95 samples/sec Loss 8.2438 LearningRate 0.0675 Epoch: 7 Global Step: 40520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:41:19,069-Speed 11104.19 samples/sec Loss 8.1810 LearningRate 0.0675 Epoch: 7 Global Step: 40530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:41:19,999-Speed 11025.02 samples/sec Loss 8.2730 LearningRate 0.0675 Epoch: 7 Global Step: 40540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:41:20,906-Speed 11290.24 samples/sec Loss 8.2316 LearningRate 0.0675 Epoch: 7 Global Step: 40550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:41:21,797-Speed 11501.20 samples/sec Loss 8.2632 LearningRate 0.0675 Epoch: 7 Global Step: 40560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:41:22,709-Speed 11236.92 samples/sec Loss 8.4538 LearningRate 0.0675 Epoch: 7 Global Step: 40570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:41:23,626-Speed 11173.67 samples/sec Loss 8.2725 LearningRate 0.0675 Epoch: 7 Global Step: 40580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:41:24,543-Speed 11172.37 samples/sec Loss 8.3039 LearningRate 0.0675 Epoch: 7 Global Step: 40590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:41:25,467-Speed 11096.19 samples/sec Loss 8.2213 LearningRate 0.0675 Epoch: 7 Global Step: 40600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:41:26,369-Speed 11356.76 samples/sec Loss 8.3187 LearningRate 0.0675 Epoch: 7 Global Step: 40610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:41:27,284-Speed 11208.84 samples/sec Loss 8.3265 LearningRate 0.0675 Epoch: 7 Global Step: 40620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:41:28,213-Speed 11022.17 samples/sec Loss 8.4731 LearningRate 0.0675 Epoch: 7 Global Step: 40630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 
00:41:29,124-Speed 11249.31 samples/sec Loss 8.4670 LearningRate 0.0675 Epoch: 7 Global Step: 40640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:41:30,059-Speed 10958.61 samples/sec Loss 8.3305 LearningRate 0.0674 Epoch: 7 Global Step: 40650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:41:30,973-Speed 11217.96 samples/sec Loss 8.4194 LearningRate 0.0674 Epoch: 7 Global Step: 40660 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:41:31,851-Speed 11663.05 samples/sec Loss 8.3886 LearningRate 0.0674 Epoch: 7 Global Step: 40670 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:41:32,768-Speed 11180.65 samples/sec Loss 8.5138 LearningRate 0.0674 Epoch: 7 Global Step: 40680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:41:33,697-Speed 11025.29 samples/sec Loss 8.2695 LearningRate 0.0674 Epoch: 7 Global Step: 40690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:41:34,598-Speed 11374.22 samples/sec Loss 8.2754 LearningRate 0.0674 Epoch: 7 Global Step: 40700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:41:35,494-Speed 11441.58 samples/sec Loss 8.2246 LearningRate 0.0674 Epoch: 7 Global Step: 40710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:41:36,422-Speed 11035.35 samples/sec Loss 8.4857 LearningRate 0.0674 Epoch: 7 Global Step: 40720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:41:37,335-Speed 11220.52 samples/sec Loss 8.4718 LearningRate 0.0674 Epoch: 7 Global Step: 40730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:41:38,229-Speed 11473.11 samples/sec Loss 8.4071 LearningRate 0.0674 Epoch: 7 Global Step: 40740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:41:39,140-Speed 11240.31 samples/sec Loss 8.3167 LearningRate 0.0674 Epoch: 7 Global Step: 40750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:41:40,073-Speed 10988.86 samples/sec Loss 
8.4864 LearningRate 0.0674 Epoch: 7 Global Step: 40760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:41:40,979-Speed 11309.85 samples/sec Loss 8.2719 LearningRate 0.0674 Epoch: 7 Global Step: 40770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:41:41,867-Speed 11538.69 samples/sec Loss 8.4308 LearningRate 0.0674 Epoch: 7 Global Step: 40780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:41:42,764-Speed 11417.50 samples/sec Loss 8.4175 LearningRate 0.0673 Epoch: 7 Global Step: 40790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:41:43,679-Speed 11202.45 samples/sec Loss 8.3856 LearningRate 0.0673 Epoch: 7 Global Step: 40800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:41:44,588-Speed 11268.48 samples/sec Loss 8.2694 LearningRate 0.0673 Epoch: 7 Global Step: 40810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:41:45,475-Speed 11554.48 samples/sec Loss 8.4588 LearningRate 0.0673 Epoch: 7 Global Step: 40820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:41:46,388-Speed 11221.07 samples/sec Loss 8.3841 LearningRate 0.0673 Epoch: 7 Global Step: 40830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:41:47,311-Speed 11102.90 samples/sec Loss 8.3545 LearningRate 0.0673 Epoch: 7 Global Step: 40840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:41:48,259-Speed 10818.71 samples/sec Loss 8.5284 LearningRate 0.0673 Epoch: 7 Global Step: 40850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:41:49,184-Speed 11070.34 samples/sec Loss 8.4618 LearningRate 0.0673 Epoch: 7 Global Step: 40860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:41:50,134-Speed 10787.30 samples/sec Loss 8.3668 LearningRate 0.0673 Epoch: 7 Global Step: 40870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:41:51,068-Speed 10974.27 samples/sec Loss 8.4218 LearningRate 0.0673 Epoch: 7 Global Step: 
40880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:41:52,028-Speed 10678.16 samples/sec Loss 8.2965 LearningRate 0.0673 Epoch: 7 Global Step: 40890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:41:52,939-Speed 11245.71 samples/sec Loss 8.4456 LearningRate 0.0673 Epoch: 7 Global Step: 40900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:41:53,843-Speed 11340.25 samples/sec Loss 8.5437 LearningRate 0.0673 Epoch: 7 Global Step: 40910 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:41:54,749-Speed 11306.11 samples/sec Loss 8.3527 LearningRate 0.0673 Epoch: 7 Global Step: 40920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:41:55,652-Speed 11356.59 samples/sec Loss 8.3083 LearningRate 0.0672 Epoch: 7 Global Step: 40930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:41:56,560-Speed 11282.16 samples/sec Loss 8.4449 LearningRate 0.0672 Epoch: 7 Global Step: 40940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:41:57,454-Speed 11466.17 samples/sec Loss 8.2680 LearningRate 0.0672 Epoch: 7 Global Step: 40950 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:41:58,336-Speed 11620.65 samples/sec Loss 8.2880 LearningRate 0.0672 Epoch: 7 Global Step: 40960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:41:59,283-Speed 10818.45 samples/sec Loss 8.3788 LearningRate 0.0672 Epoch: 7 Global Step: 40970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:42:00,201-Speed 11159.87 samples/sec Loss 8.3999 LearningRate 0.0672 Epoch: 7 Global Step: 40980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:42:01,127-Speed 11068.69 samples/sec Loss 8.4151 LearningRate 0.0672 Epoch: 7 Global Step: 40990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:42:02,046-Speed 11149.61 samples/sec Loss 8.4528 LearningRate 0.0672 Epoch: 7 Global Step: 41000 Fp16 Grad Scale: 65536 Required: 7 hours 
Training: 2022-04-11 00:42:02,962-Speed 11186.48 samples/sec Loss 8.3015 LearningRate 0.0672 Epoch: 7 Global Step: 41010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:42:03,862-Speed 11388.05 samples/sec Loss 8.3276 LearningRate 0.0672 Epoch: 7 Global Step: 41020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:42:04,794-Speed 10988.45 samples/sec Loss 8.5713 LearningRate 0.0672 Epoch: 7 Global Step: 41030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:42:05,725-Speed 11011.73 samples/sec Loss 8.3751 LearningRate 0.0672 Epoch: 7 Global Step: 41040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:42:06,620-Speed 11447.03 samples/sec Loss 8.4437 LearningRate 0.0672 Epoch: 7 Global Step: 41050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:42:07,522-Speed 11364.81 samples/sec Loss 8.3149 LearningRate 0.0672 Epoch: 7 Global Step: 41060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:42:08,400-Speed 11663.05 samples/sec Loss 8.3670 LearningRate 0.0671 Epoch: 7 Global Step: 41070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:42:09,332-Speed 11004.00 samples/sec Loss 8.5532 LearningRate 0.0671 Epoch: 7 Global Step: 41080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:42:10,242-Speed 11261.01 samples/sec Loss 8.3841 LearningRate 0.0671 Epoch: 7 Global Step: 41090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:42:11,126-Speed 11585.87 samples/sec Loss 8.4106 LearningRate 0.0671 Epoch: 7 Global Step: 41100 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:42:12,033-Speed 11305.71 samples/sec Loss 8.2879 LearningRate 0.0671 Epoch: 7 Global Step: 41110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:42:12,974-Speed 10882.47 samples/sec Loss 8.4187 LearningRate 0.0671 Epoch: 7 Global Step: 41120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:42:13,929-Speed 10733.90 
samples/sec Loss 8.3836 LearningRate 0.0671 Epoch: 7 Global Step: 41130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:42:14,839-Speed 11264.32 samples/sec Loss 8.4294 LearningRate 0.0671 Epoch: 7 Global Step: 41140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:42:15,739-Speed 11384.34 samples/sec Loss 8.3563 LearningRate 0.0671 Epoch: 7 Global Step: 41150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:42:16,682-Speed 10861.51 samples/sec Loss 8.4806 LearningRate 0.0671 Epoch: 7 Global Step: 41160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:42:17,585-Speed 11343.12 samples/sec Loss 8.4068 LearningRate 0.0671 Epoch: 7 Global Step: 41170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:42:18,560-Speed 10516.56 samples/sec Loss 8.2938 LearningRate 0.0671 Epoch: 7 Global Step: 41180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:42:19,477-Speed 11172.15 samples/sec Loss 8.3288 LearningRate 0.0671 Epoch: 7 Global Step: 41190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:42:20,402-Speed 11074.30 samples/sec Loss 8.3345 LearningRate 0.0671 Epoch: 7 Global Step: 41200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:42:21,329-Speed 11062.20 samples/sec Loss 8.3509 LearningRate 0.0670 Epoch: 7 Global Step: 41210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:42:22,230-Speed 11372.45 samples/sec Loss 8.4953 LearningRate 0.0670 Epoch: 7 Global Step: 41220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:42:23,159-Speed 11029.92 samples/sec Loss 8.2935 LearningRate 0.0670 Epoch: 7 Global Step: 41230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:42:24,067-Speed 11278.93 samples/sec Loss 8.4555 LearningRate 0.0670 Epoch: 7 Global Step: 41240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:42:24,970-Speed 11343.83 samples/sec Loss 8.3647 LearningRate 0.0670 Epoch: 7 
Global Step: 41250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:42:25,883-Speed 11227.69 samples/sec Loss 8.3823 LearningRate 0.0670 Epoch: 7 Global Step: 41260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:42:26,813-Speed 11023.88 samples/sec Loss 8.6099 LearningRate 0.0670 Epoch: 7 Global Step: 41270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:42:27,713-Speed 11381.91 samples/sec Loss 8.4941 LearningRate 0.0670 Epoch: 7 Global Step: 41280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:42:28,638-Speed 11072.80 samples/sec Loss 8.4999 LearningRate 0.0670 Epoch: 7 Global Step: 41290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:42:29,566-Speed 11060.15 samples/sec Loss 8.4377 LearningRate 0.0670 Epoch: 7 Global Step: 41300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:42:30,498-Speed 10993.57 samples/sec Loss 8.3027 LearningRate 0.0670 Epoch: 7 Global Step: 41310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:42:31,445-Speed 10824.39 samples/sec Loss 8.4499 LearningRate 0.0670 Epoch: 7 Global Step: 41320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:42:32,369-Speed 11084.37 samples/sec Loss 8.3765 LearningRate 0.0670 Epoch: 7 Global Step: 41330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:42:33,296-Speed 11059.25 samples/sec Loss 8.3209 LearningRate 0.0670 Epoch: 7 Global Step: 41340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:42:34,204-Speed 11293.70 samples/sec Loss 8.4390 LearningRate 0.0669 Epoch: 7 Global Step: 41350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:42:35,141-Speed 10927.00 samples/sec Loss 8.4462 LearningRate 0.0669 Epoch: 7 Global Step: 41360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:42:36,038-Speed 11430.88 samples/sec Loss 8.4338 LearningRate 0.0669 Epoch: 7 Global Step: 41370 Fp16 Grad Scale: 131072 Required: 
7 hours Training: 2022-04-11 00:42:36,989-Speed 10772.37 samples/sec Loss 8.3324 LearningRate 0.0669 Epoch: 7 Global Step: 41380 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:42:37,931-Speed 10876.85 samples/sec Loss 8.3987 LearningRate 0.0669 Epoch: 7 Global Step: 41390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:42:38,876-Speed 10840.68 samples/sec Loss 8.3109 LearningRate 0.0669 Epoch: 7 Global Step: 41400 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:42:39,798-Speed 11117.90 samples/sec Loss 8.4712 LearningRate 0.0669 Epoch: 7 Global Step: 41410 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:42:40,766-Speed 10578.74 samples/sec Loss 8.3828 LearningRate 0.0669 Epoch: 7 Global Step: 41420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:42:41,721-Speed 10735.53 samples/sec Loss 8.2126 LearningRate 0.0669 Epoch: 7 Global Step: 41430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:42:42,620-Speed 11391.96 samples/sec Loss 8.3985 LearningRate 0.0669 Epoch: 7 Global Step: 41440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:42:43,502-Speed 11630.62 samples/sec Loss 8.4284 LearningRate 0.0669 Epoch: 7 Global Step: 41450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:42:44,392-Speed 11512.69 samples/sec Loss 8.5775 LearningRate 0.0669 Epoch: 7 Global Step: 41460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:42:45,293-Speed 11374.17 samples/sec Loss 8.3017 LearningRate 0.0669 Epoch: 7 Global Step: 41470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:42:46,190-Speed 11413.53 samples/sec Loss 8.4747 LearningRate 0.0669 Epoch: 7 Global Step: 41480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:42:47,097-Speed 11305.93 samples/sec Loss 8.4732 LearningRate 0.0668 Epoch: 7 Global Step: 41490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 
00:42:48,037-Speed 10903.33 samples/sec Loss 8.3329 LearningRate 0.0668 Epoch: 7 Global Step: 41500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:42:48,939-Speed 11353.63 samples/sec Loss 8.3807 LearningRate 0.0668 Epoch: 7 Global Step: 41510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:42:49,834-Speed 11453.55 samples/sec Loss 8.3560 LearningRate 0.0668 Epoch: 7 Global Step: 41520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:42:50,780-Speed 10837.62 samples/sec Loss 8.4610 LearningRate 0.0668 Epoch: 7 Global Step: 41530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:42:51,741-Speed 10655.63 samples/sec Loss 8.4378 LearningRate 0.0668 Epoch: 7 Global Step: 41540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:42:52,695-Speed 10745.81 samples/sec Loss 8.5578 LearningRate 0.0668 Epoch: 7 Global Step: 41550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:42:53,637-Speed 10874.77 samples/sec Loss 8.4162 LearningRate 0.0668 Epoch: 7 Global Step: 41560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:42:54,575-Speed 10932.70 samples/sec Loss 8.4519 LearningRate 0.0668 Epoch: 7 Global Step: 41570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:42:55,488-Speed 11212.21 samples/sec Loss 8.4853 LearningRate 0.0668 Epoch: 7 Global Step: 41580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:42:56,397-Speed 11279.26 samples/sec Loss 8.5532 LearningRate 0.0668 Epoch: 7 Global Step: 41590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:42:57,298-Speed 11373.05 samples/sec Loss 8.6596 LearningRate 0.0668 Epoch: 7 Global Step: 41600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:42:58,191-Speed 11469.51 samples/sec Loss 8.3913 LearningRate 0.0668 Epoch: 7 Global Step: 41610 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:42:59,142-Speed 10775.73 samples/sec Loss 8.4141 
LearningRate 0.0667 Epoch: 7 Global Step: 41620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:43:00,068-Speed 11066.13 samples/sec Loss 8.2637 LearningRate 0.0667 Epoch: 7 Global Step: 41630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:43:00,984-Speed 11184.83 samples/sec Loss 8.2230 LearningRate 0.0667 Epoch: 7 Global Step: 41640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:43:01,898-Speed 11213.95 samples/sec Loss 8.3812 LearningRate 0.0667 Epoch: 7 Global Step: 41650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:43:02,833-Speed 10954.49 samples/sec Loss 8.4062 LearningRate 0.0667 Epoch: 7 Global Step: 41660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:43:03,742-Speed 11284.41 samples/sec Loss 8.4158 LearningRate 0.0667 Epoch: 7 Global Step: 41670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:43:04,672-Speed 11010.41 samples/sec Loss 8.3505 LearningRate 0.0667 Epoch: 7 Global Step: 41680 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:43:05,588-Speed 11183.54 samples/sec Loss 8.4013 LearningRate 0.0667 Epoch: 7 Global Step: 41690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:43:06,501-Speed 11223.73 samples/sec Loss 8.4696 LearningRate 0.0667 Epoch: 7 Global Step: 41700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:43:07,463-Speed 10654.01 samples/sec Loss 8.3677 LearningRate 0.0667 Epoch: 7 Global Step: 41710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:43:08,371-Speed 11281.80 samples/sec Loss 8.3225 LearningRate 0.0667 Epoch: 7 Global Step: 41720 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:43:09,321-Speed 10787.42 samples/sec Loss 8.5316 LearningRate 0.0667 Epoch: 7 Global Step: 41730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:43:10,228-Speed 11299.51 samples/sec Loss 8.4537 LearningRate 0.0667 Epoch: 7 Global Step: 
41740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:43:11,136-Speed 11278.81 samples/sec Loss 8.5342 LearningRate 0.0667 Epoch: 7 Global Step: 41750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:43:12,120-Speed 10421.43 samples/sec Loss 8.3830 LearningRate 0.0666 Epoch: 7 Global Step: 41760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:43:13,034-Speed 11208.04 samples/sec Loss 8.3636 LearningRate 0.0666 Epoch: 7 Global Step: 41770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:43:13,969-Speed 10965.98 samples/sec Loss 8.3229 LearningRate 0.0666 Epoch: 7 Global Step: 41780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:43:14,865-Speed 11434.57 samples/sec Loss 8.6013 LearningRate 0.0666 Epoch: 7 Global Step: 41790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:43:15,764-Speed 11403.03 samples/sec Loss 8.4865 LearningRate 0.0666 Epoch: 7 Global Step: 41800 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:43:16,705-Speed 10885.83 samples/sec Loss 8.5816 LearningRate 0.0666 Epoch: 7 Global Step: 41810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:43:17,604-Speed 11391.52 samples/sec Loss 8.5379 LearningRate 0.0666 Epoch: 7 Global Step: 41820 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:43:18,490-Speed 11575.24 samples/sec Loss 8.5060 LearningRate 0.0666 Epoch: 7 Global Step: 41830 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:43:19,404-Speed 11201.81 samples/sec Loss 8.4897 LearningRate 0.0666 Epoch: 7 Global Step: 41840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:43:20,317-Speed 11231.02 samples/sec Loss 8.6076 LearningRate 0.0666 Epoch: 7 Global Step: 41850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:43:21,190-Speed 11733.16 samples/sec Loss 8.4819 LearningRate 0.0666 Epoch: 7 Global Step: 41860 Fp16 Grad Scale: 131072 Required: 7 
hours Training: 2022-04-11 00:43:22,086-Speed 11442.14 samples/sec Loss 8.4662 LearningRate 0.0666 Epoch: 7 Global Step: 41870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:43:23,046-Speed 10673.71 samples/sec Loss 8.6451 LearningRate 0.0666 Epoch: 7 Global Step: 41880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:43:23,973-Speed 11054.77 samples/sec Loss 8.6150 LearningRate 0.0666 Epoch: 7 Global Step: 41890 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:43:24,904-Speed 10999.92 samples/sec Loss 8.4235 LearningRate 0.0665 Epoch: 7 Global Step: 41900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:43:25,823-Speed 11151.80 samples/sec Loss 8.3659 LearningRate 0.0665 Epoch: 7 Global Step: 41910 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:43:26,745-Speed 11120.43 samples/sec Loss 8.6299 LearningRate 0.0665 Epoch: 7 Global Step: 41920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:43:27,678-Speed 10982.04 samples/sec Loss 8.5396 LearningRate 0.0665 Epoch: 7 Global Step: 41930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:43:28,624-Speed 10822.11 samples/sec Loss 8.5535 LearningRate 0.0665 Epoch: 7 Global Step: 41940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:43:29,542-Speed 11168.40 samples/sec Loss 8.3912 LearningRate 0.0665 Epoch: 7 Global Step: 41950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:43:30,454-Speed 11227.08 samples/sec Loss 8.5065 LearningRate 0.0665 Epoch: 7 Global Step: 41960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:43:31,374-Speed 11140.18 samples/sec Loss 8.4803 LearningRate 0.0665 Epoch: 7 Global Step: 41970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:43:32,313-Speed 10909.35 samples/sec Loss 8.4430 LearningRate 0.0665 Epoch: 7 Global Step: 41980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:43:33,194-Speed 
11645.99 samples/sec Loss 8.4810 LearningRate 0.0665 Epoch: 7 Global Step: 41990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:43:34,090-Speed 11433.78 samples/sec Loss 8.4841 LearningRate 0.0665 Epoch: 7 Global Step: 42000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:43:55,337-[lfw][42000]XNorm: 13.846493 Training: 2022-04-11 00:43:55,337-[lfw][42000]Accuracy-Flip: 0.99467+-0.00393 Training: 2022-04-11 00:43:55,338-[lfw][42000]Accuracy-Highest: 0.99467 Training: 2022-04-11 00:44:19,893-[cfp_fp][42000]XNorm: 11.541922 Training: 2022-04-11 00:44:19,894-[cfp_fp][42000]Accuracy-Flip: 0.93200+-0.01477 Training: 2022-04-11 00:44:19,894-[cfp_fp][42000]Accuracy-Highest: 0.93200 Training: 2022-04-11 00:44:41,047-[agedb_30][42000]XNorm: 13.523298 Training: 2022-04-11 00:44:41,047-[agedb_30][42000]Accuracy-Flip: 0.95450+-0.00837 Training: 2022-04-11 00:44:41,048-[agedb_30][42000]Accuracy-Highest: 0.95517 Training: 2022-04-11 00:44:41,972-Speed 150.85 samples/sec Loss 8.3749 LearningRate 0.0665 Epoch: 7 Global Step: 42010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:44:42,899-Speed 11051.54 samples/sec Loss 8.3597 LearningRate 0.0665 Epoch: 7 Global Step: 42020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:44:43,797-Speed 11409.09 samples/sec Loss 8.5120 LearningRate 0.0665 Epoch: 7 Global Step: 42030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:44:44,697-Speed 11400.51 samples/sec Loss 8.4249 LearningRate 0.0664 Epoch: 7 Global Step: 42040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:44:45,582-Speed 11580.19 samples/sec Loss 8.5323 LearningRate 0.0664 Epoch: 7 Global Step: 42050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:44:46,470-Speed 11544.38 samples/sec Loss 8.4990 LearningRate 0.0664 Epoch: 7 Global Step: 42060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:44:47,358-Speed 11536.62 samples/sec Loss 8.4510 
LearningRate 0.0664 Epoch: 7 Global Step: 42070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:44:48,260-Speed 11355.61 samples/sec Loss 8.4496 LearningRate 0.0664 Epoch: 7 Global Step: 42080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:44:49,171-Speed 11247.08 samples/sec Loss 8.2710 LearningRate 0.0664 Epoch: 7 Global Step: 42090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:44:50,091-Speed 11139.35 samples/sec Loss 8.4918 LearningRate 0.0664 Epoch: 7 Global Step: 42100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:44:51,059-Speed 10594.13 samples/sec Loss 8.3149 LearningRate 0.0664 Epoch: 7 Global Step: 42110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:44:51,998-Speed 10910.75 samples/sec Loss 8.4846 LearningRate 0.0664 Epoch: 7 Global Step: 42120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:44:52,960-Speed 10648.08 samples/sec Loss 8.5517 LearningRate 0.0664 Epoch: 7 Global Step: 42130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:44:53,890-Speed 11021.36 samples/sec Loss 8.4493 LearningRate 0.0664 Epoch: 7 Global Step: 42140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:44:54,781-Speed 11502.17 samples/sec Loss 8.4015 LearningRate 0.0664 Epoch: 7 Global Step: 42150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:44:55,722-Speed 10889.16 samples/sec Loss 8.4388 LearningRate 0.0664 Epoch: 7 Global Step: 42160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:44:56,644-Speed 11122.05 samples/sec Loss 8.5318 LearningRate 0.0664 Epoch: 7 Global Step: 42170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:44:57,541-Speed 11430.62 samples/sec Loss 8.4413 LearningRate 0.0663 Epoch: 7 Global Step: 42180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:44:58,474-Speed 10983.32 samples/sec Loss 8.3566 LearningRate 0.0663 Epoch: 7 Global Step: 42190 Fp16 
Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:44:59,389-Speed 11197.45 samples/sec Loss 8.4052 LearningRate 0.0663 Epoch: 7 Global Step: 42200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:45:00,330-Speed 10884.85 samples/sec Loss 8.3771 LearningRate 0.0663 Epoch: 7 Global Step: 42210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:45:01,242-Speed 11232.84 samples/sec Loss 8.3991 LearningRate 0.0663 Epoch: 7 Global Step: 42220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:45:02,187-Speed 10849.56 samples/sec Loss 8.4916 LearningRate 0.0663 Epoch: 7 Global Step: 42230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:45:03,155-Speed 10587.86 samples/sec Loss 8.4909 LearningRate 0.0663 Epoch: 7 Global Step: 42240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:45:04,135-Speed 10448.92 samples/sec Loss 8.4029 LearningRate 0.0663 Epoch: 7 Global Step: 42250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:45:05,057-Speed 11120.62 samples/sec Loss 8.5040 LearningRate 0.0663 Epoch: 7 Global Step: 42260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:45:05,986-Speed 11027.56 samples/sec Loss 8.5075 LearningRate 0.0663 Epoch: 7 Global Step: 42270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:45:06,899-Speed 11219.68 samples/sec Loss 8.2149 LearningRate 0.0663 Epoch: 7 Global Step: 42280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:45:07,814-Speed 11201.11 samples/sec Loss 8.5822 LearningRate 0.0663 Epoch: 7 Global Step: 42290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:45:08,721-Speed 11295.98 samples/sec Loss 8.4528 LearningRate 0.0663 Epoch: 7 Global Step: 42300 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:45:09,598-Speed 11692.54 samples/sec Loss 8.4993 LearningRate 0.0663 Epoch: 7 Global Step: 42310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 
2022-04-11 00:45:10,488-Speed 11510.69 samples/sec Loss 8.5950 LearningRate 0.0662 Epoch: 7 Global Step: 42320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:45:11,407-Speed 11149.79 samples/sec Loss 8.5082 LearningRate 0.0662 Epoch: 7 Global Step: 42330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:45:12,366-Speed 10690.40 samples/sec Loss 8.5895 LearningRate 0.0662 Epoch: 7 Global Step: 42340 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:45:13,290-Speed 11088.91 samples/sec Loss 8.6121 LearningRate 0.0662 Epoch: 7 Global Step: 42350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:45:14,242-Speed 10761.08 samples/sec Loss 8.5071 LearningRate 0.0662 Epoch: 7 Global Step: 42360 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:45:15,131-Speed 11530.89 samples/sec Loss 8.3447 LearningRate 0.0662 Epoch: 7 Global Step: 42370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:45:16,047-Speed 11182.54 samples/sec Loss 8.5458 LearningRate 0.0662 Epoch: 7 Global Step: 42380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:45:16,953-Speed 11303.08 samples/sec Loss 8.5398 LearningRate 0.0662 Epoch: 7 Global Step: 42390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:45:17,854-Speed 11381.18 samples/sec Loss 8.4279 LearningRate 0.0662 Epoch: 7 Global Step: 42400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:45:18,751-Speed 11420.71 samples/sec Loss 8.6794 LearningRate 0.0662 Epoch: 7 Global Step: 42410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:45:19,709-Speed 10698.16 samples/sec Loss 8.4450 LearningRate 0.0662 Epoch: 7 Global Step: 42420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:45:20,607-Speed 11408.71 samples/sec Loss 8.3561 LearningRate 0.0662 Epoch: 7 Global Step: 42430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:45:21,498-Speed 11502.23 samples/sec 
Loss 8.5441 LearningRate 0.0662 Epoch: 7 Global Step: 42440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:45:22,404-Speed 11302.92 samples/sec Loss 8.3665 LearningRate 0.0662 Epoch: 7 Global Step: 42450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:45:23,322-Speed 11168.87 samples/sec Loss 8.4103 LearningRate 0.0661 Epoch: 7 Global Step: 42460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:45:24,226-Speed 11334.40 samples/sec Loss 8.3427 LearningRate 0.0661 Epoch: 7 Global Step: 42470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:45:25,151-Speed 11072.01 samples/sec Loss 8.3451 LearningRate 0.0661 Epoch: 7 Global Step: 42480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:45:26,098-Speed 10821.66 samples/sec Loss 8.6390 LearningRate 0.0661 Epoch: 7 Global Step: 42490 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:45:27,049-Speed 10780.93 samples/sec Loss 8.4636 LearningRate 0.0661 Epoch: 7 Global Step: 42500 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:45:27,954-Speed 11316.42 samples/sec Loss 8.4303 LearningRate 0.0661 Epoch: 7 Global Step: 42510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:45:28,875-Speed 11132.03 samples/sec Loss 8.1546 LearningRate 0.0661 Epoch: 7 Global Step: 42520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:45:29,819-Speed 10873.16 samples/sec Loss 8.3706 LearningRate 0.0661 Epoch: 7 Global Step: 42530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:45:30,750-Speed 11010.12 samples/sec Loss 8.5652 LearningRate 0.0661 Epoch: 7 Global Step: 42540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:45:31,712-Speed 10650.24 samples/sec Loss 8.4216 LearningRate 0.0661 Epoch: 7 Global Step: 42550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:45:32,616-Speed 11330.84 samples/sec Loss 8.4529 LearningRate 0.0661 Epoch: 7 Global 
Step: 42560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:45:33,510-Speed 11463.89 samples/sec Loss 8.3040 LearningRate 0.0661 Epoch: 7 Global Step: 42570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:45:34,393-Speed 11606.13 samples/sec Loss 8.4559 LearningRate 0.0661 Epoch: 7 Global Step: 42580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:45:35,299-Speed 11308.74 samples/sec Loss 8.4911 LearningRate 0.0661 Epoch: 7 Global Step: 42590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:45:36,224-Speed 11075.38 samples/sec Loss 8.3794 LearningRate 0.0660 Epoch: 7 Global Step: 42600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:45:37,146-Speed 11122.56 samples/sec Loss 8.2991 LearningRate 0.0660 Epoch: 7 Global Step: 42610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:45:38,034-Speed 11539.73 samples/sec Loss 8.5365 LearningRate 0.0660 Epoch: 7 Global Step: 42620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:45:38,973-Speed 10914.18 samples/sec Loss 8.5044 LearningRate 0.0660 Epoch: 7 Global Step: 42630 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:45:39,880-Speed 11298.32 samples/sec Loss 8.4081 LearningRate 0.0660 Epoch: 7 Global Step: 42640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:45:40,786-Speed 11303.07 samples/sec Loss 8.5079 LearningRate 0.0660 Epoch: 7 Global Step: 42650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:45:41,700-Speed 11213.03 samples/sec Loss 8.4089 LearningRate 0.0660 Epoch: 7 Global Step: 42660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:45:42,626-Speed 11063.32 samples/sec Loss 8.4926 LearningRate 0.0660 Epoch: 7 Global Step: 42670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:45:43,563-Speed 10941.91 samples/sec Loss 8.5732 LearningRate 0.0660 Epoch: 7 Global Step: 42680 Fp16 Grad Scale: 131072 Required: 7 
hours Training: 2022-04-11 00:45:44,451-Speed 11543.00 samples/sec Loss 8.5068 LearningRate 0.0660 Epoch: 7 Global Step: 42690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:45:45,386-Speed 10955.96 samples/sec Loss 8.5505 LearningRate 0.0660 Epoch: 7 Global Step: 42700 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:45:46,288-Speed 11364.07 samples/sec Loss 8.3864 LearningRate 0.0660 Epoch: 7 Global Step: 42710 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:45:47,234-Speed 10826.62 samples/sec Loss 8.5566 LearningRate 0.0660 Epoch: 7 Global Step: 42720 Fp16 Grad Scale: 262144 Required: 7 hours Training: 2022-04-11 00:45:48,147-Speed 11226.83 samples/sec Loss 8.4037 LearningRate 0.0660 Epoch: 7 Global Step: 42730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:45:49,093-Speed 10829.88 samples/sec Loss 8.3621 LearningRate 0.0659 Epoch: 7 Global Step: 42740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:45:49,999-Speed 11318.33 samples/sec Loss 8.4611 LearningRate 0.0659 Epoch: 7 Global Step: 42750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:45:50,963-Speed 10631.08 samples/sec Loss 8.4778 LearningRate 0.0659 Epoch: 7 Global Step: 42760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:45:51,893-Speed 11012.16 samples/sec Loss 8.2761 LearningRate 0.0659 Epoch: 7 Global Step: 42770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:45:52,861-Speed 10592.48 samples/sec Loss 8.4479 LearningRate 0.0659 Epoch: 7 Global Step: 42780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:45:53,797-Speed 10940.34 samples/sec Loss 8.4507 LearningRate 0.0659 Epoch: 7 Global Step: 42790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:45:54,749-Speed 10766.25 samples/sec Loss 8.4426 LearningRate 0.0659 Epoch: 7 Global Step: 42800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:45:55,669-Speed 
11134.32 samples/sec Loss 8.5562 LearningRate 0.0659 Epoch: 7 Global Step: 42810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:45:56,593-Speed 11086.76 samples/sec Loss 8.4456 LearningRate 0.0659 Epoch: 7 Global Step: 42820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:45:57,528-Speed 10959.17 samples/sec Loss 8.3748 LearningRate 0.0659 Epoch: 7 Global Step: 42830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:45:58,419-Speed 11504.81 samples/sec Loss 8.4357 LearningRate 0.0659 Epoch: 7 Global Step: 42840 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:45:59,341-Speed 11119.37 samples/sec Loss 8.2764 LearningRate 0.0659 Epoch: 7 Global Step: 42850 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:46:00,276-Speed 10957.82 samples/sec Loss 8.5798 LearningRate 0.0659 Epoch: 7 Global Step: 42860 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:46:01,198-Speed 11105.80 samples/sec Loss 8.4196 LearningRate 0.0659 Epoch: 7 Global Step: 42870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:46:02,095-Speed 11431.59 samples/sec Loss 8.3819 LearningRate 0.0658 Epoch: 7 Global Step: 42880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:46:03,008-Speed 11220.49 samples/sec Loss 8.6112 LearningRate 0.0658 Epoch: 7 Global Step: 42890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:46:03,956-Speed 10806.94 samples/sec Loss 8.3837 LearningRate 0.0658 Epoch: 7 Global Step: 42900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:46:04,858-Speed 11366.66 samples/sec Loss 8.5270 LearningRate 0.0658 Epoch: 7 Global Step: 42910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:46:05,768-Speed 11255.37 samples/sec Loss 8.3874 LearningRate 0.0658 Epoch: 7 Global Step: 42920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:46:06,685-Speed 11173.45 samples/sec Loss 8.5099 LearningRate 
0.0658 Epoch: 7 Global Step: 42930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:46:07,574-Speed 11528.13 samples/sec Loss 8.3412 LearningRate 0.0658 Epoch: 7 Global Step: 42940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:46:08,507-Speed 10986.45 samples/sec Loss 8.4102 LearningRate 0.0658 Epoch: 7 Global Step: 42950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:46:09,407-Speed 11381.86 samples/sec Loss 8.4337 LearningRate 0.0658 Epoch: 7 Global Step: 42960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:46:10,335-Speed 11047.89 samples/sec Loss 8.4001 LearningRate 0.0658 Epoch: 7 Global Step: 42970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:46:11,257-Speed 11106.47 samples/sec Loss 8.2869 LearningRate 0.0658 Epoch: 7 Global Step: 42980 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:46:12,184-Speed 11057.06 samples/sec Loss 8.4217 LearningRate 0.0658 Epoch: 7 Global Step: 42990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:46:13,118-Speed 10968.07 samples/sec Loss 8.5374 LearningRate 0.0658 Epoch: 7 Global Step: 43000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:46:14,071-Speed 10755.98 samples/sec Loss 8.3744 LearningRate 0.0658 Epoch: 7 Global Step: 43010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:46:15,008-Speed 10938.00 samples/sec Loss 8.4197 LearningRate 0.0657 Epoch: 7 Global Step: 43020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 00:46:15,934-Speed 11065.18 samples/sec Loss 8.4608 LearningRate 0.0657 Epoch: 7 Global Step: 43030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:46:16,855-Speed 11132.89 samples/sec Loss 8.4656 LearningRate 0.0657 Epoch: 7 Global Step: 43040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:46:17,798-Speed 10860.64 samples/sec Loss 8.4763 LearningRate 0.0657 Epoch: 7 Global Step: 43050 Fp16 Grad 
Scale: 131072 Required: 6 hours Training: 2022-04-11 00:46:18,715-Speed 11169.73 samples/sec Loss 8.5288 LearningRate 0.0657 Epoch: 7 Global Step: 43060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:46:19,622-Speed 11298.66 samples/sec Loss 8.4337 LearningRate 0.0657 Epoch: 7 Global Step: 43070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:46:20,546-Speed 11095.35 samples/sec Loss 8.5263 LearningRate 0.0657 Epoch: 7 Global Step: 43080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:46:21,481-Speed 10955.72 samples/sec Loss 8.3719 LearningRate 0.0657 Epoch: 7 Global Step: 43090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:46:22,397-Speed 11185.03 samples/sec Loss 8.3796 LearningRate 0.0657 Epoch: 7 Global Step: 43100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:46:23,326-Speed 11037.91 samples/sec Loss 8.3770 LearningRate 0.0657 Epoch: 7 Global Step: 43110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:46:24,237-Speed 11246.30 samples/sec Loss 8.5250 LearningRate 0.0657 Epoch: 7 Global Step: 43120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:46:25,173-Speed 10944.32 samples/sec Loss 8.3231 LearningRate 0.0657 Epoch: 7 Global Step: 43130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:46:26,077-Speed 11339.89 samples/sec Loss 8.5531 LearningRate 0.0657 Epoch: 7 Global Step: 43140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:46:26,991-Speed 11212.34 samples/sec Loss 8.3724 LearningRate 0.0657 Epoch: 7 Global Step: 43150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:46:27,887-Speed 11435.14 samples/sec Loss 8.6179 LearningRate 0.0656 Epoch: 7 Global Step: 43160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:46:28,798-Speed 11241.39 samples/sec Loss 8.4386 LearningRate 0.0656 Epoch: 7 Global Step: 43170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 
2022-04-11 00:46:29,728-Speed 11023.24 samples/sec Loss 8.4398 LearningRate 0.0656 Epoch: 7 Global Step: 43180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:46:30,625-Speed 11417.63 samples/sec Loss 8.4978 LearningRate 0.0656 Epoch: 7 Global Step: 43190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:46:31,525-Speed 11392.30 samples/sec Loss 8.5051 LearningRate 0.0656 Epoch: 7 Global Step: 43200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:46:32,449-Speed 11082.72 samples/sec Loss 8.3898 LearningRate 0.0656 Epoch: 7 Global Step: 43210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:46:33,343-Speed 11472.63 samples/sec Loss 8.3870 LearningRate 0.0656 Epoch: 7 Global Step: 43220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:46:34,254-Speed 11246.58 samples/sec Loss 8.6692 LearningRate 0.0656 Epoch: 7 Global Step: 43230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:46:35,165-Speed 11241.71 samples/sec Loss 8.4661 LearningRate 0.0656 Epoch: 7 Global Step: 43240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:46:36,088-Speed 11102.63 samples/sec Loss 8.5047 LearningRate 0.0656 Epoch: 7 Global Step: 43250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:46:36,978-Speed 11510.53 samples/sec Loss 8.3920 LearningRate 0.0656 Epoch: 7 Global Step: 43260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:46:37,901-Speed 11108.16 samples/sec Loss 8.5093 LearningRate 0.0656 Epoch: 7 Global Step: 43270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:46:38,840-Speed 10911.86 samples/sec Loss 8.4814 LearningRate 0.0656 Epoch: 7 Global Step: 43280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:46:39,789-Speed 10805.90 samples/sec Loss 8.6080 LearningRate 0.0656 Epoch: 7 Global Step: 43290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:46:40,694-Speed 11312.61 samples/sec 
Loss 8.3501 LearningRate 0.0655 Epoch: 7 Global Step: 43300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:46:41,641-Speed 10824.41 samples/sec Loss 8.4443 LearningRate 0.0655 Epoch: 7 Global Step: 43310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:46:42,559-Speed 11169.28 samples/sec Loss 8.4296 LearningRate 0.0655 Epoch: 7 Global Step: 43320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:46:43,494-Speed 10956.71 samples/sec Loss 8.3505 LearningRate 0.0655 Epoch: 7 Global Step: 43330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:46:44,406-Speed 11234.71 samples/sec Loss 8.5215 LearningRate 0.0655 Epoch: 7 Global Step: 43340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:46:45,296-Speed 11512.98 samples/sec Loss 8.4636 LearningRate 0.0655 Epoch: 7 Global Step: 43350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:46:46,190-Speed 11470.52 samples/sec Loss 8.3447 LearningRate 0.0655 Epoch: 7 Global Step: 43360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:46:47,101-Speed 11241.00 samples/sec Loss 8.4048 LearningRate 0.0655 Epoch: 7 Global Step: 43370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:46:48,069-Speed 10585.63 samples/sec Loss 8.3645 LearningRate 0.0655 Epoch: 7 Global Step: 43380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:46:48,971-Speed 11364.97 samples/sec Loss 8.4253 LearningRate 0.0655 Epoch: 7 Global Step: 43390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:46:49,883-Speed 11228.01 samples/sec Loss 8.3980 LearningRate 0.0655 Epoch: 7 Global Step: 43400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:46:50,786-Speed 11352.56 samples/sec Loss 8.5021 LearningRate 0.0655 Epoch: 7 Global Step: 43410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:46:51,687-Speed 11377.95 samples/sec Loss 8.4681 LearningRate 0.0655 Epoch: 7 Global Step: 
43420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:46:52,622-Speed 10961.24 samples/sec Loss 8.4102 LearningRate 0.0655 Epoch: 7 Global Step: 43430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:46:53,541-Speed 11141.99 samples/sec Loss 8.2859 LearningRate 0.0654 Epoch: 7 Global Step: 43440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:46:54,459-Speed 11172.04 samples/sec Loss 8.3955 LearningRate 0.0654 Epoch: 7 Global Step: 43450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:46:55,357-Speed 11402.14 samples/sec Loss 8.3068 LearningRate 0.0654 Epoch: 7 Global Step: 43460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:46:56,264-Speed 11299.53 samples/sec Loss 8.3794 LearningRate 0.0654 Epoch: 7 Global Step: 43470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:46:57,149-Speed 11578.23 samples/sec Loss 8.3344 LearningRate 0.0654 Epoch: 7 Global Step: 43480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:46:58,057-Speed 11289.71 samples/sec Loss 8.3715 LearningRate 0.0654 Epoch: 7 Global Step: 43490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:46:58,948-Speed 11507.49 samples/sec Loss 8.3464 LearningRate 0.0654 Epoch: 7 Global Step: 43500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:46:59,863-Speed 11199.43 samples/sec Loss 8.5274 LearningRate 0.0654 Epoch: 7 Global Step: 43510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:47:00,773-Speed 11251.25 samples/sec Loss 8.3302 LearningRate 0.0654 Epoch: 7 Global Step: 43520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:47:01,689-Speed 11194.88 samples/sec Loss 8.3341 LearningRate 0.0654 Epoch: 7 Global Step: 43530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:47:02,650-Speed 10658.52 samples/sec Loss 8.4498 LearningRate 0.0654 Epoch: 7 Global Step: 43540 Fp16 Grad Scale: 65536 Required: 6 hours 
Training: 2022-04-11 00:47:03,595-Speed 10843.39 samples/sec Loss 8.5098 LearningRate 0.0654 Epoch: 7 Global Step: 43550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:47:04,515-Speed 11145.43 samples/sec Loss 8.4777 LearningRate 0.0654 Epoch: 7 Global Step: 43560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:47:05,448-Speed 10975.96 samples/sec Loss 8.5473 LearningRate 0.0654 Epoch: 7 Global Step: 43570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:47:06,413-Speed 10614.30 samples/sec Loss 8.4555 LearningRate 0.0653 Epoch: 7 Global Step: 43580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:47:07,364-Speed 10782.34 samples/sec Loss 8.1894 LearningRate 0.0653 Epoch: 7 Global Step: 43590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:47:08,263-Speed 11394.62 samples/sec Loss 8.4914 LearningRate 0.0653 Epoch: 7 Global Step: 43600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:47:09,171-Speed 11289.79 samples/sec Loss 8.3934 LearningRate 0.0653 Epoch: 7 Global Step: 43610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:47:10,085-Speed 11201.39 samples/sec Loss 8.4406 LearningRate 0.0653 Epoch: 7 Global Step: 43620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:47:11,037-Speed 10768.20 samples/sec Loss 8.3817 LearningRate 0.0653 Epoch: 7 Global Step: 43630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:47:11,924-Speed 11552.21 samples/sec Loss 8.4221 LearningRate 0.0653 Epoch: 7 Global Step: 43640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:47:12,821-Speed 11418.78 samples/sec Loss 8.4176 LearningRate 0.0653 Epoch: 7 Global Step: 43650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:47:13,737-Speed 11192.81 samples/sec Loss 8.4429 LearningRate 0.0653 Epoch: 7 Global Step: 43660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:47:14,659-Speed 11109.46 
samples/sec Loss 8.3002 LearningRate 0.0653 Epoch: 7 Global Step: 43670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:47:15,593-Speed 10969.95 samples/sec Loss 8.4063 LearningRate 0.0653 Epoch: 7 Global Step: 43680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:47:16,475-Speed 11625.86 samples/sec Loss 8.3435 LearningRate 0.0653 Epoch: 7 Global Step: 43690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:47:17,365-Speed 11514.97 samples/sec Loss 8.5542 LearningRate 0.0653 Epoch: 7 Global Step: 43700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:47:18,265-Speed 11381.56 samples/sec Loss 8.4175 LearningRate 0.0653 Epoch: 7 Global Step: 43710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:47:19,151-Speed 11571.56 samples/sec Loss 8.5491 LearningRate 0.0652 Epoch: 7 Global Step: 43720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:47:20,070-Speed 11145.76 samples/sec Loss 8.3834 LearningRate 0.0652 Epoch: 7 Global Step: 43730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:47:20,999-Speed 11033.03 samples/sec Loss 8.2972 LearningRate 0.0652 Epoch: 7 Global Step: 43740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:47:21,905-Speed 11308.14 samples/sec Loss 8.3240 LearningRate 0.0652 Epoch: 7 Global Step: 43750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:47:22,790-Speed 11575.08 samples/sec Loss 8.2654 LearningRate 0.0652 Epoch: 7 Global Step: 43760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:47:23,677-Speed 11554.39 samples/sec Loss 8.5004 LearningRate 0.0652 Epoch: 7 Global Step: 43770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:47:24,579-Speed 11367.50 samples/sec Loss 8.4772 LearningRate 0.0652 Epoch: 7 Global Step: 43780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:47:25,507-Speed 11040.60 samples/sec Loss 8.4394 LearningRate 0.0652 Epoch: 
7 Global Step: 43790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:47:26,422-Speed 11191.34 samples/sec Loss 8.6044 LearningRate 0.0652 Epoch: 7 Global Step: 43800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:47:27,322-Speed 11388.66 samples/sec Loss 8.4874 LearningRate 0.0652 Epoch: 7 Global Step: 43810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:47:28,263-Speed 10894.32 samples/sec Loss 8.3439 LearningRate 0.0652 Epoch: 7 Global Step: 43820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:47:29,182-Speed 11147.51 samples/sec Loss 8.3895 LearningRate 0.0652 Epoch: 7 Global Step: 43830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:47:30,099-Speed 11169.87 samples/sec Loss 8.2651 LearningRate 0.0652 Epoch: 7 Global Step: 43840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:47:31,046-Speed 10817.81 samples/sec Loss 8.3480 LearningRate 0.0652 Epoch: 7 Global Step: 43850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:47:32,005-Speed 10682.03 samples/sec Loss 8.4263 LearningRate 0.0652 Epoch: 7 Global Step: 43860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:47:32,928-Speed 11107.11 samples/sec Loss 8.4452 LearningRate 0.0651 Epoch: 7 Global Step: 43870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:47:33,874-Speed 10833.39 samples/sec Loss 8.4338 LearningRate 0.0651 Epoch: 7 Global Step: 43880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:47:34,834-Speed 10678.56 samples/sec Loss 8.4429 LearningRate 0.0651 Epoch: 7 Global Step: 43890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:47:35,721-Speed 11554.03 samples/sec Loss 8.3103 LearningRate 0.0651 Epoch: 7 Global Step: 43900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:47:36,642-Speed 11125.39 samples/sec Loss 8.3183 LearningRate 0.0651 Epoch: 7 Global Step: 43910 Fp16 Grad Scale: 
131072 Required: 6 hours Training: 2022-04-11 00:47:37,542-Speed 11386.73 samples/sec Loss 8.4772 LearningRate 0.0651 Epoch: 7 Global Step: 43920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:47:38,464-Speed 11112.31 samples/sec Loss 8.3430 LearningRate 0.0651 Epoch: 7 Global Step: 43930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:47:39,382-Speed 11157.16 samples/sec Loss 8.5033 LearningRate 0.0651 Epoch: 7 Global Step: 43940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:47:40,330-Speed 10810.39 samples/sec Loss 8.4186 LearningRate 0.0651 Epoch: 7 Global Step: 43950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:47:41,275-Speed 10846.90 samples/sec Loss 8.4571 LearningRate 0.0651 Epoch: 7 Global Step: 43960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:47:42,218-Speed 10861.81 samples/sec Loss 8.5252 LearningRate 0.0651 Epoch: 7 Global Step: 43970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:47:43,118-Speed 11391.60 samples/sec Loss 8.3902 LearningRate 0.0651 Epoch: 7 Global Step: 43980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:47:44,045-Speed 11057.10 samples/sec Loss 8.4309 LearningRate 0.0651 Epoch: 7 Global Step: 43990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:47:44,927-Speed 11614.31 samples/sec Loss 8.4338 LearningRate 0.0651 Epoch: 7 Global Step: 44000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:48:06,182-[lfw][44000]XNorm: 13.976843 Training: 2022-04-11 00:48:06,182-[lfw][44000]Accuracy-Flip: 0.99350+-0.00418 Training: 2022-04-11 00:48:06,183-[lfw][44000]Accuracy-Highest: 0.99467 Training: 2022-04-11 00:48:30,746-[cfp_fp][44000]XNorm: 11.592394 Training: 2022-04-11 00:48:30,747-[cfp_fp][44000]Accuracy-Flip: 0.93071+-0.01170 Training: 2022-04-11 00:48:30,747-[cfp_fp][44000]Accuracy-Highest: 0.93200 Training: 2022-04-11 00:48:51,890-[agedb_30][44000]XNorm: 13.622644 
Training: 2022-04-11 00:48:51,891-[agedb_30][44000]Accuracy-Flip: 0.95400+-0.00768 Training: 2022-04-11 00:48:51,891-[agedb_30][44000]Accuracy-Highest: 0.95517 Training: 2022-04-11 00:48:52,804-Speed 150.86 samples/sec Loss 8.6259 LearningRate 0.0650 Epoch: 7 Global Step: 44010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:48:53,691-Speed 11554.37 samples/sec Loss 8.3950 LearningRate 0.0650 Epoch: 7 Global Step: 44020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:48:54,585-Speed 11461.84 samples/sec Loss 8.4794 LearningRate 0.0650 Epoch: 7 Global Step: 44030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:48:55,465-Speed 11648.07 samples/sec Loss 8.4330 LearningRate 0.0650 Epoch: 7 Global Step: 44040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 00:48:56,376-Speed 11239.10 samples/sec Loss 8.5622 LearningRate 0.0650 Epoch: 7 Global Step: 44050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:48:57,296-Speed 11145.12 samples/sec Loss 8.4914 LearningRate 0.0650 Epoch: 7 Global Step: 44060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:48:58,232-Speed 10946.04 samples/sec Loss 8.3497 LearningRate 0.0650 Epoch: 7 Global Step: 44070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:48:59,113-Speed 11632.71 samples/sec Loss 8.2559 LearningRate 0.0650 Epoch: 7 Global Step: 44080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:49:00,007-Speed 11454.25 samples/sec Loss 8.4963 LearningRate 0.0650 Epoch: 7 Global Step: 44090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:49:00,918-Speed 11252.79 samples/sec Loss 8.3588 LearningRate 0.0650 Epoch: 7 Global Step: 44100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:49:01,844-Speed 11061.33 samples/sec Loss 8.4160 LearningRate 0.0650 Epoch: 7 Global Step: 44110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:49:02,785-Speed 10886.88 
samples/sec Loss 8.3646 LearningRate 0.0650 Epoch: 7 Global Step: 44120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:49:03,678-Speed 11486.37 samples/sec Loss 8.5335 LearningRate 0.0650 Epoch: 7 Global Step: 44130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:49:04,617-Speed 10901.67 samples/sec Loss 8.4254 LearningRate 0.0650 Epoch: 7 Global Step: 44140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:49:05,517-Speed 11387.24 samples/sec Loss 8.4238 LearningRate 0.0649 Epoch: 7 Global Step: 44150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:49:06,414-Speed 11422.31 samples/sec Loss 8.4058 LearningRate 0.0649 Epoch: 7 Global Step: 44160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:49:07,324-Speed 11264.38 samples/sec Loss 8.4803 LearningRate 0.0649 Epoch: 7 Global Step: 44170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:49:08,230-Speed 11313.55 samples/sec Loss 8.5824 LearningRate 0.0649 Epoch: 7 Global Step: 44180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:49:09,126-Speed 11429.44 samples/sec Loss 8.2966 LearningRate 0.0649 Epoch: 7 Global Step: 44190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:49:10,008-Speed 11615.92 samples/sec Loss 8.3704 LearningRate 0.0649 Epoch: 7 Global Step: 44200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:49:10,932-Speed 11095.49 samples/sec Loss 8.3185 LearningRate 0.0649 Epoch: 7 Global Step: 44210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:49:11,828-Speed 11429.56 samples/sec Loss 8.2817 LearningRate 0.0649 Epoch: 7 Global Step: 44220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:49:12,782-Speed 10747.49 samples/sec Loss 8.3842 LearningRate 0.0649 Epoch: 7 Global Step: 44230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:49:13,699-Speed 11172.65 samples/sec Loss 8.2752 LearningRate 0.0649 
Epoch: 7 Global Step: 44240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:49:14,611-Speed 11231.64 samples/sec Loss 8.3130 LearningRate 0.0649 Epoch: 7 Global Step: 44250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:49:15,560-Speed 10802.37 samples/sec Loss 8.3495 LearningRate 0.0649 Epoch: 7 Global Step: 44260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:49:16,463-Speed 11342.21 samples/sec Loss 8.3513 LearningRate 0.0649 Epoch: 7 Global Step: 44270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:49:17,363-Speed 11390.58 samples/sec Loss 8.4813 LearningRate 0.0649 Epoch: 7 Global Step: 44280 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 00:49:18,299-Speed 10953.30 samples/sec Loss 8.4930 LearningRate 0.0648 Epoch: 7 Global Step: 44290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:49:19,233-Speed 10959.51 samples/sec Loss 8.3437 LearningRate 0.0648 Epoch: 7 Global Step: 44300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:49:20,153-Speed 11144.55 samples/sec Loss 8.4829 LearningRate 0.0648 Epoch: 7 Global Step: 44310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:49:21,046-Speed 11479.80 samples/sec Loss 8.4592 LearningRate 0.0648 Epoch: 7 Global Step: 44320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:49:21,961-Speed 11202.78 samples/sec Loss 8.3177 LearningRate 0.0648 Epoch: 7 Global Step: 44330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:49:22,837-Speed 11688.50 samples/sec Loss 8.4102 LearningRate 0.0648 Epoch: 7 Global Step: 44340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:49:23,761-Speed 11099.09 samples/sec Loss 8.5489 LearningRate 0.0648 Epoch: 7 Global Step: 44350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:49:24,697-Speed 10944.35 samples/sec Loss 8.3861 LearningRate 0.0648 Epoch: 7 Global Step: 44360 Fp16 Grad 
Scale: 65536 Required: 6 hours Training: 2022-04-11 00:49:25,610-Speed 11226.64 samples/sec Loss 8.2684 LearningRate 0.0648 Epoch: 7 Global Step: 44370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:49:26,523-Speed 11221.56 samples/sec Loss 8.4851 LearningRate 0.0648 Epoch: 7 Global Step: 44380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:49:27,462-Speed 10912.53 samples/sec Loss 8.3995 LearningRate 0.0648 Epoch: 7 Global Step: 44390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:49:28,368-Speed 11309.07 samples/sec Loss 8.3348 LearningRate 0.0648 Epoch: 7 Global Step: 44400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:49:29,302-Speed 10966.14 samples/sec Loss 8.4223 LearningRate 0.0648 Epoch: 7 Global Step: 44410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:49:30,224-Speed 11124.14 samples/sec Loss 8.3933 LearningRate 0.0648 Epoch: 7 Global Step: 44420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:49:31,154-Speed 11009.41 samples/sec Loss 8.3873 LearningRate 0.0647 Epoch: 7 Global Step: 44430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:49:32,056-Speed 11369.20 samples/sec Loss 8.3400 LearningRate 0.0647 Epoch: 7 Global Step: 44440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:49:32,980-Speed 11091.39 samples/sec Loss 8.3915 LearningRate 0.0647 Epoch: 7 Global Step: 44450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:49:33,913-Speed 10980.36 samples/sec Loss 8.3792 LearningRate 0.0647 Epoch: 7 Global Step: 44460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:49:34,836-Speed 11106.95 samples/sec Loss 8.3268 LearningRate 0.0647 Epoch: 7 Global Step: 44470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:49:35,786-Speed 10786.63 samples/sec Loss 8.4299 LearningRate 0.0647 Epoch: 7 Global Step: 44480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 
00:49:36,715-Speed 11029.97 samples/sec Loss 8.3659 LearningRate 0.0647 Epoch: 7 Global Step: 44490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:49:37,611-Speed 11424.08 samples/sec Loss 8.3695 LearningRate 0.0647 Epoch: 7 Global Step: 44500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:49:38,499-Speed 11540.20 samples/sec Loss 8.4442 LearningRate 0.0647 Epoch: 7 Global Step: 44510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:49:39,511-Speed 10125.41 samples/sec Loss 8.5137 LearningRate 0.0647 Epoch: 7 Global Step: 44520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:49:40,413-Speed 11363.44 samples/sec Loss 8.6176 LearningRate 0.0647 Epoch: 7 Global Step: 44530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:49:41,319-Speed 11314.73 samples/sec Loss 8.2943 LearningRate 0.0647 Epoch: 7 Global Step: 44540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:49:42,252-Speed 10979.86 samples/sec Loss 8.4468 LearningRate 0.0647 Epoch: 7 Global Step: 44550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:49:43,155-Speed 11347.18 samples/sec Loss 8.3816 LearningRate 0.0647 Epoch: 7 Global Step: 44560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:49:44,049-Speed 11467.02 samples/sec Loss 8.3319 LearningRate 0.0646 Epoch: 7 Global Step: 44570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:49:44,935-Speed 11563.65 samples/sec Loss 8.4773 LearningRate 0.0646 Epoch: 7 Global Step: 44580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:49:45,817-Speed 11627.33 samples/sec Loss 8.4504 LearningRate 0.0646 Epoch: 7 Global Step: 44590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:49:46,750-Speed 10979.59 samples/sec Loss 8.3145 LearningRate 0.0646 Epoch: 7 Global Step: 44600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:49:47,681-Speed 11005.78 samples/sec Loss 8.3022 
LearningRate 0.0646 Epoch: 7 Global Step: 44610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:49:48,603-Speed 11111.94 samples/sec Loss 8.3240 LearningRate 0.0646 Epoch: 7 Global Step: 44620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:49:49,531-Speed 11035.36 samples/sec Loss 8.3552 LearningRate 0.0646 Epoch: 7 Global Step: 44630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:49:50,446-Speed 11204.99 samples/sec Loss 8.3407 LearningRate 0.0646 Epoch: 7 Global Step: 44640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:49:51,384-Speed 10921.86 samples/sec Loss 8.2592 LearningRate 0.0646 Epoch: 7 Global Step: 44650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:49:52,326-Speed 10876.16 samples/sec Loss 8.3865 LearningRate 0.0646 Epoch: 7 Global Step: 44660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:49:53,255-Speed 11041.57 samples/sec Loss 8.5173 LearningRate 0.0646 Epoch: 7 Global Step: 44670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:49:54,200-Speed 10839.20 samples/sec Loss 8.3463 LearningRate 0.0646 Epoch: 7 Global Step: 44680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:49:55,095-Speed 11448.06 samples/sec Loss 8.5335 LearningRate 0.0646 Epoch: 7 Global Step: 44690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:49:55,981-Speed 11563.57 samples/sec Loss 8.3741 LearningRate 0.0646 Epoch: 7 Global Step: 44700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:49:56,886-Speed 11320.06 samples/sec Loss 8.4783 LearningRate 0.0645 Epoch: 7 Global Step: 44710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:49:57,795-Speed 11277.21 samples/sec Loss 8.4263 LearningRate 0.0645 Epoch: 7 Global Step: 44720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:49:58,717-Speed 11111.88 samples/sec Loss 8.3230 LearningRate 0.0645 Epoch: 7 Global Step: 44730 Fp16 
Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:49:59,613-Speed 11433.36 samples/sec Loss 8.3087 LearningRate 0.0645 Epoch: 7 Global Step: 44740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:50:00,535-Speed 11110.22 samples/sec Loss 8.4979 LearningRate 0.0645 Epoch: 7 Global Step: 44750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:01,462-Speed 11060.50 samples/sec Loss 8.3645 LearningRate 0.0645 Epoch: 7 Global Step: 44760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:02,366-Speed 11338.04 samples/sec Loss 8.3483 LearningRate 0.0645 Epoch: 7 Global Step: 44770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:03,281-Speed 11201.63 samples/sec Loss 8.5038 LearningRate 0.0645 Epoch: 7 Global Step: 44780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:04,173-Speed 11479.71 samples/sec Loss 8.4735 LearningRate 0.0645 Epoch: 7 Global Step: 44790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:05,066-Speed 11484.38 samples/sec Loss 8.3107 LearningRate 0.0645 Epoch: 7 Global Step: 44800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:05,970-Speed 11327.06 samples/sec Loss 8.2599 LearningRate 0.0645 Epoch: 7 Global Step: 44810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:06,902-Speed 10996.05 samples/sec Loss 8.3833 LearningRate 0.0645 Epoch: 7 Global Step: 44820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:07,855-Speed 10756.39 samples/sec Loss 8.3305 LearningRate 0.0645 Epoch: 7 Global Step: 44830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:08,762-Speed 11297.40 samples/sec Loss 8.4202 LearningRate 0.0645 Epoch: 7 Global Step: 44840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:09,641-Speed 11654.18 samples/sec Loss 8.3725 LearningRate 0.0644 Epoch: 7 Global Step: 44850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 
2022-04-11 00:50:10,554-Speed 11228.57 samples/sec Loss 8.4350 LearningRate 0.0644 Epoch: 7 Global Step: 44860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:11,439-Speed 11579.67 samples/sec Loss 8.4473 LearningRate 0.0644 Epoch: 7 Global Step: 44870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:12,352-Speed 11219.06 samples/sec Loss 8.3395 LearningRate 0.0644 Epoch: 7 Global Step: 44880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:13,238-Speed 11569.21 samples/sec Loss 8.4235 LearningRate 0.0644 Epoch: 7 Global Step: 44890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:14,119-Speed 11632.38 samples/sec Loss 8.3420 LearningRate 0.0644 Epoch: 7 Global Step: 44900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:15,029-Speed 11256.52 samples/sec Loss 8.3911 LearningRate 0.0644 Epoch: 7 Global Step: 44910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:15,944-Speed 11200.39 samples/sec Loss 8.3474 LearningRate 0.0644 Epoch: 7 Global Step: 44920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:16,851-Speed 11298.44 samples/sec Loss 8.3024 LearningRate 0.0644 Epoch: 7 Global Step: 44930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:17,783-Speed 10992.01 samples/sec Loss 8.4382 LearningRate 0.0644 Epoch: 7 Global Step: 44940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:18,752-Speed 10575.93 samples/sec Loss 8.2973 LearningRate 0.0644 Epoch: 7 Global Step: 44950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:19,681-Speed 11027.32 samples/sec Loss 8.4179 LearningRate 0.0644 Epoch: 7 Global Step: 44960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:50:20,566-Speed 11581.84 samples/sec Loss 8.2759 LearningRate 0.0644 Epoch: 7 Global Step: 44970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:50:21,476-Speed 11261.73 samples/sec 
Loss 8.3788 LearningRate 0.0644 Epoch: 7 Global Step: 44980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:50:22,382-Speed 11314.17 samples/sec Loss 8.4159 LearningRate 0.0644 Epoch: 7 Global Step: 44990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:50:23,267-Speed 11575.51 samples/sec Loss 8.4516 LearningRate 0.0643 Epoch: 7 Global Step: 45000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:50:24,185-Speed 11158.94 samples/sec Loss 8.3006 LearningRate 0.0643 Epoch: 7 Global Step: 45010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:50:25,105-Speed 11146.03 samples/sec Loss 8.2264 LearningRate 0.0643 Epoch: 7 Global Step: 45020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:50:26,020-Speed 11192.46 samples/sec Loss 8.2747 LearningRate 0.0643 Epoch: 7 Global Step: 45030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:50:26,921-Speed 11379.81 samples/sec Loss 8.2702 LearningRate 0.0643 Epoch: 7 Global Step: 45040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:50:27,845-Speed 11084.94 samples/sec Loss 8.3745 LearningRate 0.0643 Epoch: 7 Global Step: 45050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:28,757-Speed 11258.04 samples/sec Loss 8.3234 LearningRate 0.0643 Epoch: 7 Global Step: 45060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:29,677-Speed 11133.40 samples/sec Loss 8.2996 LearningRate 0.0643 Epoch: 7 Global Step: 45070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:30,568-Speed 11503.17 samples/sec Loss 8.4995 LearningRate 0.0643 Epoch: 7 Global Step: 45080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:31,475-Speed 11296.78 samples/sec Loss 8.3387 LearningRate 0.0643 Epoch: 7 Global Step: 45090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:32,410-Speed 10953.00 samples/sec Loss 8.2258 LearningRate 0.0643 Epoch: 7 Global 
Step: 45100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:33,304-Speed 11468.40 samples/sec Loss 8.2627 LearningRate 0.0643 Epoch: 7 Global Step: 45110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:34,243-Speed 10917.61 samples/sec Loss 8.3524 LearningRate 0.0643 Epoch: 7 Global Step: 45120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:35,166-Speed 11097.80 samples/sec Loss 8.4296 LearningRate 0.0643 Epoch: 7 Global Step: 45130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:36,083-Speed 11171.94 samples/sec Loss 8.2875 LearningRate 0.0642 Epoch: 7 Global Step: 45140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:37,022-Speed 10912.90 samples/sec Loss 8.2715 LearningRate 0.0642 Epoch: 7 Global Step: 45150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:50:37,953-Speed 11010.80 samples/sec Loss 8.3374 LearningRate 0.0642 Epoch: 7 Global Step: 45160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:50:38,897-Speed 10847.31 samples/sec Loss 8.2761 LearningRate 0.0642 Epoch: 7 Global Step: 45170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:50:39,830-Speed 10986.14 samples/sec Loss 8.3030 LearningRate 0.0642 Epoch: 7 Global Step: 45180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:50:40,745-Speed 11204.66 samples/sec Loss 8.3800 LearningRate 0.0642 Epoch: 7 Global Step: 45190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:50:41,670-Speed 11074.75 samples/sec Loss 8.4230 LearningRate 0.0642 Epoch: 7 Global Step: 45200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:50:42,601-Speed 11004.25 samples/sec Loss 8.3631 LearningRate 0.0642 Epoch: 7 Global Step: 45210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:50:43,521-Speed 11132.49 samples/sec Loss 8.5802 LearningRate 0.0642 Epoch: 7 Global Step: 45220 Fp16 Grad Scale: 131072 Required: 6 
hours Training: 2022-04-11 00:50:44,442-Speed 11158.34 samples/sec Loss 8.4174 LearningRate 0.0642 Epoch: 7 Global Step: 45230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:45,356-Speed 11219.93 samples/sec Loss 8.3392 LearningRate 0.0642 Epoch: 7 Global Step: 45240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:46,297-Speed 10880.72 samples/sec Loss 8.3970 LearningRate 0.0642 Epoch: 7 Global Step: 45250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:47,200-Speed 11351.77 samples/sec Loss 8.2695 LearningRate 0.0642 Epoch: 7 Global Step: 45260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:48,081-Speed 11636.40 samples/sec Loss 8.3209 LearningRate 0.0642 Epoch: 7 Global Step: 45270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:49,019-Speed 10922.64 samples/sec Loss 8.5286 LearningRate 0.0641 Epoch: 7 Global Step: 45280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:49,931-Speed 11241.97 samples/sec Loss 8.1447 LearningRate 0.0641 Epoch: 7 Global Step: 45290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:50,825-Speed 11459.38 samples/sec Loss 8.5019 LearningRate 0.0641 Epoch: 7 Global Step: 45300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:51,752-Speed 11059.93 samples/sec Loss 8.2513 LearningRate 0.0641 Epoch: 7 Global Step: 45310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:52,644-Speed 11481.67 samples/sec Loss 8.4309 LearningRate 0.0641 Epoch: 7 Global Step: 45320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:53,550-Speed 11313.42 samples/sec Loss 8.3349 LearningRate 0.0641 Epoch: 7 Global Step: 45330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:50:54,444-Speed 11457.65 samples/sec Loss 8.5771 LearningRate 0.0641 Epoch: 7 Global Step: 45340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:50:55,336-Speed 
11488.09 samples/sec Loss 8.3235 LearningRate 0.0641 Epoch: 7 Global Step: 45350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:50:56,223-Speed 11558.63 samples/sec Loss 8.3260 LearningRate 0.0641 Epoch: 7 Global Step: 45360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:57,132-Speed 11274.59 samples/sec Loss 8.4417 LearningRate 0.0641 Epoch: 7 Global Step: 45370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:58,028-Speed 11434.17 samples/sec Loss 8.3370 LearningRate 0.0641 Epoch: 7 Global Step: 45380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:58,961-Speed 10983.16 samples/sec Loss 8.2004 LearningRate 0.0641 Epoch: 7 Global Step: 45390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:50:59,886-Speed 11073.32 samples/sec Loss 8.3437 LearningRate 0.0641 Epoch: 7 Global Step: 45400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:51:00,808-Speed 11114.81 samples/sec Loss 8.3326 LearningRate 0.0641 Epoch: 7 Global Step: 45410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:51:01,740-Speed 10994.52 samples/sec Loss 8.2988 LearningRate 0.0640 Epoch: 7 Global Step: 45420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:51:02,635-Speed 11445.30 samples/sec Loss 8.1800 LearningRate 0.0640 Epoch: 7 Global Step: 45430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:51:03,588-Speed 10750.25 samples/sec Loss 8.2662 LearningRate 0.0640 Epoch: 7 Global Step: 45440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:51:04,529-Speed 10897.11 samples/sec Loss 8.4275 LearningRate 0.0640 Epoch: 7 Global Step: 45450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:51:05,429-Speed 11379.60 samples/sec Loss 8.3529 LearningRate 0.0640 Epoch: 7 Global Step: 45460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:51:06,329-Speed 11388.06 samples/sec Loss 8.3158 LearningRate 0.0640 
Epoch: 7 Global Step: 45470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:51:07,344-Speed 10092.52 samples/sec Loss 8.4634 LearningRate 0.0640 Epoch: 7 Global Step: 45480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:51:17,656-Speed 993.18 samples/sec Loss 8.3943 LearningRate 0.0640 Epoch: 8 Global Step: 45490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:51:18,582-Speed 11070.61 samples/sec Loss 7.6280 LearningRate 0.0640 Epoch: 8 Global Step: 45500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:51:19,571-Speed 10366.54 samples/sec Loss 7.5619 LearningRate 0.0640 Epoch: 8 Global Step: 45510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:51:20,584-Speed 10109.56 samples/sec Loss 7.5519 LearningRate 0.0640 Epoch: 8 Global Step: 45520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:51:21,537-Speed 10760.27 samples/sec Loss 7.6919 LearningRate 0.0640 Epoch: 8 Global Step: 45530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:51:22,458-Speed 11123.42 samples/sec Loss 7.5997 LearningRate 0.0640 Epoch: 8 Global Step: 45540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:51:23,511-Speed 9727.47 samples/sec Loss 7.6502 LearningRate 0.0640 Epoch: 8 Global Step: 45550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:51:24,492-Speed 10441.87 samples/sec Loss 7.8473 LearningRate 0.0639 Epoch: 8 Global Step: 45560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:51:25,432-Speed 10908.86 samples/sec Loss 7.6598 LearningRate 0.0639 Epoch: 8 Global Step: 45570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:51:26,389-Speed 10709.49 samples/sec Loss 7.6758 LearningRate 0.0639 Epoch: 8 Global Step: 45580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:51:27,374-Speed 10396.72 samples/sec Loss 7.6847 LearningRate 0.0639 Epoch: 8 Global Step: 45590 Fp16 Grad Scale: 
65536 Required: 6 hours Training: 2022-04-11 00:51:28,351-Speed 10493.70 samples/sec Loss 7.7899 LearningRate 0.0639 Epoch: 8 Global Step: 45600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:51:29,308-Speed 10700.45 samples/sec Loss 7.7750 LearningRate 0.0639 Epoch: 8 Global Step: 45610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:51:30,214-Speed 11311.59 samples/sec Loss 7.7983 LearningRate 0.0639 Epoch: 8 Global Step: 45620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:51:31,182-Speed 10590.47 samples/sec Loss 7.6306 LearningRate 0.0639 Epoch: 8 Global Step: 45630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:51:32,079-Speed 11421.46 samples/sec Loss 7.6865 LearningRate 0.0639 Epoch: 8 Global Step: 45640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:51:32,988-Speed 11269.58 samples/sec Loss 7.8048 LearningRate 0.0639 Epoch: 8 Global Step: 45650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:51:33,902-Speed 11210.55 samples/sec Loss 7.8097 LearningRate 0.0639 Epoch: 8 Global Step: 45660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:51:34,821-Speed 11157.69 samples/sec Loss 7.7425 LearningRate 0.0639 Epoch: 8 Global Step: 45670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:51:35,729-Speed 11278.05 samples/sec Loss 7.7635 LearningRate 0.0639 Epoch: 8 Global Step: 45680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:51:36,661-Speed 10990.76 samples/sec Loss 7.7884 LearningRate 0.0639 Epoch: 8 Global Step: 45690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:51:37,603-Speed 10881.96 samples/sec Loss 7.7907 LearningRate 0.0639 Epoch: 8 Global Step: 45700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:51:38,565-Speed 10654.51 samples/sec Loss 7.8475 LearningRate 0.0638 Epoch: 8 Global Step: 45710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 
00:51:39,507-Speed 10881.10 samples/sec Loss 7.9005 LearningRate 0.0638 Epoch: 8 Global Step: 45720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:51:40,458-Speed 10774.76 samples/sec Loss 7.7551 LearningRate 0.0638 Epoch: 8 Global Step: 45730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:51:41,441-Speed 10426.59 samples/sec Loss 7.7344 LearningRate 0.0638 Epoch: 8 Global Step: 45740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:51:42,403-Speed 10646.35 samples/sec Loss 7.7396 LearningRate 0.0638 Epoch: 8 Global Step: 45750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:51:43,346-Speed 10870.51 samples/sec Loss 7.6915 LearningRate 0.0638 Epoch: 8 Global Step: 45760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:51:44,262-Speed 11177.70 samples/sec Loss 7.7081 LearningRate 0.0638 Epoch: 8 Global Step: 45770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:51:45,185-Speed 11103.88 samples/sec Loss 7.8728 LearningRate 0.0638 Epoch: 8 Global Step: 45780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:51:46,116-Speed 11002.97 samples/sec Loss 7.8135 LearningRate 0.0638 Epoch: 8 Global Step: 45790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:51:47,012-Speed 11443.51 samples/sec Loss 7.7680 LearningRate 0.0638 Epoch: 8 Global Step: 45800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:51:47,920-Speed 11286.29 samples/sec Loss 7.7304 LearningRate 0.0638 Epoch: 8 Global Step: 45810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:51:48,805-Speed 11579.93 samples/sec Loss 7.6787 LearningRate 0.0638 Epoch: 8 Global Step: 45820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:51:49,812-Speed 10167.84 samples/sec Loss 7.9487 LearningRate 0.0638 Epoch: 8 Global Step: 45830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:51:50,755-Speed 10868.73 samples/sec Loss 
7.7774 LearningRate 0.0638 Epoch: 8 Global Step: 45840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:51:51,749-Speed 10313.10 samples/sec Loss 8.0112 LearningRate 0.0637 Epoch: 8 Global Step: 45850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:51:52,774-Speed 9990.83 samples/sec Loss 7.9025 LearningRate 0.0637 Epoch: 8 Global Step: 45860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:51:53,727-Speed 10760.38 samples/sec Loss 7.8900 LearningRate 0.0637 Epoch: 8 Global Step: 45870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:51:54,759-Speed 9925.40 samples/sec Loss 7.9711 LearningRate 0.0637 Epoch: 8 Global Step: 45880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:51:55,681-Speed 11107.62 samples/sec Loss 7.8189 LearningRate 0.0637 Epoch: 8 Global Step: 45890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:51:56,637-Speed 10727.52 samples/sec Loss 7.9683 LearningRate 0.0637 Epoch: 8 Global Step: 45900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:51:57,561-Speed 11086.46 samples/sec Loss 7.9571 LearningRate 0.0637 Epoch: 8 Global Step: 45910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:51:58,486-Speed 11078.46 samples/sec Loss 7.9849 LearningRate 0.0637 Epoch: 8 Global Step: 45920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:51:59,415-Speed 11020.80 samples/sec Loss 7.9216 LearningRate 0.0637 Epoch: 8 Global Step: 45930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:52:00,337-Speed 11118.84 samples/sec Loss 7.9769 LearningRate 0.0637 Epoch: 8 Global Step: 45940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:52:01,266-Speed 11037.68 samples/sec Loss 7.8605 LearningRate 0.0637 Epoch: 8 Global Step: 45950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:52:02,208-Speed 10876.60 samples/sec Loss 8.0499 LearningRate 0.0637 Epoch: 8 Global Step: 
45960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:52:03,137-Speed 11033.16 samples/sec Loss 7.9957 LearningRate 0.0637 Epoch: 8 Global Step: 45970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:52:04,065-Speed 11034.77 samples/sec Loss 7.8076 LearningRate 0.0637 Epoch: 8 Global Step: 45980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:52:04,967-Speed 11368.80 samples/sec Loss 8.0322 LearningRate 0.0636 Epoch: 8 Global Step: 45990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:52:05,892-Speed 11074.67 samples/sec Loss 7.9182 LearningRate 0.0636 Epoch: 8 Global Step: 46000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:52:27,251-[lfw][46000]XNorm: 13.527228 Training: 2022-04-11 00:52:27,252-[lfw][46000]Accuracy-Flip: 0.99350+-0.00345 Training: 2022-04-11 00:52:27,252-[lfw][46000]Accuracy-Highest: 0.99467 Training: 2022-04-11 00:52:51,754-[cfp_fp][46000]XNorm: 11.412165 Training: 2022-04-11 00:52:51,755-[cfp_fp][46000]Accuracy-Flip: 0.93271+-0.01341 Training: 2022-04-11 00:52:51,755-[cfp_fp][46000]Accuracy-Highest: 0.93271 Training: 2022-04-11 00:53:12,906-[agedb_30][46000]XNorm: 13.306128 Training: 2022-04-11 00:53:12,907-[agedb_30][46000]Accuracy-Flip: 0.95717+-0.00699 Training: 2022-04-11 00:53:12,907-[agedb_30][46000]Accuracy-Highest: 0.95717 Training: 2022-04-11 00:53:13,835-Speed 150.72 samples/sec Loss 8.0093 LearningRate 0.0636 Epoch: 8 Global Step: 46010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:53:14,742-Speed 11299.23 samples/sec Loss 8.0495 LearningRate 0.0636 Epoch: 8 Global Step: 46020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:53:15,635-Speed 11469.06 samples/sec Loss 7.7945 LearningRate 0.0636 Epoch: 8 Global Step: 46030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:53:16,529-Speed 11462.87 samples/sec Loss 7.8965 LearningRate 0.0636 Epoch: 8 Global Step: 46040 Fp16 Grad Scale: 65536 Required: 6 
hours Training: 2022-04-11 00:53:17,427-Speed 11415.65 samples/sec Loss 7.9929 LearningRate 0.0636 Epoch: 8 Global Step: 46050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:53:18,358-Speed 11000.49 samples/sec Loss 7.9393 LearningRate 0.0636 Epoch: 8 Global Step: 46060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:53:19,302-Speed 10856.55 samples/sec Loss 7.9718 LearningRate 0.0636 Epoch: 8 Global Step: 46070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:53:20,194-Speed 11497.56 samples/sec Loss 7.9889 LearningRate 0.0636 Epoch: 8 Global Step: 46080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:53:21,107-Speed 11222.00 samples/sec Loss 7.9017 LearningRate 0.0636 Epoch: 8 Global Step: 46090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:53:21,989-Speed 11608.66 samples/sec Loss 7.9369 LearningRate 0.0636 Epoch: 8 Global Step: 46100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:53:22,917-Speed 11046.83 samples/sec Loss 8.0830 LearningRate 0.0636 Epoch: 8 Global Step: 46110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:53:23,855-Speed 10929.19 samples/sec Loss 8.0166 LearningRate 0.0636 Epoch: 8 Global Step: 46120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:53:24,757-Speed 11354.50 samples/sec Loss 7.7397 LearningRate 0.0635 Epoch: 8 Global Step: 46130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:53:25,698-Speed 10891.98 samples/sec Loss 8.0325 LearningRate 0.0635 Epoch: 8 Global Step: 46140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:53:26,611-Speed 11226.77 samples/sec Loss 7.9391 LearningRate 0.0635 Epoch: 8 Global Step: 46150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:53:27,537-Speed 11059.07 samples/sec Loss 8.0179 LearningRate 0.0635 Epoch: 8 Global Step: 46160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:53:28,511-Speed 
10518.08 samples/sec Loss 7.9936 LearningRate 0.0635 Epoch: 8 Global Step: 46170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:53:29,472-Speed 10663.97 samples/sec Loss 7.8651 LearningRate 0.0635 Epoch: 8 Global Step: 46180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:53:30,390-Speed 11167.07 samples/sec Loss 7.8028 LearningRate 0.0635 Epoch: 8 Global Step: 46190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:53:31,291-Speed 11375.12 samples/sec Loss 7.9492 LearningRate 0.0635 Epoch: 8 Global Step: 46200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:53:32,208-Speed 11164.69 samples/sec Loss 7.9205 LearningRate 0.0635 Epoch: 8 Global Step: 46210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:53:33,117-Speed 11276.89 samples/sec Loss 8.0644 LearningRate 0.0635 Epoch: 8 Global Step: 46220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:53:34,005-Speed 11541.23 samples/sec Loss 8.0169 LearningRate 0.0635 Epoch: 8 Global Step: 46230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:53:34,929-Speed 11084.43 samples/sec Loss 8.1386 LearningRate 0.0635 Epoch: 8 Global Step: 46240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:53:35,870-Speed 10887.63 samples/sec Loss 8.1112 LearningRate 0.0635 Epoch: 8 Global Step: 46250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:53:36,767-Speed 11428.88 samples/sec Loss 8.1641 LearningRate 0.0635 Epoch: 8 Global Step: 46260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:53:37,718-Speed 10773.56 samples/sec Loss 7.9990 LearningRate 0.0635 Epoch: 8 Global Step: 46270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:53:38,692-Speed 10525.29 samples/sec Loss 8.0712 LearningRate 0.0634 Epoch: 8 Global Step: 46280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:53:39,653-Speed 10659.49 samples/sec Loss 8.0717 LearningRate 0.0634 
Epoch: 8 Global Step: 46290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:53:40,613-Speed 10673.83 samples/sec Loss 8.1293 LearningRate 0.0634 Epoch: 8 Global Step: 46300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:53:41,507-Speed 11465.15 samples/sec Loss 8.1501 LearningRate 0.0634 Epoch: 8 Global Step: 46310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:53:42,460-Speed 10749.75 samples/sec Loss 7.9663 LearningRate 0.0634 Epoch: 8 Global Step: 46320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:53:43,397-Speed 10933.92 samples/sec Loss 7.8863 LearningRate 0.0634 Epoch: 8 Global Step: 46330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:53:44,296-Speed 11400.46 samples/sec Loss 8.0846 LearningRate 0.0634 Epoch: 8 Global Step: 46340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:53:45,196-Speed 11388.48 samples/sec Loss 8.0061 LearningRate 0.0634 Epoch: 8 Global Step: 46350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:53:46,096-Speed 11377.89 samples/sec Loss 8.0539 LearningRate 0.0634 Epoch: 8 Global Step: 46360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:53:47,013-Speed 11182.76 samples/sec Loss 7.8716 LearningRate 0.0634 Epoch: 8 Global Step: 46370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:53:47,911-Speed 11408.87 samples/sec Loss 8.1268 LearningRate 0.0634 Epoch: 8 Global Step: 46380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:53:48,800-Speed 11528.18 samples/sec Loss 7.9238 LearningRate 0.0634 Epoch: 8 Global Step: 46390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:53:49,706-Speed 11303.89 samples/sec Loss 8.0141 LearningRate 0.0634 Epoch: 8 Global Step: 46400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:53:50,611-Speed 11330.93 samples/sec Loss 8.0646 LearningRate 0.0634 Epoch: 8 Global Step: 46410 Fp16 Grad Scale: 65536 
Required: 6 hours Training: 2022-04-11 00:53:51,552-Speed 10884.59 samples/sec Loss 8.0896 LearningRate 0.0633 Epoch: 8 Global Step: 46420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:53:52,524-Speed 10544.38 samples/sec Loss 8.0741 LearningRate 0.0633 Epoch: 8 Global Step: 46430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:53:53,470-Speed 10829.78 samples/sec Loss 8.0948 LearningRate 0.0633 Epoch: 8 Global Step: 46440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:53:54,443-Speed 10536.70 samples/sec Loss 8.1013 LearningRate 0.0633 Epoch: 8 Global Step: 46450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:53:55,432-Speed 10361.37 samples/sec Loss 7.9236 LearningRate 0.0633 Epoch: 8 Global Step: 46460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:53:56,360-Speed 11032.96 samples/sec Loss 8.0282 LearningRate 0.0633 Epoch: 8 Global Step: 46470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:53:57,235-Speed 11716.49 samples/sec Loss 8.0481 LearningRate 0.0633 Epoch: 8 Global Step: 46480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:53:58,124-Speed 11534.57 samples/sec Loss 8.1351 LearningRate 0.0633 Epoch: 8 Global Step: 46490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:53:59,021-Speed 11414.90 samples/sec Loss 8.0336 LearningRate 0.0633 Epoch: 8 Global Step: 46500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:53:59,943-Speed 11120.68 samples/sec Loss 7.9733 LearningRate 0.0633 Epoch: 8 Global Step: 46510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:54:00,854-Speed 11246.43 samples/sec Loss 8.0714 LearningRate 0.0633 Epoch: 8 Global Step: 46520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:54:01,771-Speed 11174.59 samples/sec Loss 8.0777 LearningRate 0.0633 Epoch: 8 Global Step: 46530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 
00:54:02,686-Speed 11198.01 samples/sec Loss 8.1484 LearningRate 0.0633 Epoch: 8 Global Step: 46540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:54:03,640-Speed 10745.61 samples/sec Loss 8.1104 LearningRate 0.0633 Epoch: 8 Global Step: 46550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:54:04,540-Speed 11376.10 samples/sec Loss 8.0753 LearningRate 0.0632 Epoch: 8 Global Step: 46560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:54:05,443-Speed 11344.95 samples/sec Loss 8.1262 LearningRate 0.0632 Epoch: 8 Global Step: 46570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:54:06,369-Speed 11069.63 samples/sec Loss 8.0487 LearningRate 0.0632 Epoch: 8 Global Step: 46580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:54:07,293-Speed 11086.02 samples/sec Loss 8.0126 LearningRate 0.0632 Epoch: 8 Global Step: 46590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:54:08,216-Speed 11104.89 samples/sec Loss 8.2549 LearningRate 0.0632 Epoch: 8 Global Step: 46600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:54:09,119-Speed 11346.97 samples/sec Loss 8.2016 LearningRate 0.0632 Epoch: 8 Global Step: 46610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:54:09,997-Speed 11673.24 samples/sec Loss 8.0396 LearningRate 0.0632 Epoch: 8 Global Step: 46620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:54:10,891-Speed 11454.57 samples/sec Loss 8.2571 LearningRate 0.0632 Epoch: 8 Global Step: 46630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:54:11,789-Speed 11412.42 samples/sec Loss 8.0251 LearningRate 0.0632 Epoch: 8 Global Step: 46640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:54:12,727-Speed 10924.72 samples/sec Loss 8.0486 LearningRate 0.0632 Epoch: 8 Global Step: 46650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:54:13,659-Speed 10997.84 samples/sec Loss 8.0822 
LearningRate 0.0632 Epoch: 8 Global Step: 46660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:54:14,586-Speed 11050.23 samples/sec Loss 7.9331 LearningRate 0.0632 Epoch: 8 Global Step: 46670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:54:15,538-Speed 10767.28 samples/sec Loss 8.0234 LearningRate 0.0632 Epoch: 8 Global Step: 46680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:54:16,430-Speed 11489.40 samples/sec Loss 8.0489 LearningRate 0.0632 Epoch: 8 Global Step: 46690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:54:17,345-Speed 11202.97 samples/sec Loss 8.1720 LearningRate 0.0632 Epoch: 8 Global Step: 46700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:54:18,264-Speed 11144.01 samples/sec Loss 7.8992 LearningRate 0.0631 Epoch: 8 Global Step: 46710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:54:19,188-Speed 11099.48 samples/sec Loss 8.1203 LearningRate 0.0631 Epoch: 8 Global Step: 46720 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 00:54:20,137-Speed 10792.92 samples/sec Loss 8.2743 LearningRate 0.0631 Epoch: 8 Global Step: 46730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:54:21,028-Speed 11497.73 samples/sec Loss 8.0380 LearningRate 0.0631 Epoch: 8 Global Step: 46740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:54:21,935-Speed 11303.51 samples/sec Loss 8.1779 LearningRate 0.0631 Epoch: 8 Global Step: 46750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:54:22,828-Speed 11470.61 samples/sec Loss 8.1455 LearningRate 0.0631 Epoch: 8 Global Step: 46760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:54:23,766-Speed 10929.04 samples/sec Loss 8.0861 LearningRate 0.0631 Epoch: 8 Global Step: 46770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:54:24,685-Speed 11144.17 samples/sec Loss 8.1415 LearningRate 0.0631 Epoch: 8 Global Step: 
46780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:54:25,598-Speed 11229.51 samples/sec Loss 8.1941 LearningRate 0.0631 Epoch: 8 Global Step: 46790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:54:26,557-Speed 10687.73 samples/sec Loss 8.1980 LearningRate 0.0631 Epoch: 8 Global Step: 46800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:54:27,485-Speed 11037.78 samples/sec Loss 7.9833 LearningRate 0.0631 Epoch: 8 Global Step: 46810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:54:28,386-Speed 11370.53 samples/sec Loss 8.1664 LearningRate 0.0631 Epoch: 8 Global Step: 46820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:54:29,298-Speed 11240.30 samples/sec Loss 8.0409 LearningRate 0.0631 Epoch: 8 Global Step: 46830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:54:30,198-Speed 11378.30 samples/sec Loss 8.2390 LearningRate 0.0631 Epoch: 8 Global Step: 46840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:54:31,154-Speed 10735.19 samples/sec Loss 8.1030 LearningRate 0.0630 Epoch: 8 Global Step: 46850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:54:32,080-Speed 11060.71 samples/sec Loss 8.2683 LearningRate 0.0630 Epoch: 8 Global Step: 46860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:54:32,962-Speed 11624.28 samples/sec Loss 8.1033 LearningRate 0.0630 Epoch: 8 Global Step: 46870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:54:33,852-Speed 11511.43 samples/sec Loss 8.2118 LearningRate 0.0630 Epoch: 8 Global Step: 46880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:54:34,798-Speed 10834.88 samples/sec Loss 8.0724 LearningRate 0.0630 Epoch: 8 Global Step: 46890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:54:35,711-Speed 11220.32 samples/sec Loss 8.0859 LearningRate 0.0630 Epoch: 8 Global Step: 46900 Fp16 Grad Scale: 65536 Required: 6 hours 
Training: 2022-04-11 00:54:36,652-Speed 10889.42 samples/sec Loss 8.0397 LearningRate 0.0630 Epoch: 8 Global Step: 46910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:54:37,635-Speed 10427.88 samples/sec Loss 8.0734 LearningRate 0.0630 Epoch: 8 Global Step: 46920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:54:38,594-Speed 10682.81 samples/sec Loss 8.1182 LearningRate 0.0630 Epoch: 8 Global Step: 46930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:54:39,563-Speed 10576.53 samples/sec Loss 8.1341 LearningRate 0.0630 Epoch: 8 Global Step: 46940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:54:40,538-Speed 10507.80 samples/sec Loss 8.0872 LearningRate 0.0630 Epoch: 8 Global Step: 46950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:54:41,469-Speed 11009.21 samples/sec Loss 8.0561 LearningRate 0.0630 Epoch: 8 Global Step: 46960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:54:42,393-Speed 11095.36 samples/sec Loss 8.0082 LearningRate 0.0630 Epoch: 8 Global Step: 46970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:54:43,328-Speed 10954.21 samples/sec Loss 8.1857 LearningRate 0.0630 Epoch: 8 Global Step: 46980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:54:44,282-Speed 10740.69 samples/sec Loss 8.2313 LearningRate 0.0629 Epoch: 8 Global Step: 46990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:54:45,175-Speed 11478.67 samples/sec Loss 8.2271 LearningRate 0.0629 Epoch: 8 Global Step: 47000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:54:46,083-Speed 11280.81 samples/sec Loss 8.2076 LearningRate 0.0629 Epoch: 8 Global Step: 47010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:54:47,022-Speed 10908.62 samples/sec Loss 8.1508 LearningRate 0.0629 Epoch: 8 Global Step: 47020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:54:47,911-Speed 
11537.84 samples/sec Loss 8.2980 LearningRate 0.0629 Epoch: 8 Global Step: 47030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:54:48,815-Speed 11329.86 samples/sec Loss 8.2665 LearningRate 0.0629 Epoch: 8 Global Step: 47040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:54:49,729-Speed 11213.72 samples/sec Loss 8.1729 LearningRate 0.0629 Epoch: 8 Global Step: 47050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:54:50,618-Speed 11518.13 samples/sec Loss 8.0988 LearningRate 0.0629 Epoch: 8 Global Step: 47060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:54:51,550-Speed 11001.33 samples/sec Loss 8.0757 LearningRate 0.0629 Epoch: 8 Global Step: 47070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:54:52,466-Speed 11180.03 samples/sec Loss 8.3250 LearningRate 0.0629 Epoch: 8 Global Step: 47080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:54:53,413-Speed 10828.01 samples/sec Loss 8.2328 LearningRate 0.0629 Epoch: 8 Global Step: 47090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:54:54,381-Speed 10585.85 samples/sec Loss 8.0646 LearningRate 0.0629 Epoch: 8 Global Step: 47100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:54:55,323-Speed 10871.65 samples/sec Loss 8.3488 LearningRate 0.0629 Epoch: 8 Global Step: 47110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:54:56,245-Speed 11113.73 samples/sec Loss 8.2244 LearningRate 0.0629 Epoch: 8 Global Step: 47120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:54:57,167-Speed 11119.54 samples/sec Loss 8.2342 LearningRate 0.0629 Epoch: 8 Global Step: 47130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:54:58,085-Speed 11168.29 samples/sec Loss 8.1733 LearningRate 0.0628 Epoch: 8 Global Step: 47140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:54:58,990-Speed 11318.38 samples/sec Loss 8.0572 LearningRate 0.0628 
Epoch: 8 Global Step: 47150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:54:59,912-Speed 11113.91 samples/sec Loss 8.1981 LearningRate 0.0628 Epoch: 8 Global Step: 47160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:00,824-Speed 11235.52 samples/sec Loss 8.1408 LearningRate 0.0628 Epoch: 8 Global Step: 47170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:01,770-Speed 10832.42 samples/sec Loss 8.0944 LearningRate 0.0628 Epoch: 8 Global Step: 47180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:02,732-Speed 10649.18 samples/sec Loss 8.1620 LearningRate 0.0628 Epoch: 8 Global Step: 47190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:03,644-Speed 11243.11 samples/sec Loss 8.2350 LearningRate 0.0628 Epoch: 8 Global Step: 47200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:04,589-Speed 10835.81 samples/sec Loss 8.2307 LearningRate 0.0628 Epoch: 8 Global Step: 47210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:05,491-Speed 11361.06 samples/sec Loss 8.0976 LearningRate 0.0628 Epoch: 8 Global Step: 47220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:06,424-Speed 10986.21 samples/sec Loss 8.1400 LearningRate 0.0628 Epoch: 8 Global Step: 47230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:07,319-Speed 11455.88 samples/sec Loss 8.2093 LearningRate 0.0628 Epoch: 8 Global Step: 47240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:08,249-Speed 11013.75 samples/sec Loss 8.2127 LearningRate 0.0628 Epoch: 8 Global Step: 47250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:09,128-Speed 11651.92 samples/sec Loss 8.1875 LearningRate 0.0628 Epoch: 8 Global Step: 47260 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 00:55:10,024-Speed 11438.86 samples/sec Loss 8.0353 LearningRate 0.0628 Epoch: 8 Global Step: 47270 Fp16 Grad Scale: 
131072 Required: 6 hours Training: 2022-04-11 00:55:10,931-Speed 11296.07 samples/sec Loss 8.1608 LearningRate 0.0627 Epoch: 8 Global Step: 47280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:11,828-Speed 11420.50 samples/sec Loss 8.2811 LearningRate 0.0627 Epoch: 8 Global Step: 47290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:12,745-Speed 11179.93 samples/sec Loss 8.2968 LearningRate 0.0627 Epoch: 8 Global Step: 47300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:13,694-Speed 10795.57 samples/sec Loss 8.2160 LearningRate 0.0627 Epoch: 8 Global Step: 47310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:14,587-Speed 11486.14 samples/sec Loss 8.4146 LearningRate 0.0627 Epoch: 8 Global Step: 47320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:15,498-Speed 11239.45 samples/sec Loss 8.1423 LearningRate 0.0627 Epoch: 8 Global Step: 47330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:16,397-Speed 11405.33 samples/sec Loss 8.1598 LearningRate 0.0627 Epoch: 8 Global Step: 47340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:17,314-Speed 11165.52 samples/sec Loss 8.1150 LearningRate 0.0627 Epoch: 8 Global Step: 47350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:18,254-Speed 10901.51 samples/sec Loss 8.2264 LearningRate 0.0627 Epoch: 8 Global Step: 47360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:19,150-Speed 11442.90 samples/sec Loss 8.1467 LearningRate 0.0627 Epoch: 8 Global Step: 47370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:20,062-Speed 11238.91 samples/sec Loss 8.1298 LearningRate 0.0627 Epoch: 8 Global Step: 47380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:20,966-Speed 11332.75 samples/sec Loss 8.2304 LearningRate 0.0627 Epoch: 8 Global Step: 47390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 
2022-04-11 00:55:21,881-Speed 11192.39 samples/sec Loss 8.2102 LearningRate 0.0627 Epoch: 8 Global Step: 47400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:22,785-Speed 11342.88 samples/sec Loss 8.0941 LearningRate 0.0627 Epoch: 8 Global Step: 47410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:23,672-Speed 11544.39 samples/sec Loss 8.0808 LearningRate 0.0626 Epoch: 8 Global Step: 47420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:24,621-Speed 10797.68 samples/sec Loss 8.1507 LearningRate 0.0626 Epoch: 8 Global Step: 47430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:25,549-Speed 11048.75 samples/sec Loss 8.1911 LearningRate 0.0626 Epoch: 8 Global Step: 47440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:26,499-Speed 10786.60 samples/sec Loss 8.1179 LearningRate 0.0626 Epoch: 8 Global Step: 47450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:27,444-Speed 10845.11 samples/sec Loss 8.2116 LearningRate 0.0626 Epoch: 8 Global Step: 47460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:28,417-Speed 10533.75 samples/sec Loss 8.1484 LearningRate 0.0626 Epoch: 8 Global Step: 47470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:29,381-Speed 10641.02 samples/sec Loss 8.2181 LearningRate 0.0626 Epoch: 8 Global Step: 47480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:30,298-Speed 11178.46 samples/sec Loss 8.2841 LearningRate 0.0626 Epoch: 8 Global Step: 47490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:31,211-Speed 11233.50 samples/sec Loss 8.1574 LearningRate 0.0626 Epoch: 8 Global Step: 47500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:32,157-Speed 10831.57 samples/sec Loss 8.1742 LearningRate 0.0626 Epoch: 8 Global Step: 47510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:33,077-Speed 11136.42 
samples/sec Loss 8.2469 LearningRate 0.0626 Epoch: 8 Global Step: 47520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:33,963-Speed 11559.56 samples/sec Loss 8.2068 LearningRate 0.0626 Epoch: 8 Global Step: 47530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:34,877-Speed 11210.02 samples/sec Loss 8.1756 LearningRate 0.0626 Epoch: 8 Global Step: 47540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:35,788-Speed 11248.30 samples/sec Loss 8.1567 LearningRate 0.0626 Epoch: 8 Global Step: 47550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:36,739-Speed 10778.89 samples/sec Loss 8.3908 LearningRate 0.0626 Epoch: 8 Global Step: 47560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:37,640-Speed 11368.65 samples/sec Loss 8.2236 LearningRate 0.0625 Epoch: 8 Global Step: 47570 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 00:55:38,576-Speed 10946.53 samples/sec Loss 8.1524 LearningRate 0.0625 Epoch: 8 Global Step: 47580 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 00:55:39,483-Speed 11305.13 samples/sec Loss 8.2174 LearningRate 0.0625 Epoch: 8 Global Step: 47590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:40,390-Speed 11292.19 samples/sec Loss 8.1373 LearningRate 0.0625 Epoch: 8 Global Step: 47600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:41,330-Speed 10907.98 samples/sec Loss 8.2159 LearningRate 0.0625 Epoch: 8 Global Step: 47610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:55:42,242-Speed 11228.32 samples/sec Loss 8.1838 LearningRate 0.0625 Epoch: 8 Global Step: 47620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:55:43,181-Speed 10913.02 samples/sec Loss 8.3916 LearningRate 0.0625 Epoch: 8 Global Step: 47630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:55:44,133-Speed 10767.35 samples/sec Loss 8.2824 LearningRate 0.0625 
Epoch: 8 Global Step: 47640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:55:45,021-Speed 11530.05 samples/sec Loss 8.2443 LearningRate 0.0625 Epoch: 8 Global Step: 47650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:55:45,898-Speed 11689.56 samples/sec Loss 8.1721 LearningRate 0.0625 Epoch: 8 Global Step: 47660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:55:46,825-Speed 11055.26 samples/sec Loss 8.1767 LearningRate 0.0625 Epoch: 8 Global Step: 47670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:55:47,747-Speed 11115.59 samples/sec Loss 8.2130 LearningRate 0.0625 Epoch: 8 Global Step: 47680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:55:48,706-Speed 10689.02 samples/sec Loss 8.1031 LearningRate 0.0625 Epoch: 8 Global Step: 47690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:55:49,662-Speed 10718.82 samples/sec Loss 8.2102 LearningRate 0.0625 Epoch: 8 Global Step: 47700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:55:50,580-Speed 11167.22 samples/sec Loss 8.3411 LearningRate 0.0624 Epoch: 8 Global Step: 47710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:51,499-Speed 11147.04 samples/sec Loss 8.1133 LearningRate 0.0624 Epoch: 8 Global Step: 47720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:52,403-Speed 11337.79 samples/sec Loss 8.1520 LearningRate 0.0624 Epoch: 8 Global Step: 47730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:53,298-Speed 11439.99 samples/sec Loss 8.2163 LearningRate 0.0624 Epoch: 8 Global Step: 47740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:54,242-Speed 10864.90 samples/sec Loss 8.3072 LearningRate 0.0624 Epoch: 8 Global Step: 47750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:55:55,182-Speed 10901.69 samples/sec Loss 8.1117 LearningRate 0.0624 Epoch: 8 Global Step: 47760 Fp16 Grad Scale: 65536 
Required: 6 hours Training: 2022-04-11 00:55:56,143-Speed 10659.99 samples/sec Loss 8.1632 LearningRate 0.0624 Epoch: 8 Global Step: 47770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:55:57,030-Speed 11552.16 samples/sec Loss 8.2982 LearningRate 0.0624 Epoch: 8 Global Step: 47780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:55:57,944-Speed 11210.03 samples/sec Loss 8.2099 LearningRate 0.0624 Epoch: 8 Global Step: 47790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:55:58,863-Speed 11149.54 samples/sec Loss 8.1851 LearningRate 0.0624 Epoch: 8 Global Step: 47800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:55:59,781-Speed 11165.08 samples/sec Loss 8.2669 LearningRate 0.0624 Epoch: 8 Global Step: 47810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:56:00,702-Speed 11132.54 samples/sec Loss 8.2363 LearningRate 0.0624 Epoch: 8 Global Step: 47820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:56:01,634-Speed 10995.49 samples/sec Loss 8.2080 LearningRate 0.0624 Epoch: 8 Global Step: 47830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:56:02,525-Speed 11495.78 samples/sec Loss 8.2768 LearningRate 0.0624 Epoch: 8 Global Step: 47840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:56:03,465-Speed 10902.75 samples/sec Loss 8.2485 LearningRate 0.0623 Epoch: 8 Global Step: 47850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:56:04,371-Speed 11301.41 samples/sec Loss 8.2084 LearningRate 0.0623 Epoch: 8 Global Step: 47860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:56:05,268-Speed 11425.17 samples/sec Loss 8.1782 LearningRate 0.0623 Epoch: 8 Global Step: 47870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:56:06,197-Speed 11036.97 samples/sec Loss 8.1899 LearningRate 0.0623 Epoch: 8 Global Step: 47880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 
00:56:07,101-Speed 11338.60 samples/sec Loss 8.1280 LearningRate 0.0623 Epoch: 8 Global Step: 47890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:56:07,998-Speed 11428.76 samples/sec Loss 8.1092 LearningRate 0.0623 Epoch: 8 Global Step: 47900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:56:08,928-Speed 11020.52 samples/sec Loss 8.1554 LearningRate 0.0623 Epoch: 8 Global Step: 47910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:56:09,847-Speed 11159.69 samples/sec Loss 8.2328 LearningRate 0.0623 Epoch: 8 Global Step: 47920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:56:10,779-Speed 10990.77 samples/sec Loss 8.0368 LearningRate 0.0623 Epoch: 8 Global Step: 47930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:56:11,674-Speed 11441.86 samples/sec Loss 8.3059 LearningRate 0.0623 Epoch: 8 Global Step: 47940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:56:12,607-Speed 10987.23 samples/sec Loss 8.1205 LearningRate 0.0623 Epoch: 8 Global Step: 47950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:56:13,552-Speed 10853.31 samples/sec Loss 8.1589 LearningRate 0.0623 Epoch: 8 Global Step: 47960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:56:14,490-Speed 10919.46 samples/sec Loss 8.2426 LearningRate 0.0623 Epoch: 8 Global Step: 47970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:56:15,410-Speed 11143.82 samples/sec Loss 8.1933 LearningRate 0.0623 Epoch: 8 Global Step: 47980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:56:16,314-Speed 11334.45 samples/sec Loss 8.2557 LearningRate 0.0623 Epoch: 8 Global Step: 47990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:56:17,249-Speed 10957.04 samples/sec Loss 8.1752 LearningRate 0.0622 Epoch: 8 Global Step: 48000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:56:38,548-[lfw][48000]XNorm: 13.174116 
Training: 2022-04-11 00:56:38,548-[lfw][48000]Accuracy-Flip: 0.99483+-0.00411 Training: 2022-04-11 00:56:38,549-[lfw][48000]Accuracy-Highest: 0.99483 Training: 2022-04-11 00:57:03,162-[cfp_fp][48000]XNorm: 11.128583 Training: 2022-04-11 00:57:03,163-[cfp_fp][48000]Accuracy-Flip: 0.93371+-0.01401 Training: 2022-04-11 00:57:03,163-[cfp_fp][48000]Accuracy-Highest: 0.93371 Training: 2022-04-11 00:57:24,324-[agedb_30][48000]XNorm: 12.954493 Training: 2022-04-11 00:57:24,324-[agedb_30][48000]Accuracy-Flip: 0.95600+-0.00952 Training: 2022-04-11 00:57:24,325-[agedb_30][48000]Accuracy-Highest: 0.95717 Training: 2022-04-11 00:57:25,246-Speed 150.60 samples/sec Loss 8.2090 LearningRate 0.0622 Epoch: 8 Global Step: 48010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:57:26,130-Speed 11593.93 samples/sec Loss 8.3817 LearningRate 0.0622 Epoch: 8 Global Step: 48020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:57:27,083-Speed 10756.24 samples/sec Loss 8.1010 LearningRate 0.0622 Epoch: 8 Global Step: 48030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:57:28,025-Speed 10875.88 samples/sec Loss 7.9818 LearningRate 0.0622 Epoch: 8 Global Step: 48040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:57:28,944-Speed 11150.23 samples/sec Loss 8.0298 LearningRate 0.0622 Epoch: 8 Global Step: 48050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:57:29,878-Speed 10964.87 samples/sec Loss 8.2192 LearningRate 0.0622 Epoch: 8 Global Step: 48060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:57:30,825-Speed 10828.49 samples/sec Loss 8.2473 LearningRate 0.0622 Epoch: 8 Global Step: 48070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:57:31,788-Speed 10636.78 samples/sec Loss 8.0994 LearningRate 0.0622 Epoch: 8 Global Step: 48080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:57:32,724-Speed 10949.38 samples/sec Loss 8.1512 LearningRate 0.0622 Epoch: 
8 Global Step: 48090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:57:33,657-Speed 10979.49 samples/sec Loss 8.2265 LearningRate 0.0622 Epoch: 8 Global Step: 48100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:57:34,573-Speed 11188.23 samples/sec Loss 8.1637 LearningRate 0.0622 Epoch: 8 Global Step: 48110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:57:35,467-Speed 11458.14 samples/sec Loss 8.2936 LearningRate 0.0622 Epoch: 8 Global Step: 48120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:57:36,390-Speed 11104.27 samples/sec Loss 8.1392 LearningRate 0.0622 Epoch: 8 Global Step: 48130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:57:37,318-Speed 11043.04 samples/sec Loss 8.2227 LearningRate 0.0621 Epoch: 8 Global Step: 48140 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 00:57:38,254-Speed 10945.32 samples/sec Loss 8.1008 LearningRate 0.0621 Epoch: 8 Global Step: 48150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:57:39,191-Speed 10941.29 samples/sec Loss 8.1668 LearningRate 0.0621 Epoch: 8 Global Step: 48160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:57:40,073-Speed 11617.27 samples/sec Loss 8.1159 LearningRate 0.0621 Epoch: 8 Global Step: 48170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:57:40,973-Speed 11377.97 samples/sec Loss 8.3550 LearningRate 0.0621 Epoch: 8 Global Step: 48180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:57:41,865-Speed 11483.22 samples/sec Loss 8.4673 LearningRate 0.0621 Epoch: 8 Global Step: 48190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:57:42,810-Speed 10853.01 samples/sec Loss 8.1634 LearningRate 0.0621 Epoch: 8 Global Step: 48200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:57:43,734-Speed 11078.84 samples/sec Loss 8.2905 LearningRate 0.0621 Epoch: 8 Global Step: 48210 Fp16 Grad Scale: 
131072 Required: 6 hours Training: 2022-04-11 00:57:44,633-Speed 11409.82 samples/sec Loss 8.2188 LearningRate 0.0621 Epoch: 8 Global Step: 48220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:57:45,528-Speed 11447.49 samples/sec Loss 8.1865 LearningRate 0.0621 Epoch: 8 Global Step: 48230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:57:46,426-Speed 11410.46 samples/sec Loss 8.2562 LearningRate 0.0621 Epoch: 8 Global Step: 48240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:57:47,338-Speed 11237.65 samples/sec Loss 8.3005 LearningRate 0.0621 Epoch: 8 Global Step: 48250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:57:48,247-Speed 11269.58 samples/sec Loss 8.2649 LearningRate 0.0621 Epoch: 8 Global Step: 48260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:57:49,168-Speed 11136.36 samples/sec Loss 8.1379 LearningRate 0.0621 Epoch: 8 Global Step: 48270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:57:50,082-Speed 11209.40 samples/sec Loss 8.1872 LearningRate 0.0621 Epoch: 8 Global Step: 48280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:57:51,020-Speed 10917.78 samples/sec Loss 8.1999 LearningRate 0.0620 Epoch: 8 Global Step: 48290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:57:51,939-Speed 11151.67 samples/sec Loss 8.2105 LearningRate 0.0620 Epoch: 8 Global Step: 48300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:57:52,846-Speed 11297.32 samples/sec Loss 8.1732 LearningRate 0.0620 Epoch: 8 Global Step: 48310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:57:53,774-Speed 11043.58 samples/sec Loss 8.1942 LearningRate 0.0620 Epoch: 8 Global Step: 48320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:57:54,703-Speed 11037.49 samples/sec Loss 8.3266 LearningRate 0.0620 Epoch: 8 Global Step: 48330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 
00:57:55,634-Speed 11002.70 samples/sec Loss 8.1148 LearningRate 0.0620 Epoch: 8 Global Step: 48340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:57:56,533-Speed 11397.48 samples/sec Loss 8.2096 LearningRate 0.0620 Epoch: 8 Global Step: 48350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:57:57,444-Speed 11250.48 samples/sec Loss 8.0835 LearningRate 0.0620 Epoch: 8 Global Step: 48360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:57:58,342-Speed 11399.70 samples/sec Loss 8.2947 LearningRate 0.0620 Epoch: 8 Global Step: 48370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:57:59,263-Speed 11132.25 samples/sec Loss 8.1583 LearningRate 0.0620 Epoch: 8 Global Step: 48380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:58:00,162-Speed 11395.52 samples/sec Loss 8.3081 LearningRate 0.0620 Epoch: 8 Global Step: 48390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:58:01,069-Speed 11300.30 samples/sec Loss 8.3185 LearningRate 0.0620 Epoch: 8 Global Step: 48400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:58:01,990-Speed 11132.43 samples/sec Loss 8.2085 LearningRate 0.0620 Epoch: 8 Global Step: 48410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:58:02,884-Speed 11453.57 samples/sec Loss 8.0643 LearningRate 0.0620 Epoch: 8 Global Step: 48420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:58:03,813-Speed 11035.21 samples/sec Loss 8.1833 LearningRate 0.0619 Epoch: 8 Global Step: 48430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:58:04,709-Speed 11427.55 samples/sec Loss 8.2410 LearningRate 0.0619 Epoch: 8 Global Step: 48440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:58:05,615-Speed 11308.68 samples/sec Loss 8.3458 LearningRate 0.0619 Epoch: 8 Global Step: 48450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:58:06,549-Speed 10983.96 samples/sec 
Loss 8.1361 LearningRate 0.0619 Epoch: 8 Global Step: 48460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:58:07,443-Speed 11450.46 samples/sec Loss 8.2698 LearningRate 0.0619 Epoch: 8 Global Step: 48470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:58:08,382-Speed 10920.28 samples/sec Loss 8.1282 LearningRate 0.0619 Epoch: 8 Global Step: 48480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:58:09,281-Speed 11396.95 samples/sec Loss 8.1038 LearningRate 0.0619 Epoch: 8 Global Step: 48490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:58:10,205-Speed 11095.79 samples/sec Loss 8.2009 LearningRate 0.0619 Epoch: 8 Global Step: 48500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:58:11,115-Speed 11259.85 samples/sec Loss 8.2346 LearningRate 0.0619 Epoch: 8 Global Step: 48510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:58:12,012-Speed 11414.80 samples/sec Loss 8.1756 LearningRate 0.0619 Epoch: 8 Global Step: 48520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:58:12,912-Speed 11390.38 samples/sec Loss 8.1655 LearningRate 0.0619 Epoch: 8 Global Step: 48530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:58:13,835-Speed 11105.72 samples/sec Loss 8.1768 LearningRate 0.0619 Epoch: 8 Global Step: 48540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:58:14,763-Speed 11045.26 samples/sec Loss 8.3875 LearningRate 0.0619 Epoch: 8 Global Step: 48550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:58:15,661-Speed 11407.95 samples/sec Loss 8.3329 LearningRate 0.0619 Epoch: 8 Global Step: 48560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:58:16,575-Speed 11208.93 samples/sec Loss 8.2254 LearningRate 0.0619 Epoch: 8 Global Step: 48570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:58:17,491-Speed 11184.35 samples/sec Loss 8.1434 LearningRate 0.0618 Epoch: 8 Global Step: 
48580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:58:18,411-Speed 11142.58 samples/sec Loss 8.1662 LearningRate 0.0618 Epoch: 8 Global Step: 48590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:58:19,312-Speed 11381.04 samples/sec Loss 8.1360 LearningRate 0.0618 Epoch: 8 Global Step: 48600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:58:20,254-Speed 10879.47 samples/sec Loss 8.0651 LearningRate 0.0618 Epoch: 8 Global Step: 48610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:58:21,151-Speed 11414.26 samples/sec Loss 8.1908 LearningRate 0.0618 Epoch: 8 Global Step: 48620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:58:22,056-Speed 11324.61 samples/sec Loss 8.2416 LearningRate 0.0618 Epoch: 8 Global Step: 48630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:58:22,950-Speed 11468.20 samples/sec Loss 8.2148 LearningRate 0.0618 Epoch: 8 Global Step: 48640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:58:23,828-Speed 11672.40 samples/sec Loss 8.1636 LearningRate 0.0618 Epoch: 8 Global Step: 48650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:58:24,751-Speed 11092.27 samples/sec Loss 8.2629 LearningRate 0.0618 Epoch: 8 Global Step: 48660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:58:25,663-Speed 11235.42 samples/sec Loss 8.1456 LearningRate 0.0618 Epoch: 8 Global Step: 48670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:58:26,590-Speed 11052.90 samples/sec Loss 8.2030 LearningRate 0.0618 Epoch: 8 Global Step: 48680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:58:27,537-Speed 10820.35 samples/sec Loss 8.4186 LearningRate 0.0618 Epoch: 8 Global Step: 48690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:58:28,497-Speed 10678.80 samples/sec Loss 8.0420 LearningRate 0.0618 Epoch: 8 Global Step: 48700 Fp16 Grad Scale: 131072 Required: 6 
hours Training: 2022-04-11 00:58:29,454-Speed 10709.78 samples/sec Loss 8.0636 LearningRate 0.0618 Epoch: 8 Global Step: 48710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:58:30,372-Speed 11164.52 samples/sec Loss 8.2102 LearningRate 0.0617 Epoch: 8 Global Step: 48720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:58:31,292-Speed 11132.22 samples/sec Loss 8.1403 LearningRate 0.0617 Epoch: 8 Global Step: 48730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:58:32,198-Speed 11309.60 samples/sec Loss 8.2049 LearningRate 0.0617 Epoch: 8 Global Step: 48740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:58:33,116-Speed 11167.15 samples/sec Loss 8.1582 LearningRate 0.0617 Epoch: 8 Global Step: 48750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:58:34,056-Speed 10905.23 samples/sec Loss 8.1410 LearningRate 0.0617 Epoch: 8 Global Step: 48760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:58:35,005-Speed 10791.70 samples/sec Loss 8.2084 LearningRate 0.0617 Epoch: 8 Global Step: 48770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:58:35,885-Speed 11645.47 samples/sec Loss 8.2246 LearningRate 0.0617 Epoch: 8 Global Step: 48780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:58:36,826-Speed 10891.62 samples/sec Loss 8.2792 LearningRate 0.0617 Epoch: 8 Global Step: 48790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:58:37,752-Speed 11067.44 samples/sec Loss 8.1790 LearningRate 0.0617 Epoch: 8 Global Step: 48800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:58:38,665-Speed 11221.97 samples/sec Loss 8.2426 LearningRate 0.0617 Epoch: 8 Global Step: 48810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:58:39,586-Speed 11119.54 samples/sec Loss 8.2566 LearningRate 0.0617 Epoch: 8 Global Step: 48820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:58:40,520-Speed 
10980.33 samples/sec Loss 8.2912 LearningRate 0.0617 Epoch: 8 Global Step: 48830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:58:41,486-Speed 10599.22 samples/sec Loss 8.2195 LearningRate 0.0617 Epoch: 8 Global Step: 48840 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 00:58:42,434-Speed 10808.66 samples/sec Loss 8.2578 LearningRate 0.0617 Epoch: 8 Global Step: 48850 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 00:58:43,349-Speed 11203.00 samples/sec Loss 8.2137 LearningRate 0.0616 Epoch: 8 Global Step: 48860 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 00:58:44,282-Speed 10985.44 samples/sec Loss 8.1547 LearningRate 0.0616 Epoch: 8 Global Step: 48870 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 00:58:45,184-Speed 11354.86 samples/sec Loss 8.1548 LearningRate 0.0616 Epoch: 8 Global Step: 48880 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 00:58:46,097-Speed 11223.16 samples/sec Loss 8.2270 LearningRate 0.0616 Epoch: 8 Global Step: 48890 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 00:58:47,014-Speed 11179.02 samples/sec Loss 8.1829 LearningRate 0.0616 Epoch: 8 Global Step: 48900 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 00:58:47,933-Speed 11146.69 samples/sec Loss 8.2307 LearningRate 0.0616 Epoch: 8 Global Step: 48910 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 00:58:48,848-Speed 11201.62 samples/sec Loss 8.2004 LearningRate 0.0616 Epoch: 8 Global Step: 48920 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 00:58:49,742-Speed 11460.48 samples/sec Loss 8.2376 LearningRate 0.0616 Epoch: 8 Global Step: 48930 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 00:58:50,691-Speed 10798.09 samples/sec Loss 8.2114 LearningRate 0.0616 Epoch: 8 Global Step: 48940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:58:51,600-Speed 11281.94 samples/sec Loss 8.2050 LearningRate 0.0616 
Epoch: 8 Global Step: 48950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:58:52,512-Speed 11235.44 samples/sec Loss 8.3862 LearningRate 0.0616 Epoch: 8 Global Step: 48960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:58:53,440-Speed 11032.68 samples/sec Loss 8.2025 LearningRate 0.0616 Epoch: 8 Global Step: 48970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:58:54,345-Speed 11325.53 samples/sec Loss 8.3218 LearningRate 0.0616 Epoch: 8 Global Step: 48980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:58:55,231-Speed 11569.42 samples/sec Loss 8.1157 LearningRate 0.0616 Epoch: 8 Global Step: 48990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:58:56,143-Speed 11233.90 samples/sec Loss 8.0843 LearningRate 0.0616 Epoch: 8 Global Step: 49000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:58:57,045-Speed 11358.30 samples/sec Loss 8.3241 LearningRate 0.0615 Epoch: 8 Global Step: 49010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:58:57,985-Speed 10903.87 samples/sec Loss 8.2003 LearningRate 0.0615 Epoch: 8 Global Step: 49020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:58:58,907-Speed 11116.24 samples/sec Loss 8.0797 LearningRate 0.0615 Epoch: 8 Global Step: 49030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:58:59,823-Speed 11190.13 samples/sec Loss 8.1173 LearningRate 0.0615 Epoch: 8 Global Step: 49040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:59:00,737-Speed 11202.20 samples/sec Loss 8.0795 LearningRate 0.0615 Epoch: 8 Global Step: 49050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:59:01,654-Speed 11173.86 samples/sec Loss 8.1575 LearningRate 0.0615 Epoch: 8 Global Step: 49060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:59:02,605-Speed 10772.74 samples/sec Loss 8.2189 LearningRate 0.0615 Epoch: 8 Global Step: 49070 Fp16 Grad Scale: 65536 
Required: 6 hours Training: 2022-04-11 00:59:03,531-Speed 11064.44 samples/sec Loss 8.3034 LearningRate 0.0615 Epoch: 8 Global Step: 49080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:59:04,443-Speed 11243.35 samples/sec Loss 8.2943 LearningRate 0.0615 Epoch: 8 Global Step: 49090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:59:05,369-Speed 11067.54 samples/sec Loss 8.1779 LearningRate 0.0615 Epoch: 8 Global Step: 49100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:59:06,328-Speed 10682.48 samples/sec Loss 8.0835 LearningRate 0.0615 Epoch: 8 Global Step: 49110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:59:07,213-Speed 11576.07 samples/sec Loss 8.1086 LearningRate 0.0615 Epoch: 8 Global Step: 49120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:59:08,147-Speed 10979.17 samples/sec Loss 8.2691 LearningRate 0.0615 Epoch: 8 Global Step: 49130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:59:09,041-Speed 11452.89 samples/sec Loss 8.1952 LearningRate 0.0615 Epoch: 8 Global Step: 49140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:59:09,969-Speed 11047.48 samples/sec Loss 8.1795 LearningRate 0.0614 Epoch: 8 Global Step: 49150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:59:10,892-Speed 11101.13 samples/sec Loss 8.1430 LearningRate 0.0614 Epoch: 8 Global Step: 49160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:59:11,787-Speed 11455.25 samples/sec Loss 8.0686 LearningRate 0.0614 Epoch: 8 Global Step: 49170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:59:12,679-Speed 11487.98 samples/sec Loss 8.2182 LearningRate 0.0614 Epoch: 8 Global Step: 49180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:59:13,572-Speed 11469.42 samples/sec Loss 8.2794 LearningRate 0.0614 Epoch: 8 Global Step: 49190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 
00:59:14,490-Speed 11160.90 samples/sec Loss 8.1945 LearningRate 0.0614 Epoch: 8 Global Step: 49200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:59:15,441-Speed 10776.96 samples/sec Loss 8.3262 LearningRate 0.0614 Epoch: 8 Global Step: 49210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:59:16,351-Speed 11265.14 samples/sec Loss 8.2493 LearningRate 0.0614 Epoch: 8 Global Step: 49220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:59:17,249-Speed 11398.90 samples/sec Loss 8.0022 LearningRate 0.0614 Epoch: 8 Global Step: 49230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:59:18,169-Speed 11146.55 samples/sec Loss 8.2475 LearningRate 0.0614 Epoch: 8 Global Step: 49240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:59:19,110-Speed 10887.33 samples/sec Loss 8.2895 LearningRate 0.0614 Epoch: 8 Global Step: 49250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:59:20,006-Speed 11443.65 samples/sec Loss 8.3257 LearningRate 0.0614 Epoch: 8 Global Step: 49260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:59:20,907-Speed 11365.91 samples/sec Loss 8.1842 LearningRate 0.0614 Epoch: 8 Global Step: 49270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:59:21,816-Speed 11268.55 samples/sec Loss 8.2344 LearningRate 0.0614 Epoch: 8 Global Step: 49280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:59:22,737-Speed 11126.91 samples/sec Loss 8.1390 LearningRate 0.0614 Epoch: 8 Global Step: 49290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:59:23,669-Speed 11003.12 samples/sec Loss 8.2028 LearningRate 0.0613 Epoch: 8 Global Step: 49300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:59:24,563-Speed 11465.76 samples/sec Loss 8.1323 LearningRate 0.0613 Epoch: 8 Global Step: 49310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:59:25,499-Speed 10943.19 samples/sec Loss 
8.1100 LearningRate 0.0613 Epoch: 8 Global Step: 49320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:59:26,450-Speed 10770.89 samples/sec Loss 8.2626 LearningRate 0.0613 Epoch: 8 Global Step: 49330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:59:27,418-Speed 10584.05 samples/sec Loss 8.1958 LearningRate 0.0613 Epoch: 8 Global Step: 49340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:59:28,307-Speed 11529.31 samples/sec Loss 8.1136 LearningRate 0.0613 Epoch: 8 Global Step: 49350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:59:29,242-Speed 10958.17 samples/sec Loss 8.1502 LearningRate 0.0613 Epoch: 8 Global Step: 49360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:59:30,179-Speed 10944.27 samples/sec Loss 8.1728 LearningRate 0.0613 Epoch: 8 Global Step: 49370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:59:31,126-Speed 10819.24 samples/sec Loss 8.2205 LearningRate 0.0613 Epoch: 8 Global Step: 49380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:59:32,045-Speed 11150.24 samples/sec Loss 8.1543 LearningRate 0.0613 Epoch: 8 Global Step: 49390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:59:32,959-Speed 11208.60 samples/sec Loss 8.1664 LearningRate 0.0613 Epoch: 8 Global Step: 49400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:59:33,872-Speed 11226.13 samples/sec Loss 8.0664 LearningRate 0.0613 Epoch: 8 Global Step: 49410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:59:34,782-Speed 11268.92 samples/sec Loss 8.1795 LearningRate 0.0613 Epoch: 8 Global Step: 49420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:59:35,653-Speed 11762.79 samples/sec Loss 7.9872 LearningRate 0.0613 Epoch: 8 Global Step: 49430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:59:36,589-Speed 10943.67 samples/sec Loss 8.0601 LearningRate 0.0613 Epoch: 8 Global Step: 
49440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:59:37,500-Speed 11251.01 samples/sec Loss 8.2402 LearningRate 0.0612 Epoch: 8 Global Step: 49450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:59:38,460-Speed 10670.08 samples/sec Loss 8.1548 LearningRate 0.0612 Epoch: 8 Global Step: 49460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:59:39,392-Speed 10992.30 samples/sec Loss 8.1329 LearningRate 0.0612 Epoch: 8 Global Step: 49470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:59:40,297-Speed 11326.34 samples/sec Loss 8.1088 LearningRate 0.0612 Epoch: 8 Global Step: 49480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:59:41,207-Speed 11265.88 samples/sec Loss 8.1347 LearningRate 0.0612 Epoch: 8 Global Step: 49490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:59:42,118-Speed 11238.96 samples/sec Loss 8.2356 LearningRate 0.0612 Epoch: 8 Global Step: 49500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:59:43,018-Speed 11394.14 samples/sec Loss 8.1717 LearningRate 0.0612 Epoch: 8 Global Step: 49510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:59:43,982-Speed 10622.24 samples/sec Loss 8.0944 LearningRate 0.0612 Epoch: 8 Global Step: 49520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:59:44,872-Speed 11520.44 samples/sec Loss 8.1538 LearningRate 0.0612 Epoch: 8 Global Step: 49530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:59:45,757-Speed 11568.28 samples/sec Loss 8.2114 LearningRate 0.0612 Epoch: 8 Global Step: 49540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:59:46,645-Speed 11545.79 samples/sec Loss 8.3219 LearningRate 0.0612 Epoch: 8 Global Step: 49550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:59:47,545-Speed 11377.72 samples/sec Loss 8.3629 LearningRate 0.0612 Epoch: 8 Global Step: 49560 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-04-11 00:59:48,447-Speed 11362.73 samples/sec Loss 8.0052 LearningRate 0.0612 Epoch: 8 Global Step: 49570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:59:49,343-Speed 11445.09 samples/sec Loss 8.2968 LearningRate 0.0612 Epoch: 8 Global Step: 49580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:59:50,247-Speed 11337.23 samples/sec Loss 8.3005 LearningRate 0.0611 Epoch: 8 Global Step: 49590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:59:51,219-Speed 10534.76 samples/sec Loss 8.2395 LearningRate 0.0611 Epoch: 8 Global Step: 49600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:59:52,128-Speed 11280.15 samples/sec Loss 8.1886 LearningRate 0.0611 Epoch: 8 Global Step: 49610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:59:53,082-Speed 10741.74 samples/sec Loss 8.0908 LearningRate 0.0611 Epoch: 8 Global Step: 49620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:59:53,984-Speed 11359.53 samples/sec Loss 8.1704 LearningRate 0.0611 Epoch: 8 Global Step: 49630 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 00:59:54,900-Speed 11188.63 samples/sec Loss 8.1211 LearningRate 0.0611 Epoch: 8 Global Step: 49640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:59:55,833-Speed 10981.61 samples/sec Loss 8.1343 LearningRate 0.0611 Epoch: 8 Global Step: 49650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 00:59:56,714-Speed 11635.28 samples/sec Loss 8.2397 LearningRate 0.0611 Epoch: 8 Global Step: 49660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:59:57,639-Speed 11093.58 samples/sec Loss 8.2126 LearningRate 0.0611 Epoch: 8 Global Step: 49670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:59:58,558-Speed 11142.05 samples/sec Loss 8.1138 LearningRate 0.0611 Epoch: 8 Global Step: 49680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 00:59:59,470-Speed 
11243.26 samples/sec Loss 8.0696 LearningRate 0.0611 Epoch: 8 Global Step: 49690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:00:00,388-Speed 11166.24 samples/sec Loss 8.1771 LearningRate 0.0611 Epoch: 8 Global Step: 49700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:00:01,284-Speed 11433.73 samples/sec Loss 8.1660 LearningRate 0.0611 Epoch: 8 Global Step: 49710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:00:02,203-Speed 11142.41 samples/sec Loss 8.1348 LearningRate 0.0611 Epoch: 8 Global Step: 49720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:00:03,148-Speed 10846.12 samples/sec Loss 8.2127 LearningRate 0.0611 Epoch: 8 Global Step: 49730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:00:04,053-Speed 11322.23 samples/sec Loss 8.2391 LearningRate 0.0610 Epoch: 8 Global Step: 49740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:00:04,981-Speed 11048.94 samples/sec Loss 8.1233 LearningRate 0.0610 Epoch: 8 Global Step: 49750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:00:05,889-Speed 11286.27 samples/sec Loss 8.0659 LearningRate 0.0610 Epoch: 8 Global Step: 49760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:00:06,804-Speed 11195.46 samples/sec Loss 8.0744 LearningRate 0.0610 Epoch: 8 Global Step: 49770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:00:07,695-Speed 11505.42 samples/sec Loss 8.2089 LearningRate 0.0610 Epoch: 8 Global Step: 49780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:00:08,603-Speed 11277.70 samples/sec Loss 8.0346 LearningRate 0.0610 Epoch: 8 Global Step: 49790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:00:09,553-Speed 10783.36 samples/sec Loss 8.2055 LearningRate 0.0610 Epoch: 8 Global Step: 49800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:00:10,457-Speed 11335.12 samples/sec Loss 8.1027 LearningRate 
0.0610 Epoch: 8 Global Step: 49810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:00:11,380-Speed 11112.83 samples/sec Loss 8.2074 LearningRate 0.0610 Epoch: 8 Global Step: 49820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:00:12,295-Speed 11196.64 samples/sec Loss 8.0994 LearningRate 0.0610 Epoch: 8 Global Step: 49830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:00:13,221-Speed 11066.97 samples/sec Loss 8.2058 LearningRate 0.0610 Epoch: 8 Global Step: 49840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:00:14,140-Speed 11143.85 samples/sec Loss 8.1533 LearningRate 0.0610 Epoch: 8 Global Step: 49850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:00:15,022-Speed 11623.70 samples/sec Loss 8.1034 LearningRate 0.0610 Epoch: 8 Global Step: 49860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:00:15,911-Speed 11524.25 samples/sec Loss 8.0734 LearningRate 0.0610 Epoch: 8 Global Step: 49870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:00:16,811-Speed 11388.77 samples/sec Loss 8.0866 LearningRate 0.0609 Epoch: 8 Global Step: 49880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:00:17,769-Speed 10692.18 samples/sec Loss 8.2400 LearningRate 0.0609 Epoch: 8 Global Step: 49890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:00:18,740-Speed 10551.99 samples/sec Loss 8.1367 LearningRate 0.0609 Epoch: 8 Global Step: 49900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:00:19,681-Speed 10892.45 samples/sec Loss 8.2360 LearningRate 0.0609 Epoch: 8 Global Step: 49910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:00:20,587-Speed 11308.65 samples/sec Loss 8.1130 LearningRate 0.0609 Epoch: 8 Global Step: 49920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:00:21,465-Speed 11675.62 samples/sec Loss 8.2508 LearningRate 0.0609 Epoch: 8 Global Step: 49930 Fp16 Grad Scale: 
65536 Required: 6 hours Training: 2022-04-11 01:00:22,358-Speed 11474.01 samples/sec Loss 8.1606 LearningRate 0.0609 Epoch: 8 Global Step: 49940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:00:23,242-Speed 11584.50 samples/sec Loss 8.2154 LearningRate 0.0609 Epoch: 8 Global Step: 49950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:00:24,163-Speed 11135.22 samples/sec Loss 8.1483 LearningRate 0.0609 Epoch: 8 Global Step: 49960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:00:25,122-Speed 10685.31 samples/sec Loss 8.1425 LearningRate 0.0609 Epoch: 8 Global Step: 49970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:00:26,022-Speed 11376.48 samples/sec Loss 8.1028 LearningRate 0.0609 Epoch: 8 Global Step: 49980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:00:26,998-Speed 10503.64 samples/sec Loss 8.0104 LearningRate 0.0609 Epoch: 8 Global Step: 49990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:00:27,920-Speed 11117.19 samples/sec Loss 8.1560 LearningRate 0.0609 Epoch: 8 Global Step: 50000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:00:49,175-[lfw][50000]XNorm: 13.192936 Training: 2022-04-11 01:00:49,176-[lfw][50000]Accuracy-Flip: 0.99550+-0.00342 Training: 2022-04-11 01:00:49,176-[lfw][50000]Accuracy-Highest: 0.99550 Training: 2022-04-11 01:01:13,773-[cfp_fp][50000]XNorm: 11.052071 Training: 2022-04-11 01:01:13,774-[cfp_fp][50000]Accuracy-Flip: 0.93600+-0.01517 Training: 2022-04-11 01:01:13,774-[cfp_fp][50000]Accuracy-Highest: 0.93600 Training: 2022-04-11 01:01:34,957-[agedb_30][50000]XNorm: 12.934012 Training: 2022-04-11 01:01:34,958-[agedb_30][50000]Accuracy-Flip: 0.95517+-0.00867 Training: 2022-04-11 01:01:34,958-[agedb_30][50000]Accuracy-Highest: 0.95717 Training: 2022-04-11 01:01:35,873-Speed 150.69 samples/sec Loss 8.0488 LearningRate 0.0609 Epoch: 8 Global Step: 50010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 
2022-04-11 01:01:36,792-Speed 11154.83 samples/sec Loss 8.2295 LearningRate 0.0609 Epoch: 8 Global Step: 50020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:01:37,737-Speed 10836.05 samples/sec Loss 8.0519 LearningRate 0.0608 Epoch: 8 Global Step: 50030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:01:38,656-Speed 11155.85 samples/sec Loss 8.1744 LearningRate 0.0608 Epoch: 8 Global Step: 50040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:01:39,574-Speed 11161.17 samples/sec Loss 8.2907 LearningRate 0.0608 Epoch: 8 Global Step: 50050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:01:40,488-Speed 11215.52 samples/sec Loss 8.3285 LearningRate 0.0608 Epoch: 8 Global Step: 50060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:01:41,430-Speed 10874.84 samples/sec Loss 8.1624 LearningRate 0.0608 Epoch: 8 Global Step: 50070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:01:42,369-Speed 10913.73 samples/sec Loss 8.0915 LearningRate 0.0608 Epoch: 8 Global Step: 50080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:01:43,258-Speed 11528.89 samples/sec Loss 8.1237 LearningRate 0.0608 Epoch: 8 Global Step: 50090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:01:44,187-Speed 11026.54 samples/sec Loss 8.0127 LearningRate 0.0608 Epoch: 8 Global Step: 50100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:01:45,096-Speed 11275.28 samples/sec Loss 8.1880 LearningRate 0.0608 Epoch: 8 Global Step: 50110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:01:46,012-Speed 11186.09 samples/sec Loss 8.0770 LearningRate 0.0608 Epoch: 8 Global Step: 50120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:01:46,915-Speed 11347.62 samples/sec Loss 8.2194 LearningRate 0.0608 Epoch: 8 Global Step: 50130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:01:47,818-Speed 11348.44 samples/sec 
Loss 8.3178 LearningRate 0.0608 Epoch: 8 Global Step: 50140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:01:48,767-Speed 10805.67 samples/sec Loss 8.1803 LearningRate 0.0608 Epoch: 8 Global Step: 50150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:01:49,722-Speed 10722.28 samples/sec Loss 8.2043 LearningRate 0.0608 Epoch: 8 Global Step: 50160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:01:50,640-Speed 11162.73 samples/sec Loss 8.1863 LearningRate 0.0607 Epoch: 8 Global Step: 50170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:01:51,568-Speed 11039.74 samples/sec Loss 8.1165 LearningRate 0.0607 Epoch: 8 Global Step: 50180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:01:52,509-Speed 10892.89 samples/sec Loss 8.2037 LearningRate 0.0607 Epoch: 8 Global Step: 50190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:01:53,450-Speed 10891.14 samples/sec Loss 8.1860 LearningRate 0.0607 Epoch: 8 Global Step: 50200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:01:54,402-Speed 10767.23 samples/sec Loss 8.2886 LearningRate 0.0607 Epoch: 8 Global Step: 50210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:01:55,291-Speed 11521.49 samples/sec Loss 8.1812 LearningRate 0.0607 Epoch: 8 Global Step: 50220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:01:56,213-Speed 11121.23 samples/sec Loss 8.1943 LearningRate 0.0607 Epoch: 8 Global Step: 50230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:01:57,122-Speed 11274.00 samples/sec Loss 8.2155 LearningRate 0.0607 Epoch: 8 Global Step: 50240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:01:58,024-Speed 11348.85 samples/sec Loss 8.0630 LearningRate 0.0607 Epoch: 8 Global Step: 50250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:01:58,964-Speed 10902.21 samples/sec Loss 8.2023 LearningRate 0.0607 Epoch: 8 
Global Step: 50260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:01:59,876-Speed 11238.77 samples/sec Loss 8.0235 LearningRate 0.0607 Epoch: 8 Global Step: 50270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:00,782-Speed 11317.43 samples/sec Loss 8.1456 LearningRate 0.0607 Epoch: 8 Global Step: 50280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:01,693-Speed 11238.25 samples/sec Loss 8.2537 LearningRate 0.0607 Epoch: 8 Global Step: 50290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:02,627-Speed 10971.69 samples/sec Loss 8.0741 LearningRate 0.0607 Epoch: 8 Global Step: 50300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:03,543-Speed 11194.05 samples/sec Loss 8.2741 LearningRate 0.0607 Epoch: 8 Global Step: 50310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:04,430-Speed 11546.50 samples/sec Loss 8.0266 LearningRate 0.0606 Epoch: 8 Global Step: 50320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:05,320-Speed 11516.81 samples/sec Loss 8.0345 LearningRate 0.0606 Epoch: 8 Global Step: 50330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:06,245-Speed 11077.39 samples/sec Loss 8.0619 LearningRate 0.0606 Epoch: 8 Global Step: 50340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:07,178-Speed 10976.62 samples/sec Loss 8.0663 LearningRate 0.0606 Epoch: 8 Global Step: 50350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:08,127-Speed 10804.82 samples/sec Loss 8.1732 LearningRate 0.0606 Epoch: 8 Global Step: 50360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:09,040-Speed 11211.60 samples/sec Loss 8.2074 LearningRate 0.0606 Epoch: 8 Global Step: 50370 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 01:02:09,929-Speed 11533.28 samples/sec Loss 8.2006 LearningRate 0.0606 Epoch: 8 Global Step: 50380 Fp16 Grad Scale: 131072 
Required: 6 hours Training: 2022-04-11 01:02:10,837-Speed 11285.01 samples/sec Loss 8.1363 LearningRate 0.0606 Epoch: 8 Global Step: 50390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:11,780-Speed 10869.85 samples/sec Loss 8.0876 LearningRate 0.0606 Epoch: 8 Global Step: 50400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:12,701-Speed 11126.96 samples/sec Loss 8.0406 LearningRate 0.0606 Epoch: 8 Global Step: 50410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:13,633-Speed 10987.27 samples/sec Loss 8.1548 LearningRate 0.0606 Epoch: 8 Global Step: 50420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:14,567-Speed 10976.10 samples/sec Loss 8.1369 LearningRate 0.0606 Epoch: 8 Global Step: 50430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:15,530-Speed 10643.71 samples/sec Loss 8.1231 LearningRate 0.0606 Epoch: 8 Global Step: 50440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:16,477-Speed 10816.02 samples/sec Loss 8.2591 LearningRate 0.0606 Epoch: 8 Global Step: 50450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:17,405-Speed 11037.80 samples/sec Loss 8.1513 LearningRate 0.0606 Epoch: 8 Global Step: 50460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:18,339-Speed 10976.25 samples/sec Loss 8.1016 LearningRate 0.0605 Epoch: 8 Global Step: 50470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:19,261-Speed 11113.39 samples/sec Loss 8.1144 LearningRate 0.0605 Epoch: 8 Global Step: 50480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:20,194-Speed 10984.80 samples/sec Loss 8.1210 LearningRate 0.0605 Epoch: 8 Global Step: 50490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:21,092-Speed 11408.51 samples/sec Loss 8.1285 LearningRate 0.0605 Epoch: 8 Global Step: 50500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 
01:02:22,003-Speed 11259.68 samples/sec Loss 8.0966 LearningRate 0.0605 Epoch: 8 Global Step: 50510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:22,936-Speed 10978.76 samples/sec Loss 8.1892 LearningRate 0.0605 Epoch: 8 Global Step: 50520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:23,898-Speed 10648.31 samples/sec Loss 8.1596 LearningRate 0.0605 Epoch: 8 Global Step: 50530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:24,811-Speed 11227.71 samples/sec Loss 8.2783 LearningRate 0.0605 Epoch: 8 Global Step: 50540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:25,752-Speed 10892.05 samples/sec Loss 8.2175 LearningRate 0.0605 Epoch: 8 Global Step: 50550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:26,663-Speed 11245.56 samples/sec Loss 8.1184 LearningRate 0.0605 Epoch: 8 Global Step: 50560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:27,560-Speed 11420.59 samples/sec Loss 8.0521 LearningRate 0.0605 Epoch: 8 Global Step: 50570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:28,461-Speed 11369.16 samples/sec Loss 8.2240 LearningRate 0.0605 Epoch: 8 Global Step: 50580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:02:29,415-Speed 10743.56 samples/sec Loss 8.1012 LearningRate 0.0605 Epoch: 8 Global Step: 50590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:02:30,320-Speed 11329.21 samples/sec Loss 8.0821 LearningRate 0.0605 Epoch: 8 Global Step: 50600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:02:31,230-Speed 11254.10 samples/sec Loss 8.0955 LearningRate 0.0604 Epoch: 8 Global Step: 50610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:02:32,153-Speed 11097.12 samples/sec Loss 8.3420 LearningRate 0.0604 Epoch: 8 Global Step: 50620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:02:33,041-Speed 11546.58 samples/sec Loss 
8.0386 LearningRate 0.0604 Epoch: 8 Global Step: 50630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:02:33,938-Speed 11422.55 samples/sec Loss 8.2500 LearningRate 0.0604 Epoch: 8 Global Step: 50640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:02:34,879-Speed 10891.30 samples/sec Loss 8.2346 LearningRate 0.0604 Epoch: 8 Global Step: 50650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:02:35,805-Speed 11069.68 samples/sec Loss 8.1752 LearningRate 0.0604 Epoch: 8 Global Step: 50660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:02:36,713-Speed 11275.37 samples/sec Loss 8.0637 LearningRate 0.0604 Epoch: 8 Global Step: 50670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:02:37,638-Speed 11078.13 samples/sec Loss 8.2908 LearningRate 0.0604 Epoch: 8 Global Step: 50680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:38,604-Speed 10609.92 samples/sec Loss 8.0572 LearningRate 0.0604 Epoch: 8 Global Step: 50690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:39,557-Speed 10754.29 samples/sec Loss 8.1589 LearningRate 0.0604 Epoch: 8 Global Step: 50700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:40,455-Speed 11414.20 samples/sec Loss 8.2143 LearningRate 0.0604 Epoch: 8 Global Step: 50710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:41,435-Speed 10453.30 samples/sec Loss 8.2067 LearningRate 0.0604 Epoch: 8 Global Step: 50720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:42,375-Speed 10897.66 samples/sec Loss 8.1330 LearningRate 0.0604 Epoch: 8 Global Step: 50730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:43,275-Speed 11385.97 samples/sec Loss 8.1831 LearningRate 0.0604 Epoch: 8 Global Step: 50740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:44,177-Speed 11363.77 samples/sec Loss 8.0132 LearningRate 0.0604 Epoch: 8 Global Step: 
50750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:45,103-Speed 11066.24 samples/sec Loss 8.0501 LearningRate 0.0603 Epoch: 8 Global Step: 50760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:46,031-Speed 11046.30 samples/sec Loss 8.1066 LearningRate 0.0603 Epoch: 8 Global Step: 50770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:46,939-Speed 11281.04 samples/sec Loss 8.1803 LearningRate 0.0603 Epoch: 8 Global Step: 50780 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 01:02:47,848-Speed 11273.80 samples/sec Loss 8.1698 LearningRate 0.0603 Epoch: 8 Global Step: 50790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:48,775-Speed 11052.23 samples/sec Loss 8.0592 LearningRate 0.0603 Epoch: 8 Global Step: 50800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:49,695-Speed 11141.68 samples/sec Loss 8.0942 LearningRate 0.0603 Epoch: 8 Global Step: 50810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:50,637-Speed 10879.95 samples/sec Loss 8.1831 LearningRate 0.0603 Epoch: 8 Global Step: 50820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:51,603-Speed 10609.36 samples/sec Loss 8.1354 LearningRate 0.0603 Epoch: 8 Global Step: 50830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:52,553-Speed 10777.10 samples/sec Loss 8.1984 LearningRate 0.0603 Epoch: 8 Global Step: 50840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:53,488-Speed 10964.20 samples/sec Loss 8.1280 LearningRate 0.0603 Epoch: 8 Global Step: 50850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:54,436-Speed 10814.78 samples/sec Loss 7.9956 LearningRate 0.0603 Epoch: 8 Global Step: 50860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:55,362-Speed 11059.29 samples/sec Loss 8.0549 LearningRate 0.0603 Epoch: 8 Global Step: 50870 Fp16 Grad Scale: 131072 Required: 6 
hours Training: 2022-04-11 01:02:56,299-Speed 10933.47 samples/sec Loss 8.0646 LearningRate 0.0603 Epoch: 8 Global Step: 50880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:57,185-Speed 11562.11 samples/sec Loss 8.1823 LearningRate 0.0603 Epoch: 8 Global Step: 50890 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 01:02:58,113-Speed 11046.20 samples/sec Loss 8.1304 LearningRate 0.0602 Epoch: 8 Global Step: 50900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:02:59,080-Speed 10596.46 samples/sec Loss 8.2937 LearningRate 0.0602 Epoch: 8 Global Step: 50910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:03:00,021-Speed 10890.20 samples/sec Loss 8.2328 LearningRate 0.0602 Epoch: 8 Global Step: 50920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:03:01,007-Speed 10397.45 samples/sec Loss 8.0505 LearningRate 0.0602 Epoch: 8 Global Step: 50930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:03:01,962-Speed 10730.36 samples/sec Loss 8.0642 LearningRate 0.0602 Epoch: 8 Global Step: 50940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:03:02,852-Speed 11510.08 samples/sec Loss 8.1437 LearningRate 0.0602 Epoch: 8 Global Step: 50950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:03:03,817-Speed 10623.49 samples/sec Loss 8.1592 LearningRate 0.0602 Epoch: 8 Global Step: 50960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:03:04,763-Speed 10842.29 samples/sec Loss 8.0925 LearningRate 0.0602 Epoch: 8 Global Step: 50970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:03:05,664-Speed 11370.06 samples/sec Loss 8.2408 LearningRate 0.0602 Epoch: 8 Global Step: 50980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:03:06,605-Speed 10884.85 samples/sec Loss 8.0677 LearningRate 0.0602 Epoch: 8 Global Step: 50990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:03:07,529-Speed 
11091.37 samples/sec Loss 8.1540 LearningRate 0.0602 Epoch: 8 Global Step: 51000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:03:08,475-Speed 10838.55 samples/sec Loss 7.9771 LearningRate 0.0602 Epoch: 8 Global Step: 51010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:03:09,365-Speed 11501.76 samples/sec Loss 8.0429 LearningRate 0.0602 Epoch: 8 Global Step: 51020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:03:10,272-Speed 11301.41 samples/sec Loss 8.0455 LearningRate 0.0602 Epoch: 8 Global Step: 51030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:03:11,189-Speed 11178.82 samples/sec Loss 7.9118 LearningRate 0.0602 Epoch: 8 Global Step: 51040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:03:12,123-Speed 10963.49 samples/sec Loss 8.0081 LearningRate 0.0601 Epoch: 8 Global Step: 51050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:03:13,014-Speed 11511.55 samples/sec Loss 8.1752 LearningRate 0.0601 Epoch: 8 Global Step: 51060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:03:13,949-Speed 10952.60 samples/sec Loss 8.2321 LearningRate 0.0601 Epoch: 8 Global Step: 51070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:03:14,881-Speed 10999.01 samples/sec Loss 8.0661 LearningRate 0.0601 Epoch: 8 Global Step: 51080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:03:15,796-Speed 11203.83 samples/sec Loss 8.2114 LearningRate 0.0601 Epoch: 8 Global Step: 51090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:03:16,720-Speed 11088.43 samples/sec Loss 8.1720 LearningRate 0.0601 Epoch: 8 Global Step: 51100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:03:17,657-Speed 10938.15 samples/sec Loss 8.1074 LearningRate 0.0601 Epoch: 8 Global Step: 51110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:03:18,582-Speed 11082.37 samples/sec Loss 8.1000 LearningRate 
0.0601 Epoch: 8 Global Step: 51120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:03:19,494-Speed 11232.79 samples/sec Loss 7.9887 LearningRate 0.0601 Epoch: 8 Global Step: 51130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:03:20,424-Speed 11009.98 samples/sec Loss 8.1158 LearningRate 0.0601 Epoch: 8 Global Step: 51140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:03:21,328-Speed 11338.00 samples/sec Loss 7.9241 LearningRate 0.0601 Epoch: 8 Global Step: 51150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:03:22,236-Speed 11295.76 samples/sec Loss 8.0938 LearningRate 0.0601 Epoch: 8 Global Step: 51160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:03:23,204-Speed 10585.23 samples/sec Loss 8.1699 LearningRate 0.0601 Epoch: 8 Global Step: 51170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:03:33,488-Speed 995.79 samples/sec Loss 7.7290 LearningRate 0.0601 Epoch: 9 Global Step: 51180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:03:34,425-Speed 10945.46 samples/sec Loss 7.3738 LearningRate 0.0601 Epoch: 9 Global Step: 51190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:03:35,387-Speed 10658.43 samples/sec Loss 7.3002 LearningRate 0.0600 Epoch: 9 Global Step: 51200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:03:36,314-Speed 11056.38 samples/sec Loss 7.2938 LearningRate 0.0600 Epoch: 9 Global Step: 51210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:03:37,499-Speed 8640.24 samples/sec Loss 7.3945 LearningRate 0.0600 Epoch: 9 Global Step: 51220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:03:38,749-Speed 8200.91 samples/sec Loss 7.2923 LearningRate 0.0600 Epoch: 9 Global Step: 51230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:03:39,794-Speed 9799.39 samples/sec Loss 7.4656 LearningRate 0.0600 Epoch: 9 Global Step: 51240 Fp16 Grad Scale: 
65536 Required: 6 hours Training: 2022-04-11 01:03:40,785-Speed 10348.51 samples/sec Loss 7.3421 LearningRate 0.0600 Epoch: 9 Global Step: 51250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:03:41,713-Speed 11032.83 samples/sec Loss 7.5249 LearningRate 0.0600 Epoch: 9 Global Step: 51260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:03:42,652-Speed 10917.05 samples/sec Loss 7.5193 LearningRate 0.0600 Epoch: 9 Global Step: 51270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:03:43,643-Speed 10333.16 samples/sec Loss 7.3662 LearningRate 0.0600 Epoch: 9 Global Step: 51280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:03:44,581-Speed 10929.38 samples/sec Loss 7.4843 LearningRate 0.0600 Epoch: 9 Global Step: 51290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:03:45,474-Speed 11471.70 samples/sec Loss 7.2268 LearningRate 0.0600 Epoch: 9 Global Step: 51300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:03:46,374-Speed 11385.08 samples/sec Loss 7.4873 LearningRate 0.0600 Epoch: 9 Global Step: 51310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:03:47,299-Speed 11082.65 samples/sec Loss 7.4849 LearningRate 0.0600 Epoch: 9 Global Step: 51320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:03:48,233-Speed 10965.99 samples/sec Loss 7.5445 LearningRate 0.0600 Epoch: 9 Global Step: 51330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:03:49,174-Speed 10890.72 samples/sec Loss 7.4281 LearningRate 0.0599 Epoch: 9 Global Step: 51340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:03:50,084-Speed 11267.00 samples/sec Loss 7.5503 LearningRate 0.0599 Epoch: 9 Global Step: 51350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:03:51,017-Speed 10985.71 samples/sec Loss 7.3917 LearningRate 0.0599 Epoch: 9 Global Step: 51360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 
01:03:52,015-Speed 10273.77 samples/sec Loss 7.6069 LearningRate 0.0599 Epoch: 9 Global Step: 51370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:03:52,974-Speed 10686.22 samples/sec Loss 7.6522 LearningRate 0.0599 Epoch: 9 Global Step: 51380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:03:53,946-Speed 10539.62 samples/sec Loss 7.5832 LearningRate 0.0599 Epoch: 9 Global Step: 51390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:03:54,887-Speed 10888.90 samples/sec Loss 7.5176 LearningRate 0.0599 Epoch: 9 Global Step: 51400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:03:55,796-Speed 11274.68 samples/sec Loss 7.5662 LearningRate 0.0599 Epoch: 9 Global Step: 51410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:03:56,696-Speed 11382.67 samples/sec Loss 7.5308 LearningRate 0.0599 Epoch: 9 Global Step: 51420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:03:57,587-Speed 11502.24 samples/sec Loss 7.4606 LearningRate 0.0599 Epoch: 9 Global Step: 51430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:03:58,495-Speed 11292.55 samples/sec Loss 7.6927 LearningRate 0.0599 Epoch: 9 Global Step: 51440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:03:59,416-Speed 11115.32 samples/sec Loss 7.7754 LearningRate 0.0599 Epoch: 9 Global Step: 51450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:04:00,300-Speed 11599.52 samples/sec Loss 7.4671 LearningRate 0.0599 Epoch: 9 Global Step: 51460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:04:01,193-Speed 11472.81 samples/sec Loss 7.6834 LearningRate 0.0599 Epoch: 9 Global Step: 51470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:04:02,134-Speed 10888.18 samples/sec Loss 7.4893 LearningRate 0.0599 Epoch: 9 Global Step: 51480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:04:03,050-Speed 11191.19 samples/sec Loss 7.5163 
LearningRate 0.0598 Epoch: 9 Global Step: 51490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:04:03,970-Speed 11140.63 samples/sec Loss 7.5298 LearningRate 0.0598 Epoch: 9 Global Step: 51500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:04:04,879-Speed 11266.72 samples/sec Loss 7.5164 LearningRate 0.0598 Epoch: 9 Global Step: 51510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:04:05,826-Speed 10819.12 samples/sec Loss 7.6470 LearningRate 0.0598 Epoch: 9 Global Step: 51520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:04:06,748-Speed 11114.33 samples/sec Loss 7.5617 LearningRate 0.0598 Epoch: 9 Global Step: 51530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:04:07,687-Speed 10907.49 samples/sec Loss 7.6373 LearningRate 0.0598 Epoch: 9 Global Step: 51540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:04:08,676-Speed 10370.97 samples/sec Loss 7.6278 LearningRate 0.0598 Epoch: 9 Global Step: 51550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:04:09,581-Speed 11326.71 samples/sec Loss 7.6390 LearningRate 0.0598 Epoch: 9 Global Step: 51560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:04:10,463-Speed 11610.39 samples/sec Loss 7.5659 LearningRate 0.0598 Epoch: 9 Global Step: 51570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:04:11,382-Speed 11157.40 samples/sec Loss 7.7880 LearningRate 0.0598 Epoch: 9 Global Step: 51580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:04:12,309-Speed 11049.99 samples/sec Loss 7.5139 LearningRate 0.0598 Epoch: 9 Global Step: 51590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:04:13,238-Speed 11040.44 samples/sec Loss 7.6168 LearningRate 0.0598 Epoch: 9 Global Step: 51600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:04:14,145-Speed 11290.86 samples/sec Loss 7.6167 LearningRate 0.0598 Epoch: 9 Global Step: 51610 
Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:04:15,050-Speed 11327.53 samples/sec Loss 7.7270 LearningRate 0.0598 Epoch: 9 Global Step: 51620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:04:16,000-Speed 10785.85 samples/sec Loss 7.7174 LearningRate 0.0598 Epoch: 9 Global Step: 51630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:04:16,927-Speed 11057.83 samples/sec Loss 7.6492 LearningRate 0.0597 Epoch: 9 Global Step: 51640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:04:17,828-Speed 11371.19 samples/sec Loss 7.7284 LearningRate 0.0597 Epoch: 9 Global Step: 51650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:04:18,763-Speed 10966.77 samples/sec Loss 7.7841 LearningRate 0.0597 Epoch: 9 Global Step: 51660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:04:19,657-Speed 11459.98 samples/sec Loss 7.6641 LearningRate 0.0597 Epoch: 9 Global Step: 51670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:04:20,559-Speed 11353.09 samples/sec Loss 7.7181 LearningRate 0.0597 Epoch: 9 Global Step: 51680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:04:21,489-Speed 11027.73 samples/sec Loss 7.7462 LearningRate 0.0597 Epoch: 9 Global Step: 51690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:04:22,421-Speed 10995.08 samples/sec Loss 7.8313 LearningRate 0.0597 Epoch: 9 Global Step: 51700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:04:23,369-Speed 10805.40 samples/sec Loss 7.8371 LearningRate 0.0597 Epoch: 9 Global Step: 51710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:04:24,295-Speed 11060.99 samples/sec Loss 7.8542 LearningRate 0.0597 Epoch: 9 Global Step: 51720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:04:25,204-Speed 11279.08 samples/sec Loss 7.6697 LearningRate 0.0597 Epoch: 9 Global Step: 51730 Fp16 Grad Scale: 262144 Required: 6 hours 
Training: 2022-04-11 01:04:26,143-Speed 10909.62 samples/sec Loss 7.6940 LearningRate 0.0597 Epoch: 9 Global Step: 51740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:04:27,100-Speed 10707.06 samples/sec Loss 7.7575 LearningRate 0.0597 Epoch: 9 Global Step: 51750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:04:28,023-Speed 11107.28 samples/sec Loss 7.6130 LearningRate 0.0597 Epoch: 9 Global Step: 51760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:04:28,981-Speed 10696.16 samples/sec Loss 7.8769 LearningRate 0.0597 Epoch: 9 Global Step: 51770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:04:29,958-Speed 10490.57 samples/sec Loss 7.7135 LearningRate 0.0597 Epoch: 9 Global Step: 51780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:04:30,907-Speed 10795.20 samples/sec Loss 7.8478 LearningRate 0.0596 Epoch: 9 Global Step: 51790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:04:31,839-Speed 10993.53 samples/sec Loss 7.6578 LearningRate 0.0596 Epoch: 9 Global Step: 51800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:04:32,780-Speed 10885.82 samples/sec Loss 7.6533 LearningRate 0.0596 Epoch: 9 Global Step: 51810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:04:33,712-Speed 11001.70 samples/sec Loss 7.8588 LearningRate 0.0596 Epoch: 9 Global Step: 51820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:04:34,602-Speed 11513.48 samples/sec Loss 7.8929 LearningRate 0.0596 Epoch: 9 Global Step: 51830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:04:35,496-Speed 11464.61 samples/sec Loss 7.7206 LearningRate 0.0596 Epoch: 9 Global Step: 51840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:04:36,436-Speed 10901.81 samples/sec Loss 7.7344 LearningRate 0.0596 Epoch: 9 Global Step: 51850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:04:37,340-Speed 11337.13 
samples/sec Loss 7.7085 LearningRate 0.0596 Epoch: 9 Global Step: 51860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:04:38,232-Speed 11479.51 samples/sec Loss 7.7382 LearningRate 0.0596 Epoch: 9 Global Step: 51870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:04:39,200-Speed 10593.86 samples/sec Loss 7.7013 LearningRate 0.0596 Epoch: 9 Global Step: 51880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:04:40,104-Speed 11328.37 samples/sec Loss 7.6832 LearningRate 0.0596 Epoch: 9 Global Step: 51890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:04:41,041-Speed 10939.31 samples/sec Loss 7.7364 LearningRate 0.0596 Epoch: 9 Global Step: 51900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:04:41,984-Speed 10860.01 samples/sec Loss 7.8491 LearningRate 0.0596 Epoch: 9 Global Step: 51910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:04:42,923-Speed 10912.30 samples/sec Loss 7.8276 LearningRate 0.0596 Epoch: 9 Global Step: 51920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:04:43,900-Speed 10493.86 samples/sec Loss 7.8250 LearningRate 0.0595 Epoch: 9 Global Step: 51930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:04:44,811-Speed 11242.12 samples/sec Loss 7.7797 LearningRate 0.0595 Epoch: 9 Global Step: 51940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:04:45,711-Speed 11394.06 samples/sec Loss 7.8776 LearningRate 0.0595 Epoch: 9 Global Step: 51950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:04:46,595-Speed 11589.76 samples/sec Loss 7.7672 LearningRate 0.0595 Epoch: 9 Global Step: 51960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:04:47,514-Speed 11145.66 samples/sec Loss 7.8713 LearningRate 0.0595 Epoch: 9 Global Step: 51970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:04:48,448-Speed 10969.47 samples/sec Loss 7.8904 LearningRate 0.0595 Epoch: 9 
Global Step: 51980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:04:49,377-Speed 11038.94 samples/sec Loss 7.7039 LearningRate 0.0595 Epoch: 9 Global Step: 51990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:04:50,305-Speed 11047.01 samples/sec Loss 7.8493 LearningRate 0.0595 Epoch: 9 Global Step: 52000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:05:11,499-[lfw][52000]XNorm: 13.204608 Training: 2022-04-11 01:05:11,499-[lfw][52000]Accuracy-Flip: 0.99417+-0.00375 Training: 2022-04-11 01:05:11,499-[lfw][52000]Accuracy-Highest: 0.99550 Training: 2022-04-11 01:05:36,188-[cfp_fp][52000]XNorm: 11.016491 Training: 2022-04-11 01:05:36,189-[cfp_fp][52000]Accuracy-Flip: 0.93371+-0.01231 Training: 2022-04-11 01:05:36,189-[cfp_fp][52000]Accuracy-Highest: 0.93600 Training: 2022-04-11 01:05:57,348-[agedb_30][52000]XNorm: 12.803169 Training: 2022-04-11 01:05:57,348-[agedb_30][52000]Accuracy-Flip: 0.95617+-0.00803 Training: 2022-04-11 01:05:57,349-[agedb_30][52000]Accuracy-Highest: 0.95717 Training: 2022-04-11 01:05:58,247-Speed 150.72 samples/sec Loss 7.8167 LearningRate 0.0595 Epoch: 9 Global Step: 52010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:05:59,182-Speed 10964.73 samples/sec Loss 7.9926 LearningRate 0.0595 Epoch: 9 Global Step: 52020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:06:00,096-Speed 11206.54 samples/sec Loss 7.8981 LearningRate 0.0595 Epoch: 9 Global Step: 52030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:06:01,018-Speed 11116.38 samples/sec Loss 7.7248 LearningRate 0.0595 Epoch: 9 Global Step: 52040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:06:01,942-Speed 11085.33 samples/sec Loss 7.7527 LearningRate 0.0595 Epoch: 9 Global Step: 52050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:06:02,856-Speed 11209.82 samples/sec Loss 7.9220 LearningRate 0.0595 Epoch: 9 Global Step: 52060 Fp16 Grad Scale: 
131072 Required: 6 hours Training: 2022-04-11 01:06:03,766-Speed 11266.62 samples/sec Loss 7.6674 LearningRate 0.0595 Epoch: 9 Global Step: 52070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:06:04,701-Speed 10956.05 samples/sec Loss 7.8103 LearningRate 0.0594 Epoch: 9 Global Step: 52080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:06:05,643-Speed 10893.11 samples/sec Loss 7.8993 LearningRate 0.0594 Epoch: 9 Global Step: 52090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:06:06,575-Speed 11007.04 samples/sec Loss 7.9544 LearningRate 0.0594 Epoch: 9 Global Step: 52100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:06:07,509-Speed 10975.24 samples/sec Loss 7.7298 LearningRate 0.0594 Epoch: 9 Global Step: 52110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:06:08,424-Speed 11197.04 samples/sec Loss 7.9204 LearningRate 0.0594 Epoch: 9 Global Step: 52120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:06:09,364-Speed 10896.45 samples/sec Loss 7.8517 LearningRate 0.0594 Epoch: 9 Global Step: 52130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:06:10,276-Speed 11239.67 samples/sec Loss 7.6412 LearningRate 0.0594 Epoch: 9 Global Step: 52140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:06:11,195-Speed 11150.62 samples/sec Loss 7.8508 LearningRate 0.0594 Epoch: 9 Global Step: 52150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:06:12,108-Speed 11225.86 samples/sec Loss 7.7598 LearningRate 0.0594 Epoch: 9 Global Step: 52160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:06:13,023-Speed 11198.59 samples/sec Loss 7.9862 LearningRate 0.0594 Epoch: 9 Global Step: 52170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:06:13,973-Speed 10785.42 samples/sec Loss 7.8161 LearningRate 0.0594 Epoch: 9 Global Step: 52180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 
01:06:14,908-Speed 10951.46 samples/sec Loss 7.9362 LearningRate 0.0594 Epoch: 9 Global Step: 52190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:06:15,841-Speed 10991.09 samples/sec Loss 7.8511 LearningRate 0.0594 Epoch: 9 Global Step: 52200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:06:16,779-Speed 10923.65 samples/sec Loss 7.8119 LearningRate 0.0594 Epoch: 9 Global Step: 52210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:06:17,713-Speed 10978.30 samples/sec Loss 7.7476 LearningRate 0.0594 Epoch: 9 Global Step: 52220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:06:18,688-Speed 10505.86 samples/sec Loss 7.7722 LearningRate 0.0593 Epoch: 9 Global Step: 52230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:06:19,597-Speed 11277.39 samples/sec Loss 7.8244 LearningRate 0.0593 Epoch: 9 Global Step: 52240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:06:20,502-Speed 11316.51 samples/sec Loss 7.8807 LearningRate 0.0593 Epoch: 9 Global Step: 52250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:06:21,433-Speed 11009.21 samples/sec Loss 7.9087 LearningRate 0.0593 Epoch: 9 Global Step: 52260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:06:22,376-Speed 10868.67 samples/sec Loss 7.8672 LearningRate 0.0593 Epoch: 9 Global Step: 52270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:06:23,303-Speed 11055.51 samples/sec Loss 7.8785 LearningRate 0.0593 Epoch: 9 Global Step: 52280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:06:24,219-Speed 11195.72 samples/sec Loss 7.8618 LearningRate 0.0593 Epoch: 9 Global Step: 52290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:06:25,101-Speed 11616.87 samples/sec Loss 7.8316 LearningRate 0.0593 Epoch: 9 Global Step: 52300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:06:26,043-Speed 10878.27 samples/sec Loss 
7.9268 LearningRate 0.0593 Epoch: 9 Global Step: 52310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:06:26,948-Speed 11318.15 samples/sec Loss 7.9908 LearningRate 0.0593 Epoch: 9 Global Step: 52320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:06:27,879-Speed 11009.50 samples/sec Loss 7.8177 LearningRate 0.0593 Epoch: 9 Global Step: 52330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:06:28,837-Speed 10690.64 samples/sec Loss 7.9166 LearningRate 0.0593 Epoch: 9 Global Step: 52340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:06:29,785-Speed 10806.94 samples/sec Loss 8.0958 LearningRate 0.0593 Epoch: 9 Global Step: 52350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:06:30,731-Speed 10832.95 samples/sec Loss 8.0237 LearningRate 0.0593 Epoch: 9 Global Step: 52360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:06:31,652-Speed 11122.64 samples/sec Loss 7.8056 LearningRate 0.0593 Epoch: 9 Global Step: 52370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:06:32,598-Speed 10840.24 samples/sec Loss 7.8535 LearningRate 0.0592 Epoch: 9 Global Step: 52380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:06:33,524-Speed 11064.19 samples/sec Loss 7.9515 LearningRate 0.0592 Epoch: 9 Global Step: 52390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:06:34,445-Speed 11131.38 samples/sec Loss 7.8702 LearningRate 0.0592 Epoch: 9 Global Step: 52400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:06:35,368-Speed 11099.90 samples/sec Loss 7.9398 LearningRate 0.0592 Epoch: 9 Global Step: 52410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:06:36,316-Speed 10809.85 samples/sec Loss 7.8907 LearningRate 0.0592 Epoch: 9 Global Step: 52420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:06:37,218-Speed 11361.27 samples/sec Loss 7.7248 LearningRate 0.0592 Epoch: 9 Global Step: 
52430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:06:38,138-Speed 11131.82 samples/sec Loss 7.9667 LearningRate 0.0592 Epoch: 9 Global Step: 52440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:06:39,098-Speed 10670.63 samples/sec Loss 7.9858 LearningRate 0.0592 Epoch: 9 Global Step: 52450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:06:40,039-Speed 10893.94 samples/sec Loss 7.7508 LearningRate 0.0592 Epoch: 9 Global Step: 52460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:06:40,941-Speed 11364.22 samples/sec Loss 7.9573 LearningRate 0.0592 Epoch: 9 Global Step: 52470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:06:41,875-Speed 10969.02 samples/sec Loss 8.0383 LearningRate 0.0592 Epoch: 9 Global Step: 52480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:06:42,811-Speed 10954.41 samples/sec Loss 7.9412 LearningRate 0.0592 Epoch: 9 Global Step: 52490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:06:43,716-Speed 11324.58 samples/sec Loss 8.0049 LearningRate 0.0592 Epoch: 9 Global Step: 52500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:06:44,632-Speed 11176.75 samples/sec Loss 7.9314 LearningRate 0.0592 Epoch: 9 Global Step: 52510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:06:45,524-Speed 11499.75 samples/sec Loss 7.8527 LearningRate 0.0591 Epoch: 9 Global Step: 52520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:06:46,421-Speed 11416.41 samples/sec Loss 7.7446 LearningRate 0.0591 Epoch: 9 Global Step: 52530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:06:47,312-Speed 11506.67 samples/sec Loss 7.8591 LearningRate 0.0591 Epoch: 9 Global Step: 52540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:06:48,219-Speed 11298.04 samples/sec Loss 7.8473 LearningRate 0.0591 Epoch: 9 Global Step: 52550 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-04-11 01:06:49,134-Speed 11202.27 samples/sec Loss 7.9721 LearningRate 0.0591 Epoch: 9 Global Step: 52560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:06:50,022-Speed 11535.18 samples/sec Loss 7.8809 LearningRate 0.0591 Epoch: 9 Global Step: 52570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:06:50,964-Speed 10874.24 samples/sec Loss 8.0452 LearningRate 0.0591 Epoch: 9 Global Step: 52580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:06:51,885-Speed 11126.73 samples/sec Loss 7.9524 LearningRate 0.0591 Epoch: 9 Global Step: 52590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:06:52,811-Speed 11067.66 samples/sec Loss 7.8576 LearningRate 0.0591 Epoch: 9 Global Step: 52600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:06:53,761-Speed 10786.70 samples/sec Loss 7.9302 LearningRate 0.0591 Epoch: 9 Global Step: 52610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:06:54,664-Speed 11356.70 samples/sec Loss 7.9754 LearningRate 0.0591 Epoch: 9 Global Step: 52620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:06:55,608-Speed 10868.32 samples/sec Loss 7.9303 LearningRate 0.0591 Epoch: 9 Global Step: 52630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:06:56,531-Speed 11098.75 samples/sec Loss 7.8582 LearningRate 0.0591 Epoch: 9 Global Step: 52640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:06:57,458-Speed 11052.55 samples/sec Loss 7.8378 LearningRate 0.0591 Epoch: 9 Global Step: 52650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:06:58,411-Speed 10752.14 samples/sec Loss 7.7260 LearningRate 0.0591 Epoch: 9 Global Step: 52660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:06:59,324-Speed 11220.85 samples/sec Loss 8.0328 LearningRate 0.0590 Epoch: 9 Global Step: 52670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:07:00,239-Speed 11203.33 
samples/sec Loss 7.9193 LearningRate 0.0590 Epoch: 9 Global Step: 52680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:07:01,165-Speed 11063.21 samples/sec Loss 7.9864 LearningRate 0.0590 Epoch: 9 Global Step: 52690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:07:02,101-Speed 10947.38 samples/sec Loss 7.8091 LearningRate 0.0590 Epoch: 9 Global Step: 52700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:07:03,030-Speed 11036.31 samples/sec Loss 7.9743 LearningRate 0.0590 Epoch: 9 Global Step: 52710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:07:03,958-Speed 11034.74 samples/sec Loss 7.9920 LearningRate 0.0590 Epoch: 9 Global Step: 52720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:07:04,889-Speed 11014.65 samples/sec Loss 8.0053 LearningRate 0.0590 Epoch: 9 Global Step: 52730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:07:05,813-Speed 11079.49 samples/sec Loss 7.8662 LearningRate 0.0590 Epoch: 9 Global Step: 52740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:07:06,731-Speed 11162.59 samples/sec Loss 7.9740 LearningRate 0.0590 Epoch: 9 Global Step: 52750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:07:07,675-Speed 10859.50 samples/sec Loss 7.8672 LearningRate 0.0590 Epoch: 9 Global Step: 52760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:07:08,622-Speed 10816.28 samples/sec Loss 7.8238 LearningRate 0.0590 Epoch: 9 Global Step: 52770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:07:09,509-Speed 11556.16 samples/sec Loss 8.0441 LearningRate 0.0590 Epoch: 9 Global Step: 52780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:07:10,429-Speed 11138.48 samples/sec Loss 8.0269 LearningRate 0.0590 Epoch: 9 Global Step: 52790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:07:11,330-Speed 11375.05 samples/sec Loss 7.9692 LearningRate 0.0590 
Epoch: 9 Global Step: 52800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:07:12,246-Speed 11196.47 samples/sec Loss 8.0045 LearningRate 0.0590 Epoch: 9 Global Step: 52810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:07:13,184-Speed 10913.33 samples/sec Loss 7.7686 LearningRate 0.0589 Epoch: 9 Global Step: 52820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:07:14,094-Speed 11264.42 samples/sec Loss 7.9265 LearningRate 0.0589 Epoch: 9 Global Step: 52830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:07:15,029-Speed 10957.13 samples/sec Loss 7.8721 LearningRate 0.0589 Epoch: 9 Global Step: 52840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:07:15,986-Speed 10711.06 samples/sec Loss 7.9131 LearningRate 0.0589 Epoch: 9 Global Step: 52850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:07:16,932-Speed 10849.15 samples/sec Loss 7.9048 LearningRate 0.0589 Epoch: 9 Global Step: 52860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:07:17,831-Speed 11401.26 samples/sec Loss 7.9462 LearningRate 0.0589 Epoch: 9 Global Step: 52870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:07:18,752-Speed 11118.61 samples/sec Loss 7.8811 LearningRate 0.0589 Epoch: 9 Global Step: 52880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:07:19,694-Speed 10884.55 samples/sec Loss 7.8297 LearningRate 0.0589 Epoch: 9 Global Step: 52890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:07:20,599-Speed 11325.26 samples/sec Loss 7.8759 LearningRate 0.0589 Epoch: 9 Global Step: 52900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:07:21,543-Speed 10852.39 samples/sec Loss 8.0366 LearningRate 0.0589 Epoch: 9 Global Step: 52910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:07:22,463-Speed 11135.13 samples/sec Loss 8.0217 LearningRate 0.0589 Epoch: 9 Global Step: 52920 Fp16 Grad Scale: 
65536 Required: 6 hours Training: 2022-04-11 01:07:23,393-Speed 11034.63 samples/sec Loss 7.9319 LearningRate 0.0589 Epoch: 9 Global Step: 52930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:07:24,339-Speed 10837.97 samples/sec Loss 7.9515 LearningRate 0.0589 Epoch: 9 Global Step: 52940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:07:25,265-Speed 11069.60 samples/sec Loss 7.8995 LearningRate 0.0589 Epoch: 9 Global Step: 52950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:07:26,240-Speed 10512.32 samples/sec Loss 7.9098 LearningRate 0.0589 Epoch: 9 Global Step: 52960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:07:27,173-Speed 10978.82 samples/sec Loss 7.9367 LearningRate 0.0588 Epoch: 9 Global Step: 52970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:07:28,102-Speed 11035.61 samples/sec Loss 7.8427 LearningRate 0.0588 Epoch: 9 Global Step: 52980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:07:29,027-Speed 11076.75 samples/sec Loss 7.9181 LearningRate 0.0588 Epoch: 9 Global Step: 52990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:07:29,947-Speed 11140.48 samples/sec Loss 7.8908 LearningRate 0.0588 Epoch: 9 Global Step: 53000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:07:30,866-Speed 11148.12 samples/sec Loss 8.0620 LearningRate 0.0588 Epoch: 9 Global Step: 53010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:07:31,821-Speed 10728.66 samples/sec Loss 7.9448 LearningRate 0.0588 Epoch: 9 Global Step: 53020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:07:32,743-Speed 11116.55 samples/sec Loss 8.0173 LearningRate 0.0588 Epoch: 9 Global Step: 53030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:07:33,664-Speed 11122.39 samples/sec Loss 7.9746 LearningRate 0.0588 Epoch: 9 Global Step: 53040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 
01:07:34,576-Speed 11238.10 samples/sec Loss 7.8933 LearningRate 0.0588 Epoch: 9 Global Step: 53050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:07:35,504-Speed 11047.71 samples/sec Loss 7.9065 LearningRate 0.0588 Epoch: 9 Global Step: 53060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:07:36,435-Speed 10998.28 samples/sec Loss 8.0131 LearningRate 0.0588 Epoch: 9 Global Step: 53070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:07:37,370-Speed 10964.90 samples/sec Loss 7.9210 LearningRate 0.0588 Epoch: 9 Global Step: 53080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:07:38,274-Speed 11337.81 samples/sec Loss 8.0058 LearningRate 0.0588 Epoch: 9 Global Step: 53090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:07:39,221-Speed 10823.96 samples/sec Loss 8.1818 LearningRate 0.0588 Epoch: 9 Global Step: 53100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:07:40,144-Speed 11101.35 samples/sec Loss 7.9537 LearningRate 0.0588 Epoch: 9 Global Step: 53110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:07:41,031-Speed 11544.94 samples/sec Loss 7.8855 LearningRate 0.0587 Epoch: 9 Global Step: 53120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:07:41,923-Speed 11491.98 samples/sec Loss 7.9054 LearningRate 0.0587 Epoch: 9 Global Step: 53130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:07:42,824-Speed 11374.66 samples/sec Loss 7.8712 LearningRate 0.0587 Epoch: 9 Global Step: 53140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:07:43,744-Speed 11141.13 samples/sec Loss 7.8862 LearningRate 0.0587 Epoch: 9 Global Step: 53150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:07:44,639-Speed 11447.41 samples/sec Loss 7.9585 LearningRate 0.0587 Epoch: 9 Global Step: 53160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:07:45,549-Speed 11267.09 samples/sec Loss 7.8683 
LearningRate 0.0587 Epoch: 9 Global Step: 53170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:07:46,428-Speed 11654.50 samples/sec Loss 7.8004 LearningRate 0.0587 Epoch: 9 Global Step: 53180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:07:47,363-Speed 10950.53 samples/sec Loss 7.9688 LearningRate 0.0587 Epoch: 9 Global Step: 53190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:07:48,306-Speed 10873.16 samples/sec Loss 7.9777 LearningRate 0.0587 Epoch: 9 Global Step: 53200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:07:49,215-Speed 11273.31 samples/sec Loss 7.9764 LearningRate 0.0587 Epoch: 9 Global Step: 53210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:07:50,141-Speed 11063.10 samples/sec Loss 8.0560 LearningRate 0.0587 Epoch: 9 Global Step: 53220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:07:51,096-Speed 10735.05 samples/sec Loss 7.8758 LearningRate 0.0587 Epoch: 9 Global Step: 53230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:07:52,061-Speed 10623.34 samples/sec Loss 7.9575 LearningRate 0.0587 Epoch: 9 Global Step: 53240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:07:53,045-Speed 10414.77 samples/sec Loss 7.9449 LearningRate 0.0587 Epoch: 9 Global Step: 53250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:07:53,971-Speed 11060.18 samples/sec Loss 8.0200 LearningRate 0.0586 Epoch: 9 Global Step: 53260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:07:54,895-Speed 11093.74 samples/sec Loss 7.9993 LearningRate 0.0586 Epoch: 9 Global Step: 53270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:07:55,817-Speed 11113.06 samples/sec Loss 7.8552 LearningRate 0.0586 Epoch: 9 Global Step: 53280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:07:56,730-Speed 11226.36 samples/sec Loss 7.9940 LearningRate 0.0586 Epoch: 9 Global Step: 
53290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:07:57,629-Speed 11403.63 samples/sec Loss 8.0425 LearningRate 0.0586 Epoch: 9 Global Step: 53300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:07:58,562-Speed 10972.38 samples/sec Loss 7.8799 LearningRate 0.0586 Epoch: 9 Global Step: 53310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:07:59,459-Speed 11435.24 samples/sec Loss 8.0538 LearningRate 0.0586 Epoch: 9 Global Step: 53320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:08:00,360-Speed 11372.70 samples/sec Loss 7.9260 LearningRate 0.0586 Epoch: 9 Global Step: 53330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:08:01,279-Speed 11148.88 samples/sec Loss 7.9939 LearningRate 0.0586 Epoch: 9 Global Step: 53340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:08:02,227-Speed 10802.83 samples/sec Loss 7.8153 LearningRate 0.0586 Epoch: 9 Global Step: 53350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:08:03,167-Speed 10905.00 samples/sec Loss 7.9939 LearningRate 0.0586 Epoch: 9 Global Step: 53360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:08:04,084-Speed 11171.92 samples/sec Loss 7.9733 LearningRate 0.0586 Epoch: 9 Global Step: 53370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:08:05,010-Speed 11061.20 samples/sec Loss 7.8932 LearningRate 0.0586 Epoch: 9 Global Step: 53380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:08:05,949-Speed 10917.60 samples/sec Loss 7.9258 LearningRate 0.0586 Epoch: 9 Global Step: 53390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:08:06,887-Speed 10919.03 samples/sec Loss 7.9125 LearningRate 0.0586 Epoch: 9 Global Step: 53400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:08:07,791-Speed 11341.92 samples/sec Loss 7.9421 LearningRate 0.0585 Epoch: 9 Global Step: 53410 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-04-11 01:08:08,702-Speed 11250.46 samples/sec Loss 8.1138 LearningRate 0.0585 Epoch: 9 Global Step: 53420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:08:09,619-Speed 11172.05 samples/sec Loss 7.9340 LearningRate 0.0585 Epoch: 9 Global Step: 53430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:08:10,515-Speed 11435.26 samples/sec Loss 7.9431 LearningRate 0.0585 Epoch: 9 Global Step: 53440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:08:11,445-Speed 11026.44 samples/sec Loss 7.9901 LearningRate 0.0585 Epoch: 9 Global Step: 53450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:08:12,379-Speed 10966.52 samples/sec Loss 7.8706 LearningRate 0.0585 Epoch: 9 Global Step: 53460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:08:13,288-Speed 11276.39 samples/sec Loss 7.9733 LearningRate 0.0585 Epoch: 9 Global Step: 53470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:08:14,193-Speed 11317.52 samples/sec Loss 7.9380 LearningRate 0.0585 Epoch: 9 Global Step: 53480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:08:15,121-Speed 11043.04 samples/sec Loss 7.9188 LearningRate 0.0585 Epoch: 9 Global Step: 53490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:08:16,042-Speed 11126.51 samples/sec Loss 8.0942 LearningRate 0.0585 Epoch: 9 Global Step: 53500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:08:16,993-Speed 10773.59 samples/sec Loss 7.8817 LearningRate 0.0585 Epoch: 9 Global Step: 53510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:08:17,932-Speed 10912.09 samples/sec Loss 7.9582 LearningRate 0.0585 Epoch: 9 Global Step: 53520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:08:18,856-Speed 11097.95 samples/sec Loss 7.8941 LearningRate 0.0585 Epoch: 9 Global Step: 53530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:08:19,792-Speed 
10944.37 samples/sec Loss 7.9885 LearningRate 0.0585 Epoch: 9 Global Step: 53540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:08:20,730-Speed 10922.02 samples/sec Loss 8.0352 LearningRate 0.0585 Epoch: 9 Global Step: 53550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:08:21,631-Speed 11384.20 samples/sec Loss 7.9651 LearningRate 0.0584 Epoch: 9 Global Step: 53560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:08:22,523-Speed 11477.78 samples/sec Loss 7.9522 LearningRate 0.0584 Epoch: 9 Global Step: 53570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:08:23,434-Speed 11257.24 samples/sec Loss 7.9180 LearningRate 0.0584 Epoch: 9 Global Step: 53580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:08:24,345-Speed 11250.07 samples/sec Loss 7.8605 LearningRate 0.0584 Epoch: 9 Global Step: 53590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:08:25,250-Speed 11314.55 samples/sec Loss 8.0401 LearningRate 0.0584 Epoch: 9 Global Step: 53600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:08:26,157-Speed 11297.44 samples/sec Loss 7.9453 LearningRate 0.0584 Epoch: 9 Global Step: 53610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:08:27,083-Speed 11071.44 samples/sec Loss 7.8643 LearningRate 0.0584 Epoch: 9 Global Step: 53620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:08:28,001-Speed 11156.47 samples/sec Loss 7.8928 LearningRate 0.0584 Epoch: 9 Global Step: 53630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:08:28,938-Speed 10940.18 samples/sec Loss 7.9986 LearningRate 0.0584 Epoch: 9 Global Step: 53640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:08:29,862-Speed 11085.35 samples/sec Loss 7.9908 LearningRate 0.0584 Epoch: 9 Global Step: 53650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:08:30,761-Speed 11398.31 samples/sec Loss 8.0095 LearningRate 
0.0584 Epoch: 9 Global Step: 53660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:08:31,661-Speed 11388.47 samples/sec Loss 7.9409 LearningRate 0.0584 Epoch: 9 Global Step: 53670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:08:32,557-Speed 11439.45 samples/sec Loss 7.8489 LearningRate 0.0584 Epoch: 9 Global Step: 53680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:08:33,445-Speed 11537.90 samples/sec Loss 7.9100 LearningRate 0.0584 Epoch: 9 Global Step: 53690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:08:34,344-Speed 11398.29 samples/sec Loss 7.9666 LearningRate 0.0584 Epoch: 9 Global Step: 53700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:08:35,258-Speed 11216.92 samples/sec Loss 7.8805 LearningRate 0.0583 Epoch: 9 Global Step: 53710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:08:36,173-Speed 11192.44 samples/sec Loss 8.0376 LearningRate 0.0583 Epoch: 9 Global Step: 53720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:08:37,110-Speed 10942.08 samples/sec Loss 7.9928 LearningRate 0.0583 Epoch: 9 Global Step: 53730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:08:38,044-Speed 10964.55 samples/sec Loss 8.0108 LearningRate 0.0583 Epoch: 9 Global Step: 53740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:08:38,972-Speed 11042.79 samples/sec Loss 7.8995 LearningRate 0.0583 Epoch: 9 Global Step: 53750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:08:39,869-Speed 11421.07 samples/sec Loss 7.9646 LearningRate 0.0583 Epoch: 9 Global Step: 53760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:08:40,807-Speed 10931.45 samples/sec Loss 8.0070 LearningRate 0.0583 Epoch: 9 Global Step: 53770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:08:41,770-Speed 10638.26 samples/sec Loss 8.0755 LearningRate 0.0583 Epoch: 9 Global Step: 53780 Fp16 Grad 
Scale: 262144 Required: 6 hours Training: 2022-04-11 01:08:42,732-Speed 10645.57 samples/sec Loss 7.9031 LearningRate 0.0583 Epoch: 9 Global Step: 53790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:08:43,671-Speed 10913.38 samples/sec Loss 8.0225 LearningRate 0.0583 Epoch: 9 Global Step: 53800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:08:44,582-Speed 11251.75 samples/sec Loss 8.0206 LearningRate 0.0583 Epoch: 9 Global Step: 53810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:08:45,484-Speed 11359.25 samples/sec Loss 8.0309 LearningRate 0.0583 Epoch: 9 Global Step: 53820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:08:46,392-Speed 11284.72 samples/sec Loss 7.8610 LearningRate 0.0583 Epoch: 9 Global Step: 53830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:08:47,315-Speed 11099.41 samples/sec Loss 7.9186 LearningRate 0.0583 Epoch: 9 Global Step: 53840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:08:48,253-Speed 10926.37 samples/sec Loss 8.0138 LearningRate 0.0583 Epoch: 9 Global Step: 53850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:08:49,178-Speed 11079.26 samples/sec Loss 8.0549 LearningRate 0.0582 Epoch: 9 Global Step: 53860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:08:50,115-Speed 10929.71 samples/sec Loss 8.0964 LearningRate 0.0582 Epoch: 9 Global Step: 53870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:08:51,039-Speed 11098.98 samples/sec Loss 7.8474 LearningRate 0.0582 Epoch: 9 Global Step: 53880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:08:51,972-Speed 10979.11 samples/sec Loss 8.0010 LearningRate 0.0582 Epoch: 9 Global Step: 53890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:08:52,904-Speed 10996.20 samples/sec Loss 7.9635 LearningRate 0.0582 Epoch: 9 Global Step: 53900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 
01:08:53,815-Speed 11243.25 samples/sec Loss 7.8798 LearningRate 0.0582 Epoch: 9 Global Step: 53910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:08:54,800-Speed 10407.06 samples/sec Loss 8.1012 LearningRate 0.0582 Epoch: 9 Global Step: 53920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:08:55,742-Speed 10886.67 samples/sec Loss 7.9894 LearningRate 0.0582 Epoch: 9 Global Step: 53930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:08:56,652-Speed 11258.22 samples/sec Loss 7.9559 LearningRate 0.0582 Epoch: 9 Global Step: 53940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:08:57,588-Speed 10955.97 samples/sec Loss 8.0151 LearningRate 0.0582 Epoch: 9 Global Step: 53950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:08:58,548-Speed 10669.03 samples/sec Loss 7.8718 LearningRate 0.0582 Epoch: 9 Global Step: 53960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:08:59,470-Speed 11116.54 samples/sec Loss 7.9535 LearningRate 0.0582 Epoch: 9 Global Step: 53970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:09:00,371-Speed 11364.41 samples/sec Loss 7.9634 LearningRate 0.0582 Epoch: 9 Global Step: 53980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:09:01,271-Speed 11397.69 samples/sec Loss 8.0353 LearningRate 0.0582 Epoch: 9 Global Step: 53990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:09:02,233-Speed 10647.29 samples/sec Loss 7.8330 LearningRate 0.0582 Epoch: 9 Global Step: 54000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:09:23,473-[lfw][54000]XNorm: 12.928594 Training: 2022-04-11 01:09:23,474-[lfw][54000]Accuracy-Flip: 0.99350+-0.00437 Training: 2022-04-11 01:09:23,474-[lfw][54000]Accuracy-Highest: 0.99550 Training: 2022-04-11 01:09:48,018-[cfp_fp][54000]XNorm: 10.836648 Training: 2022-04-11 01:09:48,019-[cfp_fp][54000]Accuracy-Flip: 0.93786+-0.01168 Training: 2022-04-11 
01:09:48,019-[cfp_fp][54000]Accuracy-Highest: 0.93786 Training: 2022-04-11 01:10:10,057-[agedb_30][54000]XNorm: 12.558017 Training: 2022-04-11 01:10:10,058-[agedb_30][54000]Accuracy-Flip: 0.95533+-0.01194 Training: 2022-04-11 01:10:10,058-[agedb_30][54000]Accuracy-Highest: 0.95717 Training: 2022-04-11 01:10:11,008-Speed 148.89 samples/sec Loss 7.8638 LearningRate 0.0581 Epoch: 9 Global Step: 54010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:10:11,943-Speed 10955.98 samples/sec Loss 7.9430 LearningRate 0.0581 Epoch: 9 Global Step: 54020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:10:12,877-Speed 10976.96 samples/sec Loss 7.9577 LearningRate 0.0581 Epoch: 9 Global Step: 54030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:10:13,797-Speed 11140.57 samples/sec Loss 7.9198 LearningRate 0.0581 Epoch: 9 Global Step: 54040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:10:14,755-Speed 10686.81 samples/sec Loss 7.9854 LearningRate 0.0581 Epoch: 9 Global Step: 54050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:10:15,630-Speed 11715.80 samples/sec Loss 7.9323 LearningRate 0.0581 Epoch: 9 Global Step: 54060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:10:16,557-Speed 11051.47 samples/sec Loss 7.9917 LearningRate 0.0581 Epoch: 9 Global Step: 54070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:10:17,436-Speed 11662.86 samples/sec Loss 7.9142 LearningRate 0.0581 Epoch: 9 Global Step: 54080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:10:18,402-Speed 10608.66 samples/sec Loss 7.8036 LearningRate 0.0581 Epoch: 9 Global Step: 54090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:10:19,330-Speed 11040.48 samples/sec Loss 8.0376 LearningRate 0.0581 Epoch: 9 Global Step: 54100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:10:20,229-Speed 11400.81 samples/sec Loss 7.9392 LearningRate 0.0581 
Epoch: 9 Global Step: 54110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:10:21,134-Speed 11321.27 samples/sec Loss 7.9852 LearningRate 0.0581 Epoch: 9 Global Step: 54120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:10:22,032-Speed 11411.89 samples/sec Loss 7.9584 LearningRate 0.0581 Epoch: 9 Global Step: 54130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:10:22,963-Speed 11000.57 samples/sec Loss 8.0157 LearningRate 0.0581 Epoch: 9 Global Step: 54140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:10:23,894-Speed 11009.57 samples/sec Loss 7.8388 LearningRate 0.0581 Epoch: 9 Global Step: 54150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:10:24,797-Speed 11345.02 samples/sec Loss 8.1319 LearningRate 0.0580 Epoch: 9 Global Step: 54160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:10:25,707-Speed 11259.24 samples/sec Loss 7.8194 LearningRate 0.0580 Epoch: 9 Global Step: 54170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:10:26,623-Speed 11188.15 samples/sec Loss 8.0008 LearningRate 0.0580 Epoch: 9 Global Step: 54180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:10:27,534-Speed 11251.68 samples/sec Loss 7.9194 LearningRate 0.0580 Epoch: 9 Global Step: 54190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:10:28,431-Speed 11421.29 samples/sec Loss 7.9373 LearningRate 0.0580 Epoch: 9 Global Step: 54200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:10:29,371-Speed 10901.62 samples/sec Loss 7.9600 LearningRate 0.0580 Epoch: 9 Global Step: 54210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:10:30,295-Speed 11089.17 samples/sec Loss 7.9773 LearningRate 0.0580 Epoch: 9 Global Step: 54220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:10:31,204-Speed 11274.85 samples/sec Loss 7.8624 LearningRate 0.0580 Epoch: 9 Global Step: 54230 Fp16 Grad 
Scale: 131072 Required: 6 hours Training: 2022-04-11 01:10:32,150-Speed 10831.82 samples/sec Loss 8.0839 LearningRate 0.0580 Epoch: 9 Global Step: 54240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:10:33,054-Speed 11344.87 samples/sec Loss 8.0124 LearningRate 0.0580 Epoch: 9 Global Step: 54250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:10:33,952-Speed 11401.46 samples/sec Loss 7.9660 LearningRate 0.0580 Epoch: 9 Global Step: 54260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:10:34,855-Speed 11352.59 samples/sec Loss 7.9690 LearningRate 0.0580 Epoch: 9 Global Step: 54270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:10:35,787-Speed 10997.03 samples/sec Loss 7.9442 LearningRate 0.0580 Epoch: 9 Global Step: 54280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:10:36,682-Speed 11449.36 samples/sec Loss 7.8717 LearningRate 0.0580 Epoch: 9 Global Step: 54290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:10:37,605-Speed 11101.12 samples/sec Loss 8.0654 LearningRate 0.0580 Epoch: 9 Global Step: 54300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:10:38,555-Speed 10786.93 samples/sec Loss 8.0328 LearningRate 0.0579 Epoch: 9 Global Step: 54310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:10:39,477-Speed 11113.59 samples/sec Loss 7.9544 LearningRate 0.0579 Epoch: 9 Global Step: 54320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:10:40,430-Speed 10753.13 samples/sec Loss 8.1111 LearningRate 0.0579 Epoch: 9 Global Step: 54330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:10:41,361-Speed 11003.83 samples/sec Loss 8.0415 LearningRate 0.0579 Epoch: 9 Global Step: 54340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:10:42,245-Speed 11594.80 samples/sec Loss 8.0045 LearningRate 0.0579 Epoch: 9 Global Step: 54350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 
2022-04-11 01:10:43,149-Speed 11333.83 samples/sec Loss 7.9956 LearningRate 0.0579 Epoch: 9 Global Step: 54360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:10:44,057-Speed 11290.08 samples/sec Loss 7.8947 LearningRate 0.0579 Epoch: 9 Global Step: 54370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:10:44,965-Speed 11278.64 samples/sec Loss 7.8505 LearningRate 0.0579 Epoch: 9 Global Step: 54380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:10:45,903-Speed 10922.46 samples/sec Loss 7.9898 LearningRate 0.0579 Epoch: 9 Global Step: 54390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:10:46,828-Speed 11075.16 samples/sec Loss 7.9249 LearningRate 0.0579 Epoch: 9 Global Step: 54400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:10:47,716-Speed 11541.91 samples/sec Loss 8.0711 LearningRate 0.0579 Epoch: 9 Global Step: 54410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:10:48,656-Speed 10906.63 samples/sec Loss 7.8763 LearningRate 0.0579 Epoch: 9 Global Step: 54420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:10:49,562-Speed 11303.60 samples/sec Loss 8.0042 LearningRate 0.0579 Epoch: 9 Global Step: 54430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:10:50,458-Speed 11436.98 samples/sec Loss 7.8997 LearningRate 0.0579 Epoch: 9 Global Step: 54440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:10:51,408-Speed 10788.17 samples/sec Loss 8.0335 LearningRate 0.0579 Epoch: 9 Global Step: 54450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:10:52,367-Speed 10682.46 samples/sec Loss 8.0787 LearningRate 0.0578 Epoch: 9 Global Step: 54460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:10:53,312-Speed 10851.69 samples/sec Loss 8.0513 LearningRate 0.0578 Epoch: 9 Global Step: 54470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:10:54,219-Speed 11291.68 samples/sec 
Loss 8.0470 LearningRate 0.0578 Epoch: 9 Global Step: 54480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:10:55,166-Speed 10824.08 samples/sec Loss 8.0015 LearningRate 0.0578 Epoch: 9 Global Step: 54490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:10:56,108-Speed 10881.56 samples/sec Loss 7.9386 LearningRate 0.0578 Epoch: 9 Global Step: 54500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:10:57,061-Speed 10753.13 samples/sec Loss 8.0319 LearningRate 0.0578 Epoch: 9 Global Step: 54510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:10:57,981-Speed 11138.56 samples/sec Loss 7.8549 LearningRate 0.0578 Epoch: 9 Global Step: 54520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:10:58,896-Speed 11195.15 samples/sec Loss 8.0015 LearningRate 0.0578 Epoch: 9 Global Step: 54530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:10:59,845-Speed 10804.38 samples/sec Loss 7.9984 LearningRate 0.0578 Epoch: 9 Global Step: 54540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:11:00,801-Speed 10720.43 samples/sec Loss 7.9298 LearningRate 0.0578 Epoch: 9 Global Step: 54550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:11:01,723-Speed 11110.37 samples/sec Loss 7.8695 LearningRate 0.0578 Epoch: 9 Global Step: 54560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:11:02,668-Speed 10839.94 samples/sec Loss 7.9608 LearningRate 0.0578 Epoch: 9 Global Step: 54570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:11:03,608-Speed 10905.81 samples/sec Loss 7.8748 LearningRate 0.0578 Epoch: 9 Global Step: 54580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:11:04,539-Speed 11007.28 samples/sec Loss 7.9698 LearningRate 0.0578 Epoch: 9 Global Step: 54590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:11:05,476-Speed 10932.23 samples/sec Loss 8.0337 LearningRate 0.0578 Epoch: 9 Global 
Step: 54600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:11:06,382-Speed 11314.54 samples/sec Loss 7.9942 LearningRate 0.0577 Epoch: 9 Global Step: 54610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:11:07,318-Speed 10939.02 samples/sec Loss 7.9989 LearningRate 0.0577 Epoch: 9 Global Step: 54620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:11:08,258-Speed 10899.95 samples/sec Loss 7.7987 LearningRate 0.0577 Epoch: 9 Global Step: 54630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:11:09,200-Speed 10884.34 samples/sec Loss 7.9053 LearningRate 0.0577 Epoch: 9 Global Step: 54640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:11:10,158-Speed 10691.98 samples/sec Loss 8.0226 LearningRate 0.0577 Epoch: 9 Global Step: 54650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:11:11,098-Speed 10907.67 samples/sec Loss 7.9878 LearningRate 0.0577 Epoch: 9 Global Step: 54660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:11:12,018-Speed 11137.40 samples/sec Loss 7.9807 LearningRate 0.0577 Epoch: 9 Global Step: 54670 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 01:11:12,973-Speed 10730.75 samples/sec Loss 7.9814 LearningRate 0.0577 Epoch: 9 Global Step: 54680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:11:13,914-Speed 10887.79 samples/sec Loss 8.0956 LearningRate 0.0577 Epoch: 9 Global Step: 54690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:11:14,871-Speed 10702.64 samples/sec Loss 7.9799 LearningRate 0.0577 Epoch: 9 Global Step: 54700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:11:15,781-Speed 11272.48 samples/sec Loss 7.9672 LearningRate 0.0577 Epoch: 9 Global Step: 54710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:11:16,708-Speed 11048.31 samples/sec Loss 7.8397 LearningRate 0.0577 Epoch: 9 Global Step: 54720 Fp16 Grad Scale: 131072 
Required: 6 hours Training: 2022-04-11 01:11:17,638-Speed 11015.86 samples/sec Loss 7.9604 LearningRate 0.0577 Epoch: 9 Global Step: 54730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:11:18,573-Speed 10959.65 samples/sec Loss 7.8974 LearningRate 0.0577 Epoch: 9 Global Step: 54740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:11:19,494-Speed 11125.86 samples/sec Loss 8.1441 LearningRate 0.0577 Epoch: 9 Global Step: 54750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:11:20,421-Speed 11053.08 samples/sec Loss 7.9356 LearningRate 0.0576 Epoch: 9 Global Step: 54760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:11:21,378-Speed 10713.75 samples/sec Loss 8.0244 LearningRate 0.0576 Epoch: 9 Global Step: 54770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:11:22,278-Speed 11385.54 samples/sec Loss 8.0297 LearningRate 0.0576 Epoch: 9 Global Step: 54780 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 01:11:23,202-Speed 11087.83 samples/sec Loss 7.8846 LearningRate 0.0576 Epoch: 9 Global Step: 54790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:11:24,147-Speed 10845.13 samples/sec Loss 7.9829 LearningRate 0.0576 Epoch: 9 Global Step: 54800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:11:25,068-Speed 11130.47 samples/sec Loss 7.8313 LearningRate 0.0576 Epoch: 9 Global Step: 54810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:11:26,024-Speed 10724.44 samples/sec Loss 7.9656 LearningRate 0.0576 Epoch: 9 Global Step: 54820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:11:26,959-Speed 10956.18 samples/sec Loss 7.9931 LearningRate 0.0576 Epoch: 9 Global Step: 54830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:11:27,869-Speed 11264.82 samples/sec Loss 7.9605 LearningRate 0.0576 Epoch: 9 Global Step: 54840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 
01:11:28,841-Speed 10546.94 samples/sec Loss 7.7760 LearningRate 0.0576 Epoch: 9 Global Step: 54850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:11:29,791-Speed 10776.27 samples/sec Loss 7.9315 LearningRate 0.0576 Epoch: 9 Global Step: 54860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:11:30,692-Speed 11383.64 samples/sec Loss 7.9503 LearningRate 0.0576 Epoch: 9 Global Step: 54870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:11:31,603-Speed 11247.47 samples/sec Loss 8.0589 LearningRate 0.0576 Epoch: 9 Global Step: 54880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:11:32,499-Speed 11434.39 samples/sec Loss 7.8850 LearningRate 0.0576 Epoch: 9 Global Step: 54890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:11:33,436-Speed 10927.31 samples/sec Loss 8.0288 LearningRate 0.0576 Epoch: 9 Global Step: 54900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:11:34,351-Speed 11206.08 samples/sec Loss 8.0512 LearningRate 0.0575 Epoch: 9 Global Step: 54910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:11:35,256-Speed 11328.96 samples/sec Loss 7.9833 LearningRate 0.0575 Epoch: 9 Global Step: 54920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:11:36,187-Speed 11006.41 samples/sec Loss 7.8037 LearningRate 0.0575 Epoch: 9 Global Step: 54930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:11:37,097-Speed 11256.84 samples/sec Loss 8.0212 LearningRate 0.0575 Epoch: 9 Global Step: 54940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:11:38,030-Speed 10985.88 samples/sec Loss 7.9411 LearningRate 0.0575 Epoch: 9 Global Step: 54950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:11:38,934-Speed 11338.15 samples/sec Loss 7.9075 LearningRate 0.0575 Epoch: 9 Global Step: 54960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:11:39,848-Speed 11205.43 samples/sec Loss 7.8910 
LearningRate 0.0575 Epoch: 9 Global Step: 54970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:11:40,756-Speed 11295.23 samples/sec Loss 7.8428 LearningRate 0.0575 Epoch: 9 Global Step: 54980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:11:41,693-Speed 10930.58 samples/sec Loss 7.9768 LearningRate 0.0575 Epoch: 9 Global Step: 54990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:11:42,599-Speed 11320.49 samples/sec Loss 8.0129 LearningRate 0.0575 Epoch: 9 Global Step: 55000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:11:43,508-Speed 11264.07 samples/sec Loss 7.9888 LearningRate 0.0575 Epoch: 9 Global Step: 55010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:11:44,425-Speed 11176.70 samples/sec Loss 7.8841 LearningRate 0.0575 Epoch: 9 Global Step: 55020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:11:45,335-Speed 11264.70 samples/sec Loss 7.8864 LearningRate 0.0575 Epoch: 9 Global Step: 55030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:11:46,251-Speed 11177.91 samples/sec Loss 7.9230 LearningRate 0.0575 Epoch: 9 Global Step: 55040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:11:47,184-Speed 10987.52 samples/sec Loss 7.9161 LearningRate 0.0575 Epoch: 9 Global Step: 55050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:11:48,087-Speed 11346.54 samples/sec Loss 7.8901 LearningRate 0.0574 Epoch: 9 Global Step: 55060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:11:49,031-Speed 10857.61 samples/sec Loss 7.9108 LearningRate 0.0574 Epoch: 9 Global Step: 55070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:11:49,977-Speed 10829.72 samples/sec Loss 8.0916 LearningRate 0.0574 Epoch: 9 Global Step: 55080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:11:50,906-Speed 11026.66 samples/sec Loss 7.9148 LearningRate 0.0574 Epoch: 9 Global Step: 
55090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:11:51,856-Speed 10787.39 samples/sec Loss 7.9395 LearningRate 0.0574 Epoch: 9 Global Step: 55100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:11:52,776-Speed 11145.18 samples/sec Loss 7.9048 LearningRate 0.0574 Epoch: 9 Global Step: 55110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:11:53,701-Speed 11074.29 samples/sec Loss 8.0764 LearningRate 0.0574 Epoch: 9 Global Step: 55120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:11:54,638-Speed 10942.45 samples/sec Loss 8.0741 LearningRate 0.0574 Epoch: 9 Global Step: 55130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:11:55,582-Speed 10856.31 samples/sec Loss 7.9586 LearningRate 0.0574 Epoch: 9 Global Step: 55140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:11:56,507-Speed 11077.76 samples/sec Loss 8.0278 LearningRate 0.0574 Epoch: 9 Global Step: 55150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:11:57,408-Speed 11365.82 samples/sec Loss 8.0955 LearningRate 0.0574 Epoch: 9 Global Step: 55160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:11:58,346-Speed 10935.40 samples/sec Loss 8.0863 LearningRate 0.0574 Epoch: 9 Global Step: 55170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:11:59,246-Speed 11387.70 samples/sec Loss 7.9316 LearningRate 0.0574 Epoch: 9 Global Step: 55180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:12:00,156-Speed 11258.53 samples/sec Loss 7.9471 LearningRate 0.0574 Epoch: 9 Global Step: 55190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:12:01,053-Speed 11425.20 samples/sec Loss 7.8142 LearningRate 0.0574 Epoch: 9 Global Step: 55200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:12:01,995-Speed 10876.38 samples/sec Loss 8.0591 LearningRate 0.0573 Epoch: 9 Global Step: 55210 Fp16 Grad Scale: 65536 Required: 6 hours 
Training: 2022-04-11 01:12:02,916-Speed 11123.45 samples/sec Loss 7.8316 LearningRate 0.0573 Epoch: 9 Global Step: 55220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:12:03,853-Speed 10930.88 samples/sec Loss 7.9370 LearningRate 0.0573 Epoch: 9 Global Step: 55230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:12:04,770-Speed 11182.29 samples/sec Loss 7.8688 LearningRate 0.0573 Epoch: 9 Global Step: 55240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:12:05,645-Speed 11716.60 samples/sec Loss 7.9454 LearningRate 0.0573 Epoch: 9 Global Step: 55250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:12:06,542-Speed 11425.34 samples/sec Loss 7.8314 LearningRate 0.0573 Epoch: 9 Global Step: 55260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:12:07,473-Speed 11001.11 samples/sec Loss 8.0542 LearningRate 0.0573 Epoch: 9 Global Step: 55270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:12:08,388-Speed 11199.49 samples/sec Loss 7.8803 LearningRate 0.0573 Epoch: 9 Global Step: 55280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:12:09,306-Speed 11163.41 samples/sec Loss 7.8233 LearningRate 0.0573 Epoch: 9 Global Step: 55290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:12:10,223-Speed 11175.45 samples/sec Loss 7.9744 LearningRate 0.0573 Epoch: 9 Global Step: 55300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:12:11,148-Speed 11078.62 samples/sec Loss 8.0329 LearningRate 0.0573 Epoch: 9 Global Step: 55310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:12:12,089-Speed 10885.93 samples/sec Loss 7.9458 LearningRate 0.0573 Epoch: 9 Global Step: 55320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:12:13,030-Speed 10893.85 samples/sec Loss 7.9766 LearningRate 0.0573 Epoch: 9 Global Step: 55330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:12:13,937-Speed 
11294.62 samples/sec Loss 8.0721 LearningRate 0.0573 Epoch: 9 Global Step: 55340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:12:14,832-Speed 11449.69 samples/sec Loss 8.2634 LearningRate 0.0573 Epoch: 9 Global Step: 55350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:12:15,732-Speed 11382.30 samples/sec Loss 8.0189 LearningRate 0.0572 Epoch: 9 Global Step: 55360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:12:16,699-Speed 10596.36 samples/sec Loss 7.9870 LearningRate 0.0572 Epoch: 9 Global Step: 55370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:12:17,598-Speed 11398.97 samples/sec Loss 7.9616 LearningRate 0.0572 Epoch: 9 Global Step: 55380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:12:18,514-Speed 11196.31 samples/sec Loss 8.0673 LearningRate 0.0572 Epoch: 9 Global Step: 55390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:12:19,453-Speed 10908.12 samples/sec Loss 7.8498 LearningRate 0.0572 Epoch: 9 Global Step: 55400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:12:20,354-Speed 11369.93 samples/sec Loss 7.9090 LearningRate 0.0572 Epoch: 9 Global Step: 55410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:12:21,239-Speed 11583.37 samples/sec Loss 7.8951 LearningRate 0.0572 Epoch: 9 Global Step: 55420 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 01:12:22,159-Speed 11145.82 samples/sec Loss 7.8814 LearningRate 0.0572 Epoch: 9 Global Step: 55430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:12:23,053-Speed 11453.68 samples/sec Loss 8.1075 LearningRate 0.0572 Epoch: 9 Global Step: 55440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:12:23,966-Speed 11231.35 samples/sec Loss 7.8478 LearningRate 0.0572 Epoch: 9 Global Step: 55450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:12:24,866-Speed 11378.01 samples/sec Loss 8.0097 
LearningRate 0.0572 Epoch: 9 Global Step: 55460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:12:25,757-Speed 11507.79 samples/sec Loss 7.9489 LearningRate 0.0572 Epoch: 9 Global Step: 55470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:12:26,694-Speed 10926.31 samples/sec Loss 7.9364 LearningRate 0.0572 Epoch: 9 Global Step: 55480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:12:27,634-Speed 10911.27 samples/sec Loss 8.0339 LearningRate 0.0572 Epoch: 9 Global Step: 55490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:12:28,534-Speed 11383.14 samples/sec Loss 7.9469 LearningRate 0.0572 Epoch: 9 Global Step: 55500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:12:29,449-Speed 11191.97 samples/sec Loss 8.0142 LearningRate 0.0571 Epoch: 9 Global Step: 55510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:12:30,387-Speed 10922.13 samples/sec Loss 7.9591 LearningRate 0.0571 Epoch: 9 Global Step: 55520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:12:31,337-Speed 10790.83 samples/sec Loss 7.9982 LearningRate 0.0571 Epoch: 9 Global Step: 55530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:12:32,262-Speed 11079.46 samples/sec Loss 7.9951 LearningRate 0.0571 Epoch: 9 Global Step: 55540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:12:33,154-Speed 11489.42 samples/sec Loss 7.9746 LearningRate 0.0571 Epoch: 9 Global Step: 55550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:12:34,067-Speed 11217.22 samples/sec Loss 8.0394 LearningRate 0.0571 Epoch: 9 Global Step: 55560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:12:34,972-Speed 11322.65 samples/sec Loss 7.8891 LearningRate 0.0571 Epoch: 9 Global Step: 55570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:12:35,901-Speed 11024.78 samples/sec Loss 7.9620 LearningRate 0.0571 Epoch: 9 Global Step: 55580 Fp16 
Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:12:36,829-Speed 11040.64 samples/sec Loss 7.9296 LearningRate 0.0571 Epoch: 9 Global Step: 55590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:12:37,735-Speed 11310.61 samples/sec Loss 7.8927 LearningRate 0.0571 Epoch: 9 Global Step: 55600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:12:38,658-Speed 11108.92 samples/sec Loss 7.9213 LearningRate 0.0571 Epoch: 9 Global Step: 55610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:12:39,617-Speed 10683.56 samples/sec Loss 8.1150 LearningRate 0.0571 Epoch: 9 Global Step: 55620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:12:40,526-Speed 11284.66 samples/sec Loss 7.8898 LearningRate 0.0571 Epoch: 9 Global Step: 55630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:12:41,447-Speed 11120.24 samples/sec Loss 7.9038 LearningRate 0.0571 Epoch: 9 Global Step: 55640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:12:42,392-Speed 10849.50 samples/sec Loss 7.9875 LearningRate 0.0571 Epoch: 9 Global Step: 55650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:12:43,332-Speed 10895.60 samples/sec Loss 7.8808 LearningRate 0.0570 Epoch: 9 Global Step: 55660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:12:44,276-Speed 10852.16 samples/sec Loss 7.8954 LearningRate 0.0570 Epoch: 9 Global Step: 55670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:12:45,175-Speed 11398.37 samples/sec Loss 7.7903 LearningRate 0.0570 Epoch: 9 Global Step: 55680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:12:46,067-Speed 11494.97 samples/sec Loss 8.0187 LearningRate 0.0570 Epoch: 9 Global Step: 55690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:12:46,982-Speed 11189.05 samples/sec Loss 7.8223 LearningRate 0.0570 Epoch: 9 Global Step: 55700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 
2022-04-11 01:12:47,899-Speed 11175.12 samples/sec Loss 7.9243 LearningRate 0.0570 Epoch: 9 Global Step: 55710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:12:48,826-Speed 11059.99 samples/sec Loss 7.9439 LearningRate 0.0570 Epoch: 9 Global Step: 55720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:12:49,725-Speed 11399.40 samples/sec Loss 7.8569 LearningRate 0.0570 Epoch: 9 Global Step: 55730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:12:50,617-Speed 11483.02 samples/sec Loss 7.6922 LearningRate 0.0570 Epoch: 9 Global Step: 55740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:12:51,532-Speed 11202.00 samples/sec Loss 7.8757 LearningRate 0.0570 Epoch: 9 Global Step: 55750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:12:52,468-Speed 10949.21 samples/sec Loss 7.8921 LearningRate 0.0570 Epoch: 9 Global Step: 55760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:12:53,400-Speed 10999.27 samples/sec Loss 7.9069 LearningRate 0.0570 Epoch: 9 Global Step: 55770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:12:54,304-Speed 11330.96 samples/sec Loss 7.9561 LearningRate 0.0570 Epoch: 9 Global Step: 55780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:12:55,194-Speed 11516.73 samples/sec Loss 8.0783 LearningRate 0.0570 Epoch: 9 Global Step: 55790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:12:56,121-Speed 11044.02 samples/sec Loss 7.9512 LearningRate 0.0570 Epoch: 9 Global Step: 55800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:12:57,070-Speed 10810.23 samples/sec Loss 8.0370 LearningRate 0.0569 Epoch: 9 Global Step: 55810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:12:58,009-Speed 10907.32 samples/sec Loss 7.9015 LearningRate 0.0569 Epoch: 9 Global Step: 55820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:12:58,926-Speed 11178.60 
samples/sec Loss 8.0386 LearningRate 0.0569 Epoch: 9 Global Step: 55830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:12:59,889-Speed 10641.36 samples/sec Loss 8.0572 LearningRate 0.0569 Epoch: 9 Global Step: 55840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:13:00,808-Speed 11157.83 samples/sec Loss 7.9949 LearningRate 0.0569 Epoch: 9 Global Step: 55850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:13:01,735-Speed 11047.70 samples/sec Loss 8.0975 LearningRate 0.0569 Epoch: 9 Global Step: 55860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:13:02,639-Speed 11349.46 samples/sec Loss 7.8841 LearningRate 0.0569 Epoch: 9 Global Step: 55870 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 01:13:03,516-Speed 11676.69 samples/sec Loss 7.9030 LearningRate 0.0569 Epoch: 9 Global Step: 55880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:13:04,414-Speed 11414.18 samples/sec Loss 8.0395 LearningRate 0.0569 Epoch: 9 Global Step: 55890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:13:05,370-Speed 10718.52 samples/sec Loss 8.0752 LearningRate 0.0569 Epoch: 9 Global Step: 55900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:13:06,309-Speed 10915.39 samples/sec Loss 7.8807 LearningRate 0.0569 Epoch: 9 Global Step: 55910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:13:07,231-Speed 11113.72 samples/sec Loss 7.8879 LearningRate 0.0569 Epoch: 9 Global Step: 55920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:13:08,175-Speed 10853.41 samples/sec Loss 7.9585 LearningRate 0.0569 Epoch: 9 Global Step: 55930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:13:09,093-Speed 11153.26 samples/sec Loss 7.8376 LearningRate 0.0569 Epoch: 9 Global Step: 55940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:13:10,025-Speed 10999.66 samples/sec Loss 7.9585 LearningRate 0.0569 
Epoch: 9 Global Step: 55950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:13:10,927-Speed 11365.47 samples/sec Loss 7.8962 LearningRate 0.0568 Epoch: 9 Global Step: 55960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:13:11,854-Speed 11047.60 samples/sec Loss 7.8716 LearningRate 0.0568 Epoch: 9 Global Step: 55970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:13:12,749-Speed 11451.36 samples/sec Loss 7.9427 LearningRate 0.0568 Epoch: 9 Global Step: 55980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:13:13,666-Speed 11173.42 samples/sec Loss 7.8632 LearningRate 0.0568 Epoch: 9 Global Step: 55990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:13:14,618-Speed 10766.43 samples/sec Loss 7.7409 LearningRate 0.0568 Epoch: 9 Global Step: 56000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:13:35,854-[lfw][56000]XNorm: 12.903393 Training: 2022-04-11 01:13:35,855-[lfw][56000]Accuracy-Flip: 0.99483+-0.00383 Training: 2022-04-11 01:13:35,855-[lfw][56000]Accuracy-Highest: 0.99550 Training: 2022-04-11 01:14:00,786-[cfp_fp][56000]XNorm: 10.842624 Training: 2022-04-11 01:14:00,787-[cfp_fp][56000]Accuracy-Flip: 0.93429+-0.01012 Training: 2022-04-11 01:14:00,787-[cfp_fp][56000]Accuracy-Highest: 0.93786 Training: 2022-04-11 01:14:22,009-[agedb_30][56000]XNorm: 12.626072 Training: 2022-04-11 01:14:22,010-[agedb_30][56000]Accuracy-Flip: 0.95667+-0.01075 Training: 2022-04-11 01:14:22,010-[agedb_30][56000]Accuracy-Highest: 0.95717 Training: 2022-04-11 01:14:22,904-Speed 149.96 samples/sec Loss 7.8770 LearningRate 0.0568 Epoch: 9 Global Step: 56010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:14:23,803-Speed 11401.19 samples/sec Loss 7.9609 LearningRate 0.0568 Epoch: 9 Global Step: 56020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:14:24,718-Speed 11191.18 samples/sec Loss 8.0257 LearningRate 0.0568 Epoch: 9 Global Step: 56030 Fp16 Grad 
Scale: 131072 Required: 6 hours Training: 2022-04-11 01:14:25,642-Speed 11093.20 samples/sec Loss 8.0025 LearningRate 0.0568 Epoch: 9 Global Step: 56040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:14:26,576-Speed 10972.38 samples/sec Loss 7.9033 LearningRate 0.0568 Epoch: 9 Global Step: 56050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:14:27,493-Speed 11177.68 samples/sec Loss 7.8170 LearningRate 0.0568 Epoch: 9 Global Step: 56060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:14:28,430-Speed 10939.92 samples/sec Loss 7.9006 LearningRate 0.0568 Epoch: 9 Global Step: 56070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:14:29,352-Speed 11106.87 samples/sec Loss 8.0586 LearningRate 0.0568 Epoch: 9 Global Step: 56080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:14:30,257-Speed 11334.22 samples/sec Loss 8.0155 LearningRate 0.0568 Epoch: 9 Global Step: 56090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:14:31,155-Speed 11409.23 samples/sec Loss 7.9876 LearningRate 0.0568 Epoch: 9 Global Step: 56100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:14:32,057-Speed 11359.58 samples/sec Loss 8.0216 LearningRate 0.0567 Epoch: 9 Global Step: 56110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:14:32,975-Speed 11160.41 samples/sec Loss 7.8670 LearningRate 0.0567 Epoch: 9 Global Step: 56120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:14:33,866-Speed 11501.99 samples/sec Loss 7.9552 LearningRate 0.0567 Epoch: 9 Global Step: 56130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:14:34,783-Speed 11176.12 samples/sec Loss 7.9408 LearningRate 0.0567 Epoch: 9 Global Step: 56140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:14:35,740-Speed 10702.18 samples/sec Loss 7.9115 LearningRate 0.0567 Epoch: 9 Global Step: 56150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 
01:14:36,653-Speed 11225.24 samples/sec Loss 7.9124 LearningRate 0.0567 Epoch: 9 Global Step: 56160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:14:37,602-Speed 10795.25 samples/sec Loss 7.8527 LearningRate 0.0567 Epoch: 9 Global Step: 56170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:14:38,574-Speed 10544.12 samples/sec Loss 7.9417 LearningRate 0.0567 Epoch: 9 Global Step: 56180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:14:39,548-Speed 10526.70 samples/sec Loss 7.8722 LearningRate 0.0567 Epoch: 9 Global Step: 56190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:14:40,520-Speed 10534.80 samples/sec Loss 7.9944 LearningRate 0.0567 Epoch: 9 Global Step: 56200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:14:41,437-Speed 11180.67 samples/sec Loss 7.9472 LearningRate 0.0567 Epoch: 9 Global Step: 56210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:14:42,372-Speed 10957.18 samples/sec Loss 7.8338 LearningRate 0.0567 Epoch: 9 Global Step: 56220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:14:43,313-Speed 10887.47 samples/sec Loss 7.9524 LearningRate 0.0567 Epoch: 9 Global Step: 56230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:14:44,256-Speed 10859.80 samples/sec Loss 8.0671 LearningRate 0.0567 Epoch: 9 Global Step: 56240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:14:45,149-Speed 11487.83 samples/sec Loss 7.9421 LearningRate 0.0567 Epoch: 9 Global Step: 56250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:14:46,053-Speed 11335.36 samples/sec Loss 8.1287 LearningRate 0.0566 Epoch: 9 Global Step: 56260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:14:46,949-Speed 11453.10 samples/sec Loss 7.9425 LearningRate 0.0566 Epoch: 9 Global Step: 56270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:14:47,895-Speed 10822.17 samples/sec 
Loss 8.1336 LearningRate 0.0566 Epoch: 9 Global Step: 56280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:14:48,817-Speed 11116.10 samples/sec Loss 7.9068 LearningRate 0.0566 Epoch: 9 Global Step: 56290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:14:49,735-Speed 11163.42 samples/sec Loss 7.8645 LearningRate 0.0566 Epoch: 9 Global Step: 56300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:14:50,667-Speed 10997.91 samples/sec Loss 7.9074 LearningRate 0.0566 Epoch: 9 Global Step: 56310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:14:51,582-Speed 11190.83 samples/sec Loss 7.8709 LearningRate 0.0566 Epoch: 9 Global Step: 56320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:14:52,518-Speed 10956.20 samples/sec Loss 7.8884 LearningRate 0.0566 Epoch: 9 Global Step: 56330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:14:53,458-Speed 10892.09 samples/sec Loss 8.1114 LearningRate 0.0566 Epoch: 9 Global Step: 56340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:14:54,363-Speed 11321.91 samples/sec Loss 7.9866 LearningRate 0.0566 Epoch: 9 Global Step: 56350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:14:55,253-Speed 11520.51 samples/sec Loss 7.9089 LearningRate 0.0566 Epoch: 9 Global Step: 56360 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 01:14:56,142-Speed 11526.89 samples/sec Loss 7.8743 LearningRate 0.0566 Epoch: 9 Global Step: 56370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:14:57,070-Speed 11045.18 samples/sec Loss 7.9162 LearningRate 0.0566 Epoch: 9 Global Step: 56380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:14:57,958-Speed 11542.10 samples/sec Loss 8.0328 LearningRate 0.0566 Epoch: 9 Global Step: 56390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:14:58,881-Speed 11098.39 samples/sec Loss 7.8225 LearningRate 0.0566 Epoch: 9 
Global Step: 56400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:14:59,844-Speed 10637.38 samples/sec Loss 7.8974 LearningRate 0.0565 Epoch: 9 Global Step: 56410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:15:00,778-Speed 10978.81 samples/sec Loss 7.9565 LearningRate 0.0565 Epoch: 9 Global Step: 56420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:15:01,727-Speed 10801.92 samples/sec Loss 7.9249 LearningRate 0.0565 Epoch: 9 Global Step: 56430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:15:02,624-Speed 11419.33 samples/sec Loss 7.9498 LearningRate 0.0565 Epoch: 9 Global Step: 56440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:15:03,555-Speed 11006.81 samples/sec Loss 8.0497 LearningRate 0.0565 Epoch: 9 Global Step: 56450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:15:04,484-Speed 11032.54 samples/sec Loss 7.8802 LearningRate 0.0565 Epoch: 9 Global Step: 56460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:15:05,406-Speed 11118.00 samples/sec Loss 8.0184 LearningRate 0.0565 Epoch: 9 Global Step: 56470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:15:06,302-Speed 11437.25 samples/sec Loss 8.0660 LearningRate 0.0565 Epoch: 9 Global Step: 56480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:15:07,230-Speed 11033.25 samples/sec Loss 7.9227 LearningRate 0.0565 Epoch: 9 Global Step: 56490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:15:08,145-Speed 11212.01 samples/sec Loss 8.0292 LearningRate 0.0565 Epoch: 9 Global Step: 56500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:15:09,094-Speed 10788.70 samples/sec Loss 7.7446 LearningRate 0.0565 Epoch: 9 Global Step: 56510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:15:10,002-Speed 11293.60 samples/sec Loss 7.7874 LearningRate 0.0565 Epoch: 9 Global Step: 56520 Fp16 Grad Scale: 131072 Required: 
6 hours Training: 2022-04-11 01:15:10,954-Speed 10763.67 samples/sec Loss 7.8430 LearningRate 0.0565 Epoch: 9 Global Step: 56530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:15:11,893-Speed 10902.49 samples/sec Loss 8.0996 LearningRate 0.0565 Epoch: 9 Global Step: 56540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:15:12,823-Speed 11021.31 samples/sec Loss 7.8773 LearningRate 0.0565 Epoch: 9 Global Step: 56550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:15:13,751-Speed 11044.16 samples/sec Loss 7.9218 LearningRate 0.0564 Epoch: 9 Global Step: 56560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:15:14,686-Speed 10955.28 samples/sec Loss 7.9114 LearningRate 0.0564 Epoch: 9 Global Step: 56570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:15:15,614-Speed 11043.09 samples/sec Loss 8.0677 LearningRate 0.0564 Epoch: 9 Global Step: 56580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:15:16,512-Speed 11413.72 samples/sec Loss 7.9172 LearningRate 0.0564 Epoch: 9 Global Step: 56590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:15:17,443-Speed 10998.72 samples/sec Loss 8.0901 LearningRate 0.0564 Epoch: 9 Global Step: 56600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:15:18,369-Speed 11065.36 samples/sec Loss 8.0372 LearningRate 0.0564 Epoch: 9 Global Step: 56610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:15:19,301-Speed 11000.78 samples/sec Loss 7.9959 LearningRate 0.0564 Epoch: 9 Global Step: 56620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:15:20,268-Speed 10601.97 samples/sec Loss 7.8988 LearningRate 0.0564 Epoch: 9 Global Step: 56630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:15:21,193-Speed 11072.18 samples/sec Loss 7.9502 LearningRate 0.0564 Epoch: 9 Global Step: 56640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:15:22,114-Speed 
11131.58 samples/sec Loss 7.9010 LearningRate 0.0564 Epoch: 9 Global Step: 56650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:15:23,014-Speed 11385.24 samples/sec Loss 8.0032 LearningRate 0.0564 Epoch: 9 Global Step: 56660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:15:23,945-Speed 11006.43 samples/sec Loss 7.9258 LearningRate 0.0564 Epoch: 9 Global Step: 56670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:15:24,817-Speed 11752.63 samples/sec Loss 7.8515 LearningRate 0.0564 Epoch: 9 Global Step: 56680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:15:25,746-Speed 11028.13 samples/sec Loss 7.9248 LearningRate 0.0564 Epoch: 9 Global Step: 56690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:15:26,658-Speed 11237.99 samples/sec Loss 7.7993 LearningRate 0.0564 Epoch: 9 Global Step: 56700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:15:27,546-Speed 11541.38 samples/sec Loss 7.8817 LearningRate 0.0563 Epoch: 9 Global Step: 56710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:15:28,533-Speed 10384.66 samples/sec Loss 7.9987 LearningRate 0.0563 Epoch: 9 Global Step: 56720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:15:29,432-Speed 11393.60 samples/sec Loss 7.9574 LearningRate 0.0563 Epoch: 9 Global Step: 56730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:15:30,343-Speed 11253.59 samples/sec Loss 7.7722 LearningRate 0.0563 Epoch: 9 Global Step: 56740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:15:31,261-Speed 11159.54 samples/sec Loss 7.8735 LearningRate 0.0563 Epoch: 9 Global Step: 56750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:15:32,204-Speed 10872.21 samples/sec Loss 8.0699 LearningRate 0.0563 Epoch: 9 Global Step: 56760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:15:33,116-Speed 11237.42 samples/sec Loss 7.9476 LearningRate 
0.0563 Epoch: 9 Global Step: 56770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:15:34,005-Speed 11525.25 samples/sec Loss 8.1142 LearningRate 0.0563 Epoch: 9 Global Step: 56780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:15:34,901-Speed 11446.99 samples/sec Loss 7.7323 LearningRate 0.0563 Epoch: 9 Global Step: 56790 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 01:15:35,803-Speed 11362.63 samples/sec Loss 8.0123 LearningRate 0.0563 Epoch: 9 Global Step: 56800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:15:36,761-Speed 10695.59 samples/sec Loss 7.9266 LearningRate 0.0563 Epoch: 9 Global Step: 56810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:15:37,685-Speed 11078.49 samples/sec Loss 8.0726 LearningRate 0.0563 Epoch: 9 Global Step: 56820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:15:38,602-Speed 11178.67 samples/sec Loss 7.9024 LearningRate 0.0563 Epoch: 9 Global Step: 56830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:15:39,542-Speed 10901.13 samples/sec Loss 7.9017 LearningRate 0.0563 Epoch: 9 Global Step: 56840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:15:40,531-Speed 10361.94 samples/sec Loss 7.7524 LearningRate 0.0563 Epoch: 9 Global Step: 56850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:15:41,387-Speed 11967.59 samples/sec Loss 7.8999 LearningRate 0.0563 Epoch: 9 Global Step: 56860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:15:51,794-Speed 984.06 samples/sec Loss 7.2816 LearningRate 0.0562 Epoch: 10 Global Step: 56870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:15:52,733-Speed 10918.92 samples/sec Loss 7.2104 LearningRate 0.0562 Epoch: 10 Global Step: 56880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:15:53,792-Speed 9681.98 samples/sec Loss 7.2142 LearningRate 0.0562 Epoch: 10 Global Step: 56890 Fp16 
Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:15:54,772-Speed 10454.52 samples/sec Loss 7.3157 LearningRate 0.0562 Epoch: 10 Global Step: 56900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:15:55,753-Speed 10443.71 samples/sec Loss 7.2567 LearningRate 0.0562 Epoch: 10 Global Step: 56910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:15:56,806-Speed 9735.43 samples/sec Loss 7.2982 LearningRate 0.0562 Epoch: 10 Global Step: 56920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:15:57,690-Speed 11586.20 samples/sec Loss 7.1514 LearningRate 0.0562 Epoch: 10 Global Step: 56930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:15:58,582-Speed 11492.30 samples/sec Loss 7.3309 LearningRate 0.0562 Epoch: 10 Global Step: 56940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:15:59,504-Speed 11114.57 samples/sec Loss 7.3588 LearningRate 0.0562 Epoch: 10 Global Step: 56950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:16:00,442-Speed 10917.21 samples/sec Loss 7.0988 LearningRate 0.0562 Epoch: 10 Global Step: 56960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:16:01,414-Speed 10685.45 samples/sec Loss 7.1655 LearningRate 0.0562 Epoch: 10 Global Step: 56970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:16:02,328-Speed 11208.14 samples/sec Loss 7.1327 LearningRate 0.0562 Epoch: 10 Global Step: 56980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:16:03,289-Speed 10662.08 samples/sec Loss 7.2655 LearningRate 0.0562 Epoch: 10 Global Step: 56990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:16:04,224-Speed 10955.72 samples/sec Loss 7.1758 LearningRate 0.0562 Epoch: 10 Global Step: 57000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:16:05,136-Speed 11237.12 samples/sec Loss 7.1926 LearningRate 0.0562 Epoch: 10 Global Step: 57010 Fp16 Grad Scale: 131072 Required: 6 
hours Training: 2022-04-11 01:16:06,063-Speed 11057.63 samples/sec Loss 7.3737 LearningRate 0.0561 Epoch: 10 Global Step: 57020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:16:06,993-Speed 11011.60 samples/sec Loss 7.3146 LearningRate 0.0561 Epoch: 10 Global Step: 57030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:16:07,920-Speed 11062.96 samples/sec Loss 7.2904 LearningRate 0.0561 Epoch: 10 Global Step: 57040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:16:08,856-Speed 10952.39 samples/sec Loss 7.1543 LearningRate 0.0561 Epoch: 10 Global Step: 57050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:16:09,775-Speed 11145.00 samples/sec Loss 7.2540 LearningRate 0.0561 Epoch: 10 Global Step: 57060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:16:10,701-Speed 11066.05 samples/sec Loss 7.2820 LearningRate 0.0561 Epoch: 10 Global Step: 57070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:16:11,592-Speed 11510.56 samples/sec Loss 7.2701 LearningRate 0.0561 Epoch: 10 Global Step: 57080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:16:12,528-Speed 10942.16 samples/sec Loss 7.4164 LearningRate 0.0561 Epoch: 10 Global Step: 57090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:16:13,449-Speed 11123.25 samples/sec Loss 7.3904 LearningRate 0.0561 Epoch: 10 Global Step: 57100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:16:14,401-Speed 10766.43 samples/sec Loss 7.3926 LearningRate 0.0561 Epoch: 10 Global Step: 57110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:16:15,370-Speed 10581.56 samples/sec Loss 7.4153 LearningRate 0.0561 Epoch: 10 Global Step: 57120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:16:16,275-Speed 11322.90 samples/sec Loss 7.2586 LearningRate 0.0561 Epoch: 10 Global Step: 57130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 
01:16:17,213-Speed 10918.72 samples/sec Loss 7.4059 LearningRate 0.0561 Epoch: 10 Global Step: 57140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:16:18,169-Speed 10719.64 samples/sec Loss 7.5337 LearningRate 0.0561 Epoch: 10 Global Step: 57150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:16:19,170-Speed 10231.37 samples/sec Loss 7.4750 LearningRate 0.0561 Epoch: 10 Global Step: 57160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:16:20,082-Speed 11246.06 samples/sec Loss 7.3379 LearningRate 0.0560 Epoch: 10 Global Step: 57170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:16:21,001-Speed 11147.46 samples/sec Loss 7.4648 LearningRate 0.0560 Epoch: 10 Global Step: 57180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:16:21,933-Speed 10990.50 samples/sec Loss 7.3694 LearningRate 0.0560 Epoch: 10 Global Step: 57190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:16:22,838-Speed 11323.88 samples/sec Loss 7.3566 LearningRate 0.0560 Epoch: 10 Global Step: 57200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:16:23,736-Speed 11415.04 samples/sec Loss 7.4302 LearningRate 0.0560 Epoch: 10 Global Step: 57210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:16:24,680-Speed 10858.53 samples/sec Loss 7.4902 LearningRate 0.0560 Epoch: 10 Global Step: 57220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:16:25,654-Speed 10519.88 samples/sec Loss 7.4958 LearningRate 0.0560 Epoch: 10 Global Step: 57230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:16:26,603-Speed 10797.38 samples/sec Loss 7.5142 LearningRate 0.0560 Epoch: 10 Global Step: 57240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:16:27,544-Speed 10889.32 samples/sec Loss 7.5088 LearningRate 0.0560 Epoch: 10 Global Step: 57250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:16:28,458-Speed 11202.64 
samples/sec Loss 7.4895 LearningRate 0.0560 Epoch: 10 Global Step: 57260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:16:29,372-Speed 11221.76 samples/sec Loss 7.4616 LearningRate 0.0560 Epoch: 10 Global Step: 57270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:16:30,294-Speed 11107.26 samples/sec Loss 7.4554 LearningRate 0.0560 Epoch: 10 Global Step: 57280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:16:31,204-Speed 11261.64 samples/sec Loss 7.4897 LearningRate 0.0560 Epoch: 10 Global Step: 57290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:16:32,142-Speed 10932.45 samples/sec Loss 7.4742 LearningRate 0.0560 Epoch: 10 Global Step: 57300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:16:33,078-Speed 10949.20 samples/sec Loss 7.3801 LearningRate 0.0560 Epoch: 10 Global Step: 57310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:16:33,976-Speed 11404.38 samples/sec Loss 7.4087 LearningRate 0.0559 Epoch: 10 Global Step: 57320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:16:34,909-Speed 10978.95 samples/sec Loss 7.3356 LearningRate 0.0559 Epoch: 10 Global Step: 57330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:16:35,850-Speed 10894.88 samples/sec Loss 7.4783 LearningRate 0.0559 Epoch: 10 Global Step: 57340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:16:36,943-Speed 9368.08 samples/sec Loss 7.4877 LearningRate 0.0559 Epoch: 10 Global Step: 57350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:16:37,889-Speed 10833.24 samples/sec Loss 7.3095 LearningRate 0.0559 Epoch: 10 Global Step: 57360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:16:38,895-Speed 10194.25 samples/sec Loss 7.6388 LearningRate 0.0559 Epoch: 10 Global Step: 57370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:16:39,918-Speed 10013.18 samples/sec Loss 7.5759 LearningRate 
0.0559 Epoch: 10 Global Step: 57380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:16:40,860-Speed 10873.85 samples/sec Loss 7.3911 LearningRate 0.0559 Epoch: 10 Global Step: 57390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:16:41,821-Speed 10664.97 samples/sec Loss 7.5805 LearningRate 0.0559 Epoch: 10 Global Step: 57400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:16:42,748-Speed 11057.76 samples/sec Loss 7.3807 LearningRate 0.0559 Epoch: 10 Global Step: 57410 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 01:16:43,686-Speed 10926.64 samples/sec Loss 7.6460 LearningRate 0.0559 Epoch: 10 Global Step: 57420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:16:44,621-Speed 10951.61 samples/sec Loss 7.6703 LearningRate 0.0559 Epoch: 10 Global Step: 57430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:16:45,522-Speed 11371.86 samples/sec Loss 7.4588 LearningRate 0.0559 Epoch: 10 Global Step: 57440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:16:46,464-Speed 10876.46 samples/sec Loss 7.5915 LearningRate 0.0559 Epoch: 10 Global Step: 57450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:16:47,421-Speed 10713.67 samples/sec Loss 7.6517 LearningRate 0.0559 Epoch: 10 Global Step: 57460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:16:48,360-Speed 10907.34 samples/sec Loss 7.6033 LearningRate 0.0558 Epoch: 10 Global Step: 57470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:16:49,310-Speed 10793.28 samples/sec Loss 7.6108 LearningRate 0.0558 Epoch: 10 Global Step: 57480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:16:50,233-Speed 11091.12 samples/sec Loss 7.5729 LearningRate 0.0558 Epoch: 10 Global Step: 57490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:16:51,139-Speed 11317.98 samples/sec Loss 7.5384 LearningRate 0.0558 Epoch: 10 Global Step: 
57500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:16:52,083-Speed 10860.36 samples/sec Loss 7.5253 LearningRate 0.0558 Epoch: 10 Global Step: 57510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:16:53,001-Speed 11170.72 samples/sec Loss 7.6605 LearningRate 0.0558 Epoch: 10 Global Step: 57520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:16:53,963-Speed 10651.89 samples/sec Loss 7.4980 LearningRate 0.0558 Epoch: 10 Global Step: 57530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:16:54,891-Speed 11042.76 samples/sec Loss 7.6123 LearningRate 0.0558 Epoch: 10 Global Step: 57540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:16:55,798-Speed 11298.11 samples/sec Loss 7.5376 LearningRate 0.0558 Epoch: 10 Global Step: 57550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:16:56,716-Speed 11152.29 samples/sec Loss 7.5683 LearningRate 0.0558 Epoch: 10 Global Step: 57560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:16:57,637-Speed 11123.95 samples/sec Loss 7.6295 LearningRate 0.0558 Epoch: 10 Global Step: 57570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:16:58,572-Speed 10962.58 samples/sec Loss 7.5505 LearningRate 0.0558 Epoch: 10 Global Step: 57580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:16:59,541-Speed 10576.63 samples/sec Loss 7.5734 LearningRate 0.0558 Epoch: 10 Global Step: 57590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:17:00,499-Speed 10698.61 samples/sec Loss 7.4994 LearningRate 0.0558 Epoch: 10 Global Step: 57600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:17:01,438-Speed 10911.44 samples/sec Loss 7.5756 LearningRate 0.0558 Epoch: 10 Global Step: 57610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:17:02,396-Speed 10702.11 samples/sec Loss 7.4532 LearningRate 0.0557 Epoch: 10 Global Step: 57620 Fp16 Grad Scale: 65536 Required: 6 
hours Training: 2022-04-11 01:17:03,350-Speed 10732.44 samples/sec Loss 7.6898 LearningRate 0.0557 Epoch: 10 Global Step: 57630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:17:04,303-Speed 10756.80 samples/sec Loss 7.6136 LearningRate 0.0557 Epoch: 10 Global Step: 57640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:17:05,232-Speed 11028.12 samples/sec Loss 7.4273 LearningRate 0.0557 Epoch: 10 Global Step: 57650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:17:06,166-Speed 10968.73 samples/sec Loss 7.5504 LearningRate 0.0557 Epoch: 10 Global Step: 57660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:17:07,109-Speed 10872.95 samples/sec Loss 7.6302 LearningRate 0.0557 Epoch: 10 Global Step: 57670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:17:08,050-Speed 10891.22 samples/sec Loss 7.6736 LearningRate 0.0557 Epoch: 10 Global Step: 57680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:17:09,004-Speed 10736.63 samples/sec Loss 7.6222 LearningRate 0.0557 Epoch: 10 Global Step: 57690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:17:09,953-Speed 10795.58 samples/sec Loss 7.5416 LearningRate 0.0557 Epoch: 10 Global Step: 57700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:17:10,938-Speed 10403.98 samples/sec Loss 7.8293 LearningRate 0.0557 Epoch: 10 Global Step: 57710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:17:11,866-Speed 11041.24 samples/sec Loss 7.7347 LearningRate 0.0557 Epoch: 10 Global Step: 57720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:17:12,783-Speed 11176.63 samples/sec Loss 7.5773 LearningRate 0.0557 Epoch: 10 Global Step: 57730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:17:13,729-Speed 10834.61 samples/sec Loss 7.6003 LearningRate 0.0557 Epoch: 10 Global Step: 57740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 
01:17:14,676-Speed 10823.87 samples/sec Loss 7.5602 LearningRate 0.0557 Epoch: 10 Global Step: 57750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:17:15,615-Speed 10905.79 samples/sec Loss 7.8679 LearningRate 0.0557 Epoch: 10 Global Step: 57760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:17:16,571-Speed 10720.74 samples/sec Loss 7.7101 LearningRate 0.0557 Epoch: 10 Global Step: 57770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:17:17,496-Speed 11079.98 samples/sec Loss 7.7205 LearningRate 0.0556 Epoch: 10 Global Step: 57780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:17:18,404-Speed 11280.48 samples/sec Loss 7.7467 LearningRate 0.0556 Epoch: 10 Global Step: 57790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:17:19,351-Speed 10817.27 samples/sec Loss 7.6353 LearningRate 0.0556 Epoch: 10 Global Step: 57800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:17:20,315-Speed 10637.01 samples/sec Loss 7.6915 LearningRate 0.0556 Epoch: 10 Global Step: 57810 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 01:17:21,243-Speed 11041.19 samples/sec Loss 7.6440 LearningRate 0.0556 Epoch: 10 Global Step: 57820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:17:22,156-Speed 11226.71 samples/sec Loss 7.7665 LearningRate 0.0556 Epoch: 10 Global Step: 57830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:17:23,125-Speed 10570.45 samples/sec Loss 7.7366 LearningRate 0.0556 Epoch: 10 Global Step: 57840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:17:24,113-Speed 10366.34 samples/sec Loss 7.5783 LearningRate 0.0556 Epoch: 10 Global Step: 57850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:17:25,050-Speed 10932.46 samples/sec Loss 7.6096 LearningRate 0.0556 Epoch: 10 Global Step: 57860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:17:25,988-Speed 10932.47 
samples/sec Loss 7.6948 LearningRate 0.0556 Epoch: 10 Global Step: 57870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:17:26,950-Speed 10649.56 samples/sec Loss 7.6810 LearningRate 0.0556 Epoch: 10 Global Step: 57880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:17:27,882-Speed 10994.36 samples/sec Loss 7.6930 LearningRate 0.0556 Epoch: 10 Global Step: 57890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:17:28,848-Speed 10603.70 samples/sec Loss 7.7137 LearningRate 0.0556 Epoch: 10 Global Step: 57900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:17:29,821-Speed 10536.29 samples/sec Loss 7.5657 LearningRate 0.0556 Epoch: 10 Global Step: 57910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:17:30,760-Speed 10908.99 samples/sec Loss 7.8078 LearningRate 0.0556 Epoch: 10 Global Step: 57920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:17:31,696-Speed 10945.54 samples/sec Loss 7.7459 LearningRate 0.0555 Epoch: 10 Global Step: 57930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:17:32,646-Speed 10795.04 samples/sec Loss 7.6582 LearningRate 0.0555 Epoch: 10 Global Step: 57940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:17:33,616-Speed 10560.95 samples/sec Loss 7.5504 LearningRate 0.0555 Epoch: 10 Global Step: 57950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:17:34,533-Speed 11167.42 samples/sec Loss 7.7426 LearningRate 0.0555 Epoch: 10 Global Step: 57960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:17:35,466-Speed 10995.48 samples/sec Loss 7.5902 LearningRate 0.0555 Epoch: 10 Global Step: 57970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:17:36,380-Speed 11209.33 samples/sec Loss 7.6998 LearningRate 0.0555 Epoch: 10 Global Step: 57980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:17:37,353-Speed 10524.50 samples/sec Loss 7.5644 LearningRate 
0.0555 Epoch: 10 Global Step: 57990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:17:38,263-Speed 11263.49 samples/sec Loss 7.6443 LearningRate 0.0555 Epoch: 10 Global Step: 58000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:17:59,703-[lfw][58000]XNorm: 12.777583 Training: 2022-04-11 01:17:59,703-[lfw][58000]Accuracy-Flip: 0.99317+-0.00398 Training: 2022-04-11 01:17:59,703-[lfw][58000]Accuracy-Highest: 0.99550 Training: 2022-04-11 01:18:24,236-[cfp_fp][58000]XNorm: 10.760978 Training: 2022-04-11 01:18:24,236-[cfp_fp][58000]Accuracy-Flip: 0.93671+-0.01328 Training: 2022-04-11 01:18:24,237-[cfp_fp][58000]Accuracy-Highest: 0.93786 Training: 2022-04-11 01:18:45,330-[agedb_30][58000]XNorm: 12.556809 Training: 2022-04-11 01:18:45,330-[agedb_30][58000]Accuracy-Flip: 0.96067+-0.00847 Training: 2022-04-11 01:18:45,331-[agedb_30][58000]Accuracy-Highest: 0.96067 Training: 2022-04-11 01:18:46,246-Speed 150.63 samples/sec Loss 7.6919 LearningRate 0.0555 Epoch: 10 Global Step: 58010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:18:47,166-Speed 11134.97 samples/sec Loss 7.6577 LearningRate 0.0555 Epoch: 10 Global Step: 58020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:18:48,116-Speed 10787.03 samples/sec Loss 7.6749 LearningRate 0.0555 Epoch: 10 Global Step: 58030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:18:49,022-Speed 11305.98 samples/sec Loss 7.6235 LearningRate 0.0555 Epoch: 10 Global Step: 58040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:18:49,916-Speed 11456.85 samples/sec Loss 7.7044 LearningRate 0.0555 Epoch: 10 Global Step: 58050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:18:50,818-Speed 11364.76 samples/sec Loss 7.5179 LearningRate 0.0555 Epoch: 10 Global Step: 58060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:18:51,779-Speed 10664.32 samples/sec Loss 7.5344 LearningRate 0.0555 Epoch: 10 Global 
Step: 58070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:18:52,714-Speed 10957.34 samples/sec Loss 7.7241 LearningRate 0.0554 Epoch: 10 Global Step: 58080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:18:53,628-Speed 11213.48 samples/sec Loss 7.6858 LearningRate 0.0554 Epoch: 10 Global Step: 58090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:18:54,558-Speed 11016.78 samples/sec Loss 7.5513 LearningRate 0.0554 Epoch: 10 Global Step: 58100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:18:55,486-Speed 11042.23 samples/sec Loss 7.7409 LearningRate 0.0554 Epoch: 10 Global Step: 58110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:18:56,409-Speed 11103.51 samples/sec Loss 7.6188 LearningRate 0.0554 Epoch: 10 Global Step: 58120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:18:57,309-Speed 11384.84 samples/sec Loss 7.6270 LearningRate 0.0554 Epoch: 10 Global Step: 58130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:18:58,282-Speed 10530.53 samples/sec Loss 7.7865 LearningRate 0.0554 Epoch: 10 Global Step: 58140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:18:59,231-Speed 10801.29 samples/sec Loss 7.6359 LearningRate 0.0554 Epoch: 10 Global Step: 58150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:19:00,158-Speed 11053.02 samples/sec Loss 7.4840 LearningRate 0.0554 Epoch: 10 Global Step: 58160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:19:01,089-Speed 10998.56 samples/sec Loss 7.6376 LearningRate 0.0554 Epoch: 10 Global Step: 58170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:19:02,060-Speed 10549.90 samples/sec Loss 7.7929 LearningRate 0.0554 Epoch: 10 Global Step: 58180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:19:02,980-Speed 11140.26 samples/sec Loss 7.5875 LearningRate 0.0554 Epoch: 10 Global Step: 58190 Fp16 Grad Scale: 131072 
Required: 6 hours Training: 2022-04-11 01:19:03,896-Speed 11189.21 samples/sec Loss 7.6553 LearningRate 0.0554 Epoch: 10 Global Step: 58200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:19:04,841-Speed 10845.65 samples/sec Loss 7.8582 LearningRate 0.0554 Epoch: 10 Global Step: 58210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:19:05,784-Speed 10862.63 samples/sec Loss 7.7892 LearningRate 0.0554 Epoch: 10 Global Step: 58220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:19:06,751-Speed 10603.97 samples/sec Loss 7.7011 LearningRate 0.0554 Epoch: 10 Global Step: 58230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:19:07,680-Speed 11026.67 samples/sec Loss 7.6382 LearningRate 0.0553 Epoch: 10 Global Step: 58240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:19:08,609-Speed 11035.26 samples/sec Loss 7.8132 LearningRate 0.0553 Epoch: 10 Global Step: 58250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:19:09,586-Speed 10478.81 samples/sec Loss 7.8324 LearningRate 0.0553 Epoch: 10 Global Step: 58260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:19:10,508-Speed 11122.40 samples/sec Loss 7.8558 LearningRate 0.0553 Epoch: 10 Global Step: 58270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:19:11,479-Speed 10697.33 samples/sec Loss 7.6136 LearningRate 0.0553 Epoch: 10 Global Step: 58280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:19:12,430-Speed 10771.82 samples/sec Loss 7.7419 LearningRate 0.0553 Epoch: 10 Global Step: 58290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:19:13,397-Speed 10607.62 samples/sec Loss 7.7306 LearningRate 0.0553 Epoch: 10 Global Step: 58300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:19:14,346-Speed 10800.98 samples/sec Loss 7.7599 LearningRate 0.0553 Epoch: 10 Global Step: 58310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 
2022-04-11 01:19:15,271-Speed 11072.07 samples/sec Loss 7.7536 LearningRate 0.0553 Epoch: 10 Global Step: 58320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:19:16,232-Speed 10665.52 samples/sec Loss 7.7943 LearningRate 0.0553 Epoch: 10 Global Step: 58330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:19:17,165-Speed 10985.13 samples/sec Loss 7.8031 LearningRate 0.0553 Epoch: 10 Global Step: 58340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:19:18,073-Speed 11280.27 samples/sec Loss 7.8951 LearningRate 0.0553 Epoch: 10 Global Step: 58350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:19:18,999-Speed 11065.66 samples/sec Loss 7.8307 LearningRate 0.0553 Epoch: 10 Global Step: 58360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:19:19,977-Speed 10476.59 samples/sec Loss 7.7851 LearningRate 0.0553 Epoch: 10 Global Step: 58370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:19:20,901-Speed 11093.86 samples/sec Loss 7.8825 LearningRate 0.0553 Epoch: 10 Global Step: 58380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:19:21,839-Speed 10918.24 samples/sec Loss 7.6648 LearningRate 0.0552 Epoch: 10 Global Step: 58390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:19:22,781-Speed 10886.56 samples/sec Loss 7.8098 LearningRate 0.0552 Epoch: 10 Global Step: 58400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:19:23,731-Speed 10780.95 samples/sec Loss 7.6875 LearningRate 0.0552 Epoch: 10 Global Step: 58410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:19:24,691-Speed 10669.42 samples/sec Loss 7.6986 LearningRate 0.0552 Epoch: 10 Global Step: 58420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:19:25,633-Speed 10883.33 samples/sec Loss 7.7246 LearningRate 0.0552 Epoch: 10 Global Step: 58430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:19:26,592-Speed 10684.70 
samples/sec Loss 7.5987 LearningRate 0.0552 Epoch: 10 Global Step: 58440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:19:27,566-Speed 10523.74 samples/sec Loss 7.6965 LearningRate 0.0552 Epoch: 10 Global Step: 58450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:19:28,516-Speed 10781.85 samples/sec Loss 7.7391 LearningRate 0.0552 Epoch: 10 Global Step: 58460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:19:29,477-Speed 10662.55 samples/sec Loss 7.6070 LearningRate 0.0552 Epoch: 10 Global Step: 58470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:19:30,407-Speed 11015.43 samples/sec Loss 7.7629 LearningRate 0.0552 Epoch: 10 Global Step: 58480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:19:31,311-Speed 11334.90 samples/sec Loss 7.7024 LearningRate 0.0552 Epoch: 10 Global Step: 58490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:19:32,246-Speed 10964.58 samples/sec Loss 7.7549 LearningRate 0.0552 Epoch: 10 Global Step: 58500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:19:33,164-Speed 11162.73 samples/sec Loss 7.8200 LearningRate 0.0552 Epoch: 10 Global Step: 58510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:19:34,106-Speed 10875.62 samples/sec Loss 7.6896 LearningRate 0.0552 Epoch: 10 Global Step: 58520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:19:35,038-Speed 11002.65 samples/sec Loss 7.7073 LearningRate 0.0552 Epoch: 10 Global Step: 58530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:19:35,955-Speed 11176.60 samples/sec Loss 7.7248 LearningRate 0.0551 Epoch: 10 Global Step: 58540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:19:36,891-Speed 10943.83 samples/sec Loss 7.6442 LearningRate 0.0551 Epoch: 10 Global Step: 58550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:19:37,827-Speed 10946.02 samples/sec Loss 7.7856 
LearningRate 0.0551 Epoch: 10 Global Step: 58560 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 01:19:38,774-Speed 10822.73 samples/sec Loss 7.6674 LearningRate 0.0551 Epoch: 10 Global Step: 58570 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 01:19:39,730-Speed 10716.90 samples/sec Loss 7.7078 LearningRate 0.0551 Epoch: 10 Global Step: 58580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:19:40,730-Speed 10249.36 samples/sec Loss 7.8017 LearningRate 0.0551 Epoch: 10 Global Step: 58590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:19:41,665-Speed 10960.88 samples/sec Loss 7.6517 LearningRate 0.0551 Epoch: 10 Global Step: 58600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:19:42,600-Speed 10960.35 samples/sec Loss 7.8946 LearningRate 0.0551 Epoch: 10 Global Step: 58610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:19:43,571-Speed 10547.33 samples/sec Loss 7.7458 LearningRate 0.0551 Epoch: 10 Global Step: 58620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:19:44,537-Speed 10615.38 samples/sec Loss 7.5572 LearningRate 0.0551 Epoch: 10 Global Step: 58630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:19:45,486-Speed 10794.00 samples/sec Loss 7.6932 LearningRate 0.0551 Epoch: 10 Global Step: 58640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:19:46,430-Speed 10848.93 samples/sec Loss 7.6962 LearningRate 0.0551 Epoch: 10 Global Step: 58650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:19:47,362-Speed 10996.95 samples/sec Loss 7.7058 LearningRate 0.0551 Epoch: 10 Global Step: 58660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:19:48,296-Speed 10966.06 samples/sec Loss 7.7171 LearningRate 0.0551 Epoch: 10 Global Step: 58670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:19:49,230-Speed 10973.40 samples/sec Loss 7.7200 LearningRate 0.0551 Epoch: 10 
Global Step: 58680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:19:50,161-Speed 11009.87 samples/sec Loss 7.6936 LearningRate 0.0550 Epoch: 10 Global Step: 58690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:19:51,089-Speed 11043.28 samples/sec Loss 7.8370 LearningRate 0.0550 Epoch: 10 Global Step: 58700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:19:52,057-Speed 10589.68 samples/sec Loss 7.7376 LearningRate 0.0550 Epoch: 10 Global Step: 58710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:19:53,032-Speed 10507.21 samples/sec Loss 7.8436 LearningRate 0.0550 Epoch: 10 Global Step: 58720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:19:53,970-Speed 10917.31 samples/sec Loss 7.7176 LearningRate 0.0550 Epoch: 10 Global Step: 58730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:19:54,900-Speed 11023.93 samples/sec Loss 7.7385 LearningRate 0.0550 Epoch: 10 Global Step: 58740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:19:55,851-Speed 10773.99 samples/sec Loss 7.8822 LearningRate 0.0550 Epoch: 10 Global Step: 58750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:19:56,766-Speed 11203.92 samples/sec Loss 7.7396 LearningRate 0.0550 Epoch: 10 Global Step: 58760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:19:57,649-Speed 11603.28 samples/sec Loss 7.8154 LearningRate 0.0550 Epoch: 10 Global Step: 58770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:19:58,587-Speed 10927.78 samples/sec Loss 7.7162 LearningRate 0.0550 Epoch: 10 Global Step: 58780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:19:59,527-Speed 10903.60 samples/sec Loss 7.6363 LearningRate 0.0550 Epoch: 10 Global Step: 58790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:20:00,459-Speed 10992.74 samples/sec Loss 7.8163 LearningRate 0.0550 Epoch: 10 Global Step: 58800 Fp16 Grad Scale: 
65536 Required: 6 hours Training: 2022-04-11 01:20:01,386-Speed 11051.09 samples/sec Loss 7.7597 LearningRate 0.0550 Epoch: 10 Global Step: 58810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:20:02,316-Speed 11015.82 samples/sec Loss 7.7288 LearningRate 0.0550 Epoch: 10 Global Step: 58820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:20:03,255-Speed 10912.21 samples/sec Loss 7.8597 LearningRate 0.0550 Epoch: 10 Global Step: 58830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:20:04,178-Speed 11103.76 samples/sec Loss 7.8457 LearningRate 0.0550 Epoch: 10 Global Step: 58840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:20:05,118-Speed 10895.43 samples/sec Loss 7.8141 LearningRate 0.0549 Epoch: 10 Global Step: 58850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:20:06,046-Speed 11046.99 samples/sec Loss 7.8311 LearningRate 0.0549 Epoch: 10 Global Step: 58860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:20:06,969-Speed 11103.43 samples/sec Loss 7.6094 LearningRate 0.0549 Epoch: 10 Global Step: 58870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:20:07,929-Speed 10676.87 samples/sec Loss 7.6918 LearningRate 0.0549 Epoch: 10 Global Step: 58880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:20:08,869-Speed 10895.31 samples/sec Loss 7.9296 LearningRate 0.0549 Epoch: 10 Global Step: 58890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:20:09,768-Speed 11396.06 samples/sec Loss 7.7752 LearningRate 0.0549 Epoch: 10 Global Step: 58900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:20:10,691-Speed 11106.40 samples/sec Loss 7.8859 LearningRate 0.0549 Epoch: 10 Global Step: 58910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:20:11,603-Speed 11229.09 samples/sec Loss 7.6392 LearningRate 0.0549 Epoch: 10 Global Step: 58920 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-04-11 01:20:12,567-Speed 10628.58 samples/sec Loss 7.7730 LearningRate 0.0549 Epoch: 10 Global Step: 58930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:20:13,538-Speed 10564.41 samples/sec Loss 7.8174 LearningRate 0.0549 Epoch: 10 Global Step: 58940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:20:14,498-Speed 10668.39 samples/sec Loss 7.6776 LearningRate 0.0549 Epoch: 10 Global Step: 58950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:20:15,431-Speed 10981.77 samples/sec Loss 7.6981 LearningRate 0.0549 Epoch: 10 Global Step: 58960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:20:16,385-Speed 10745.59 samples/sec Loss 7.5807 LearningRate 0.0549 Epoch: 10 Global Step: 58970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:20:17,297-Speed 11235.88 samples/sec Loss 7.7160 LearningRate 0.0549 Epoch: 10 Global Step: 58980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:20:18,246-Speed 10792.24 samples/sec Loss 7.6910 LearningRate 0.0549 Epoch: 10 Global Step: 58990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:20:19,158-Speed 11246.90 samples/sec Loss 7.7655 LearningRate 0.0548 Epoch: 10 Global Step: 59000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:20:20,094-Speed 10942.80 samples/sec Loss 7.5857 LearningRate 0.0548 Epoch: 10 Global Step: 59010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:20:21,020-Speed 11061.97 samples/sec Loss 7.6960 LearningRate 0.0548 Epoch: 10 Global Step: 59020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:20:21,933-Speed 11222.94 samples/sec Loss 7.7054 LearningRate 0.0548 Epoch: 10 Global Step: 59030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:20:22,843-Speed 11260.25 samples/sec Loss 7.7212 LearningRate 0.0548 Epoch: 10 Global Step: 59040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 
01:20:23,776-Speed 10990.05 samples/sec Loss 7.6168 LearningRate 0.0548 Epoch: 10 Global Step: 59050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:20:24,749-Speed 10527.88 samples/sec Loss 7.7719 LearningRate 0.0548 Epoch: 10 Global Step: 59060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:20:25,686-Speed 10939.84 samples/sec Loss 7.7836 LearningRate 0.0548 Epoch: 10 Global Step: 59070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:20:26,605-Speed 11145.91 samples/sec Loss 7.8254 LearningRate 0.0548 Epoch: 10 Global Step: 59080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:20:27,589-Speed 10410.18 samples/sec Loss 7.8401 LearningRate 0.0548 Epoch: 10 Global Step: 59090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:20:28,550-Speed 10668.32 samples/sec Loss 7.7203 LearningRate 0.0548 Epoch: 10 Global Step: 59100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:20:29,479-Speed 11027.86 samples/sec Loss 7.8683 LearningRate 0.0548 Epoch: 10 Global Step: 59110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:20:30,396-Speed 11166.95 samples/sec Loss 7.8670 LearningRate 0.0548 Epoch: 10 Global Step: 59120 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 01:20:31,320-Speed 11099.24 samples/sec Loss 7.7368 LearningRate 0.0548 Epoch: 10 Global Step: 59130 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 01:20:32,263-Speed 10866.19 samples/sec Loss 7.7137 LearningRate 0.0548 Epoch: 10 Global Step: 59140 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 01:20:33,163-Speed 11383.24 samples/sec Loss 7.7689 LearningRate 0.0547 Epoch: 10 Global Step: 59150 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 01:20:34,067-Speed 11330.63 samples/sec Loss 7.9707 LearningRate 0.0547 Epoch: 10 Global Step: 59160 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 01:20:34,958-Speed 11498.73 samples/sec 
Loss 7.8110 LearningRate 0.0547 Epoch: 10 Global Step: 59170 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 01:20:35,888-Speed 11024.34 samples/sec Loss 7.5873 LearningRate 0.0547 Epoch: 10 Global Step: 59180 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 01:20:36,839-Speed 10767.69 samples/sec Loss 7.7583 LearningRate 0.0547 Epoch: 10 Global Step: 59190 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 01:20:37,843-Speed 10210.19 samples/sec Loss 7.7542 LearningRate 0.0547 Epoch: 10 Global Step: 59200 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 01:20:38,790-Speed 10825.47 samples/sec Loss 7.7560 LearningRate 0.0547 Epoch: 10 Global Step: 59210 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 01:20:39,744-Speed 10732.18 samples/sec Loss 7.8024 LearningRate 0.0547 Epoch: 10 Global Step: 59220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:20:40,671-Speed 11058.97 samples/sec Loss 7.8431 LearningRate 0.0547 Epoch: 10 Global Step: 59230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:20:41,594-Speed 11112.23 samples/sec Loss 7.7724 LearningRate 0.0547 Epoch: 10 Global Step: 59240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:20:42,524-Speed 11009.46 samples/sec Loss 7.8045 LearningRate 0.0547 Epoch: 10 Global Step: 59250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:20:43,493-Speed 10585.67 samples/sec Loss 7.7526 LearningRate 0.0547 Epoch: 10 Global Step: 59260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:20:44,413-Speed 11136.51 samples/sec Loss 7.7404 LearningRate 0.0547 Epoch: 10 Global Step: 59270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:20:45,318-Speed 11315.73 samples/sec Loss 7.8861 LearningRate 0.0547 Epoch: 10 Global Step: 59280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:20:46,209-Speed 11498.50 samples/sec Loss 7.6620 LearningRate 0.0547 Epoch: 10 
Global Step: 59290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:20:47,144-Speed 10968.37 samples/sec Loss 7.8242 LearningRate 0.0547 Epoch: 10 Global Step: 59300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:20:48,100-Speed 10718.79 samples/sec Loss 7.9354 LearningRate 0.0546 Epoch: 10 Global Step: 59310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:20:49,042-Speed 10867.41 samples/sec Loss 7.7125 LearningRate 0.0546 Epoch: 10 Global Step: 59320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:20:49,993-Speed 10777.02 samples/sec Loss 7.7601 LearningRate 0.0546 Epoch: 10 Global Step: 59330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:20:50,932-Speed 10916.35 samples/sec Loss 7.7407 LearningRate 0.0546 Epoch: 10 Global Step: 59340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:20:51,839-Speed 11295.61 samples/sec Loss 7.7446 LearningRate 0.0546 Epoch: 10 Global Step: 59350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:20:52,813-Speed 10522.70 samples/sec Loss 7.8407 LearningRate 0.0546 Epoch: 10 Global Step: 59360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:20:53,757-Speed 10853.40 samples/sec Loss 7.7212 LearningRate 0.0546 Epoch: 10 Global Step: 59370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:20:54,690-Speed 10982.80 samples/sec Loss 7.8952 LearningRate 0.0546 Epoch: 10 Global Step: 59380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:20:55,617-Speed 11055.28 samples/sec Loss 7.7711 LearningRate 0.0546 Epoch: 10 Global Step: 59390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:20:56,527-Speed 11256.86 samples/sec Loss 7.6007 LearningRate 0.0546 Epoch: 10 Global Step: 59400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:20:57,427-Speed 11396.84 samples/sec Loss 7.7246 LearningRate 0.0546 Epoch: 10 Global Step: 59410 Fp16 Grad 
Scale: 65536 Required: 6 hours Training: 2022-04-11 01:20:58,375-Speed 10799.78 samples/sec Loss 7.7571 LearningRate 0.0546 Epoch: 10 Global Step: 59420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:20:59,301-Speed 11068.32 samples/sec Loss 7.7401 LearningRate 0.0546 Epoch: 10 Global Step: 59430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:21:00,225-Speed 11092.33 samples/sec Loss 7.7102 LearningRate 0.0546 Epoch: 10 Global Step: 59440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:21:01,173-Speed 10806.13 samples/sec Loss 7.6076 LearningRate 0.0546 Epoch: 10 Global Step: 59450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:21:02,094-Speed 11125.03 samples/sec Loss 7.7473 LearningRate 0.0545 Epoch: 10 Global Step: 59460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:21:03,043-Speed 10805.84 samples/sec Loss 7.7532 LearningRate 0.0545 Epoch: 10 Global Step: 59470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:21:03,968-Speed 11075.12 samples/sec Loss 7.7780 LearningRate 0.0545 Epoch: 10 Global Step: 59480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:21:04,886-Speed 11161.60 samples/sec Loss 7.8099 LearningRate 0.0545 Epoch: 10 Global Step: 59490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:21:05,843-Speed 10703.06 samples/sec Loss 7.6453 LearningRate 0.0545 Epoch: 10 Global Step: 59500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:21:06,787-Speed 10855.21 samples/sec Loss 7.6312 LearningRate 0.0545 Epoch: 10 Global Step: 59510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:21:07,710-Speed 11110.14 samples/sec Loss 7.7831 LearningRate 0.0545 Epoch: 10 Global Step: 59520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:21:08,608-Speed 11401.82 samples/sec Loss 7.7198 LearningRate 0.0545 Epoch: 10 Global Step: 59530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 
2022-04-11 01:21:09,519-Speed 11244.75 samples/sec Loss 7.8270 LearningRate 0.0545 Epoch: 10 Global Step: 59540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:21:10,406-Speed 11559.04 samples/sec Loss 7.7120 LearningRate 0.0545 Epoch: 10 Global Step: 59550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:21:11,309-Speed 11346.65 samples/sec Loss 7.8266 LearningRate 0.0545 Epoch: 10 Global Step: 59560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:21:12,263-Speed 10741.43 samples/sec Loss 7.8611 LearningRate 0.0545 Epoch: 10 Global Step: 59570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:21:13,208-Speed 10847.16 samples/sec Loss 7.5979 LearningRate 0.0545 Epoch: 10 Global Step: 59580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:21:14,194-Speed 10391.99 samples/sec Loss 7.7089 LearningRate 0.0545 Epoch: 10 Global Step: 59590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:21:15,138-Speed 10852.76 samples/sec Loss 7.7384 LearningRate 0.0545 Epoch: 10 Global Step: 59600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:21:16,071-Speed 10984.45 samples/sec Loss 7.7842 LearningRate 0.0545 Epoch: 10 Global Step: 59610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:21:17,010-Speed 10907.70 samples/sec Loss 7.6874 LearningRate 0.0544 Epoch: 10 Global Step: 59620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:21:17,948-Speed 10927.58 samples/sec Loss 7.7689 LearningRate 0.0544 Epoch: 10 Global Step: 59630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:21:18,899-Speed 10768.82 samples/sec Loss 7.7767 LearningRate 0.0544 Epoch: 10 Global Step: 59640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:21:19,837-Speed 10930.03 samples/sec Loss 7.8081 LearningRate 0.0544 Epoch: 10 Global Step: 59650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:21:20,776-Speed 
10914.37 samples/sec Loss 7.6707 LearningRate 0.0544 Epoch: 10 Global Step: 59660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:21:21,694-Speed 11154.98 samples/sec Loss 7.6900 LearningRate 0.0544 Epoch: 10 Global Step: 59670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:21:22,626-Speed 11004.83 samples/sec Loss 7.8368 LearningRate 0.0544 Epoch: 10 Global Step: 59680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:21:23,589-Speed 10639.78 samples/sec Loss 7.7701 LearningRate 0.0544 Epoch: 10 Global Step: 59690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:21:24,527-Speed 10919.24 samples/sec Loss 7.7596 LearningRate 0.0544 Epoch: 10 Global Step: 59700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:21:25,458-Speed 11008.18 samples/sec Loss 7.8972 LearningRate 0.0544 Epoch: 10 Global Step: 59710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:21:26,374-Speed 11185.78 samples/sec Loss 7.7258 LearningRate 0.0544 Epoch: 10 Global Step: 59720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:21:27,339-Speed 10616.98 samples/sec Loss 7.6832 LearningRate 0.0544 Epoch: 10 Global Step: 59730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:21:28,274-Speed 10953.24 samples/sec Loss 7.7284 LearningRate 0.0544 Epoch: 10 Global Step: 59740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:21:29,182-Speed 11286.67 samples/sec Loss 7.8306 LearningRate 0.0544 Epoch: 10 Global Step: 59750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:21:30,098-Speed 11190.61 samples/sec Loss 7.7182 LearningRate 0.0544 Epoch: 10 Global Step: 59760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:21:31,009-Speed 11248.98 samples/sec Loss 7.7727 LearningRate 0.0543 Epoch: 10 Global Step: 59770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:21:31,934-Speed 11079.23 samples/sec Loss 7.7229 
LearningRate 0.0543 Epoch: 10 Global Step: 59780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:21:32,842-Speed 11281.37 samples/sec Loss 7.8870 LearningRate 0.0543 Epoch: 10 Global Step: 59790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:21:33,751-Speed 11279.80 samples/sec Loss 7.7770 LearningRate 0.0543 Epoch: 10 Global Step: 59800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:21:34,693-Speed 10875.04 samples/sec Loss 7.8692 LearningRate 0.0543 Epoch: 10 Global Step: 59810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:21:35,667-Speed 10520.60 samples/sec Loss 7.7226 LearningRate 0.0543 Epoch: 10 Global Step: 59820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:21:36,604-Speed 10932.09 samples/sec Loss 7.8234 LearningRate 0.0543 Epoch: 10 Global Step: 59830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:21:37,535-Speed 11004.13 samples/sec Loss 7.6798 LearningRate 0.0543 Epoch: 10 Global Step: 59840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:21:38,493-Speed 10697.14 samples/sec Loss 7.6965 LearningRate 0.0543 Epoch: 10 Global Step: 59850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:21:39,441-Speed 10810.33 samples/sec Loss 7.8338 LearningRate 0.0543 Epoch: 10 Global Step: 59860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:21:40,395-Speed 10741.22 samples/sec Loss 7.7677 LearningRate 0.0543 Epoch: 10 Global Step: 59870 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 01:21:41,306-Speed 11245.66 samples/sec Loss 7.5709 LearningRate 0.0543 Epoch: 10 Global Step: 59880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:21:42,223-Speed 11176.74 samples/sec Loss 7.7087 LearningRate 0.0543 Epoch: 10 Global Step: 59890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:21:43,152-Speed 11031.70 samples/sec Loss 7.7262 LearningRate 0.0543 Epoch: 10 
Global Step: 59900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:21:44,099-Speed 10821.88 samples/sec Loss 7.7682 LearningRate 0.0543 Epoch: 10 Global Step: 59910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:21:45,040-Speed 10883.76 samples/sec Loss 7.9237 LearningRate 0.0542 Epoch: 10 Global Step: 59920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:21:46,006-Speed 10613.33 samples/sec Loss 7.7612 LearningRate 0.0542 Epoch: 10 Global Step: 59930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:21:46,964-Speed 10696.90 samples/sec Loss 7.7672 LearningRate 0.0542 Epoch: 10 Global Step: 59940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:21:47,914-Speed 10789.09 samples/sec Loss 7.7937 LearningRate 0.0542 Epoch: 10 Global Step: 59950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:21:48,871-Speed 10704.14 samples/sec Loss 7.9455 LearningRate 0.0542 Epoch: 10 Global Step: 59960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:21:49,772-Speed 11371.84 samples/sec Loss 7.7605 LearningRate 0.0542 Epoch: 10 Global Step: 59970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:21:50,718-Speed 10838.41 samples/sec Loss 7.6890 LearningRate 0.0542 Epoch: 10 Global Step: 59980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:21:51,614-Speed 11431.52 samples/sec Loss 7.8424 LearningRate 0.0542 Epoch: 10 Global Step: 59990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:21:52,538-Speed 11089.91 samples/sec Loss 7.7941 LearningRate 0.0542 Epoch: 10 Global Step: 60000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:22:13,846-[lfw][60000]XNorm: 12.869140 Training: 2022-04-11 01:22:13,847-[lfw][60000]Accuracy-Flip: 0.99467+-0.00393 Training: 2022-04-11 01:22:13,847-[lfw][60000]Accuracy-Highest: 0.99550 Training: 2022-04-11 01:22:38,494-[cfp_fp][60000]XNorm: 10.677983 Training: 2022-04-11 
01:22:38,494-[cfp_fp][60000]Accuracy-Flip: 0.93586+-0.01366 Training: 2022-04-11 01:22:38,494-[cfp_fp][60000]Accuracy-Highest: 0.93786 Training: 2022-04-11 01:22:59,840-[agedb_30][60000]XNorm: 12.566534 Training: 2022-04-11 01:22:59,840-[agedb_30][60000]Accuracy-Flip: 0.95750+-0.00967 Training: 2022-04-11 01:22:59,841-[agedb_30][60000]Accuracy-Highest: 0.96067 Training: 2022-04-11 01:23:00,772-Speed 150.07 samples/sec Loss 7.6707 LearningRate 0.0542 Epoch: 10 Global Step: 60010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:23:01,701-Speed 11030.63 samples/sec Loss 7.8306 LearningRate 0.0542 Epoch: 10 Global Step: 60020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:23:02,679-Speed 10480.01 samples/sec Loss 7.7018 LearningRate 0.0542 Epoch: 10 Global Step: 60030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:23:03,647-Speed 10582.42 samples/sec Loss 7.5769 LearningRate 0.0542 Epoch: 10 Global Step: 60040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:23:04,573-Speed 11068.78 samples/sec Loss 7.6333 LearningRate 0.0542 Epoch: 10 Global Step: 60050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:23:05,506-Speed 10980.15 samples/sec Loss 7.8129 LearningRate 0.0542 Epoch: 10 Global Step: 60060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:23:06,439-Speed 10987.37 samples/sec Loss 7.7444 LearningRate 0.0542 Epoch: 10 Global Step: 60070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:23:07,359-Speed 11130.96 samples/sec Loss 7.6741 LearningRate 0.0541 Epoch: 10 Global Step: 60080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:23:08,305-Speed 10835.50 samples/sec Loss 7.7207 LearningRate 0.0541 Epoch: 10 Global Step: 60090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:23:09,237-Speed 10996.14 samples/sec Loss 7.9089 LearningRate 0.0541 Epoch: 10 Global Step: 60100 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-04-11 01:23:10,176-Speed 10907.63 samples/sec Loss 7.7423 LearningRate 0.0541 Epoch: 10 Global Step: 60110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:23:11,123-Speed 10826.01 samples/sec Loss 7.7068 LearningRate 0.0541 Epoch: 10 Global Step: 60120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:23:12,073-Speed 10787.95 samples/sec Loss 7.7879 LearningRate 0.0541 Epoch: 10 Global Step: 60130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:23:12,967-Speed 11461.71 samples/sec Loss 7.7874 LearningRate 0.0541 Epoch: 10 Global Step: 60140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:23:13,884-Speed 11164.65 samples/sec Loss 7.6098 LearningRate 0.0541 Epoch: 10 Global Step: 60150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:23:14,790-Speed 11317.18 samples/sec Loss 7.9478 LearningRate 0.0541 Epoch: 10 Global Step: 60160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:23:15,695-Speed 11321.09 samples/sec Loss 7.7344 LearningRate 0.0541 Epoch: 10 Global Step: 60170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:23:16,609-Speed 11210.61 samples/sec Loss 7.8030 LearningRate 0.0541 Epoch: 10 Global Step: 60180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:23:17,558-Speed 10797.01 samples/sec Loss 7.9031 LearningRate 0.0541 Epoch: 10 Global Step: 60190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:23:18,533-Speed 10515.70 samples/sec Loss 7.8105 LearningRate 0.0541 Epoch: 10 Global Step: 60200 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 01:23:19,518-Speed 10404.10 samples/sec Loss 7.9054 LearningRate 0.0541 Epoch: 10 Global Step: 60210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:23:20,456-Speed 10922.21 samples/sec Loss 7.8479 LearningRate 0.0541 Epoch: 10 Global Step: 60220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 
01:23:21,360-Speed 11325.48 samples/sec Loss 7.7404 LearningRate 0.0540 Epoch: 10 Global Step: 60230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:23:22,293-Speed 10987.44 samples/sec Loss 7.7404 LearningRate 0.0540 Epoch: 10 Global Step: 60240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:23:23,241-Speed 10811.34 samples/sec Loss 7.7998 LearningRate 0.0540 Epoch: 10 Global Step: 60250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:23:24,197-Speed 10714.01 samples/sec Loss 7.7242 LearningRate 0.0540 Epoch: 10 Global Step: 60260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:23:25,145-Speed 10817.37 samples/sec Loss 7.7678 LearningRate 0.0540 Epoch: 10 Global Step: 60270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:23:26,044-Speed 11387.61 samples/sec Loss 7.6635 LearningRate 0.0540 Epoch: 10 Global Step: 60280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:23:26,984-Speed 10909.74 samples/sec Loss 7.8526 LearningRate 0.0540 Epoch: 10 Global Step: 60290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:23:27,914-Speed 11010.40 samples/sec Loss 7.9523 LearningRate 0.0540 Epoch: 10 Global Step: 60300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:23:28,895-Speed 10448.07 samples/sec Loss 7.5931 LearningRate 0.0540 Epoch: 10 Global Step: 60310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:23:29,830-Speed 10962.01 samples/sec Loss 7.6636 LearningRate 0.0540 Epoch: 10 Global Step: 60320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:23:30,799-Speed 10577.43 samples/sec Loss 7.7723 LearningRate 0.0540 Epoch: 10 Global Step: 60330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:23:31,725-Speed 11061.51 samples/sec Loss 7.7569 LearningRate 0.0540 Epoch: 10 Global Step: 60340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:23:32,615-Speed 11519.71 
samples/sec Loss 7.7816 LearningRate 0.0540 Epoch: 10 Global Step: 60350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:23:33,538-Speed 11093.96 samples/sec Loss 7.8607 LearningRate 0.0540 Epoch: 10 Global Step: 60360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:23:34,458-Speed 11144.08 samples/sec Loss 7.7579 LearningRate 0.0540 Epoch: 10 Global Step: 60370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:23:35,407-Speed 10796.05 samples/sec Loss 7.7294 LearningRate 0.0540 Epoch: 10 Global Step: 60380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:23:36,342-Speed 10951.44 samples/sec Loss 7.7278 LearningRate 0.0539 Epoch: 10 Global Step: 60390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:23:37,272-Speed 11017.52 samples/sec Loss 7.6565 LearningRate 0.0539 Epoch: 10 Global Step: 60400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:23:38,232-Speed 10677.91 samples/sec Loss 7.6524 LearningRate 0.0539 Epoch: 10 Global Step: 60410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:23:39,183-Speed 10774.91 samples/sec Loss 7.8389 LearningRate 0.0539 Epoch: 10 Global Step: 60420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:23:40,127-Speed 10854.63 samples/sec Loss 7.6507 LearningRate 0.0539 Epoch: 10 Global Step: 60430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:23:41,070-Speed 10861.26 samples/sec Loss 7.5760 LearningRate 0.0539 Epoch: 10 Global Step: 60440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:23:42,014-Speed 10859.03 samples/sec Loss 7.8780 LearningRate 0.0539 Epoch: 10 Global Step: 60450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:23:42,943-Speed 11028.41 samples/sec Loss 7.6883 LearningRate 0.0539 Epoch: 10 Global Step: 60460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:23:43,874-Speed 11006.79 samples/sec Loss 7.9155 LearningRate 
0.0539 Epoch: 10 Global Step: 60470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:23:44,807-Speed 10988.22 samples/sec Loss 7.6936 LearningRate 0.0539 Epoch: 10 Global Step: 60480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:23:45,740-Speed 10975.85 samples/sec Loss 7.8933 LearningRate 0.0539 Epoch: 10 Global Step: 60490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:23:46,672-Speed 11001.47 samples/sec Loss 7.8450 LearningRate 0.0539 Epoch: 10 Global Step: 60500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:23:47,587-Speed 11198.41 samples/sec Loss 7.6959 LearningRate 0.0539 Epoch: 10 Global Step: 60510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:23:48,538-Speed 10778.81 samples/sec Loss 7.5632 LearningRate 0.0539 Epoch: 10 Global Step: 60520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:23:49,489-Speed 10774.57 samples/sec Loss 7.7649 LearningRate 0.0539 Epoch: 10 Global Step: 60530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:23:50,412-Speed 11100.46 samples/sec Loss 7.7719 LearningRate 0.0538 Epoch: 10 Global Step: 60540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:23:51,339-Speed 11051.31 samples/sec Loss 7.8292 LearningRate 0.0538 Epoch: 10 Global Step: 60550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:23:52,287-Speed 10809.48 samples/sec Loss 7.6875 LearningRate 0.0538 Epoch: 10 Global Step: 60560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:23:53,237-Speed 10786.53 samples/sec Loss 7.6664 LearningRate 0.0538 Epoch: 10 Global Step: 60570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:23:54,202-Speed 10612.83 samples/sec Loss 7.7319 LearningRate 0.0538 Epoch: 10 Global Step: 60580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:23:55,132-Speed 11019.78 samples/sec Loss 7.8048 LearningRate 0.0538 Epoch: 10 Global Step: 60590 Fp16 
Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:23:56,100-Speed 10591.14 samples/sec Loss 7.8469 LearningRate 0.0538 Epoch: 10 Global Step: 60600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:23:56,996-Speed 11433.14 samples/sec Loss 7.7191 LearningRate 0.0538 Epoch: 10 Global Step: 60610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:23:57,925-Speed 11032.67 samples/sec Loss 7.6887 LearningRate 0.0538 Epoch: 10 Global Step: 60620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:23:58,846-Speed 11127.36 samples/sec Loss 7.8179 LearningRate 0.0538 Epoch: 10 Global Step: 60630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:23:59,811-Speed 10610.30 samples/sec Loss 7.7877 LearningRate 0.0538 Epoch: 10 Global Step: 60640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:24:00,730-Speed 11158.62 samples/sec Loss 7.8371 LearningRate 0.0538 Epoch: 10 Global Step: 60650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:24:01,633-Speed 11341.23 samples/sec Loss 7.7618 LearningRate 0.0538 Epoch: 10 Global Step: 60660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:24:02,590-Speed 10716.28 samples/sec Loss 7.7607 LearningRate 0.0538 Epoch: 10 Global Step: 60670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:24:03,515-Speed 11071.84 samples/sec Loss 7.8552 LearningRate 0.0538 Epoch: 10 Global Step: 60680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:24:04,450-Speed 10954.21 samples/sec Loss 7.7720 LearningRate 0.0538 Epoch: 10 Global Step: 60690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:24:05,378-Speed 11040.07 samples/sec Loss 7.6223 LearningRate 0.0537 Epoch: 10 Global Step: 60700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:24:06,354-Speed 10503.18 samples/sec Loss 7.8172 LearningRate 0.0537 Epoch: 10 Global Step: 60710 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-04-11 01:24:07,301-Speed 10822.20 samples/sec Loss 7.7752 LearningRate 0.0537 Epoch: 10 Global Step: 60720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:24:08,234-Speed 10984.59 samples/sec Loss 7.6962 LearningRate 0.0537 Epoch: 10 Global Step: 60730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:24:09,154-Speed 11139.50 samples/sec Loss 7.6204 LearningRate 0.0537 Epoch: 10 Global Step: 60740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:24:10,112-Speed 10694.27 samples/sec Loss 7.7987 LearningRate 0.0537 Epoch: 10 Global Step: 60750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:24:11,077-Speed 10621.05 samples/sec Loss 7.8489 LearningRate 0.0537 Epoch: 10 Global Step: 60760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:24:11,998-Speed 11127.50 samples/sec Loss 7.7264 LearningRate 0.0537 Epoch: 10 Global Step: 60770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:24:12,934-Speed 10946.47 samples/sec Loss 7.8165 LearningRate 0.0537 Epoch: 10 Global Step: 60780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:24:13,885-Speed 10775.33 samples/sec Loss 7.7312 LearningRate 0.0537 Epoch: 10 Global Step: 60790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:24:14,841-Speed 10722.48 samples/sec Loss 7.8177 LearningRate 0.0537 Epoch: 10 Global Step: 60800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:24:15,727-Speed 11558.88 samples/sec Loss 7.6467 LearningRate 0.0537 Epoch: 10 Global Step: 60810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:24:16,697-Speed 10571.07 samples/sec Loss 7.7133 LearningRate 0.0537 Epoch: 10 Global Step: 60820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:24:17,663-Speed 10603.03 samples/sec Loss 7.7100 LearningRate 0.0537 Epoch: 10 Global Step: 60830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 
01:24:18,594-Speed 11013.85 samples/sec Loss 7.7996 LearningRate 0.0537 Epoch: 10 Global Step: 60840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:24:19,539-Speed 10847.12 samples/sec Loss 7.7301 LearningRate 0.0536 Epoch: 10 Global Step: 60850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:24:20,494-Speed 10721.91 samples/sec Loss 7.5533 LearningRate 0.0536 Epoch: 10 Global Step: 60860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:24:21,443-Speed 10797.47 samples/sec Loss 7.7885 LearningRate 0.0536 Epoch: 10 Global Step: 60870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:24:22,362-Speed 11155.11 samples/sec Loss 7.7371 LearningRate 0.0536 Epoch: 10 Global Step: 60880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:24:23,274-Speed 11234.15 samples/sec Loss 7.6954 LearningRate 0.0536 Epoch: 10 Global Step: 60890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:24:24,219-Speed 10841.03 samples/sec Loss 7.7878 LearningRate 0.0536 Epoch: 10 Global Step: 60900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:24:25,213-Speed 10313.96 samples/sec Loss 7.8135 LearningRate 0.0536 Epoch: 10 Global Step: 60910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:24:26,128-Speed 11195.12 samples/sec Loss 7.5926 LearningRate 0.0536 Epoch: 10 Global Step: 60920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:24:27,076-Speed 10810.51 samples/sec Loss 7.6964 LearningRate 0.0536 Epoch: 10 Global Step: 60930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:24:28,010-Speed 10967.68 samples/sec Loss 7.6984 LearningRate 0.0536 Epoch: 10 Global Step: 60940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:24:28,945-Speed 10959.64 samples/sec Loss 7.8416 LearningRate 0.0536 Epoch: 10 Global Step: 60950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:24:29,889-Speed 10863.00 
samples/sec Loss 7.7357 LearningRate 0.0536 Epoch: 10 Global Step: 60960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:24:30,842-Speed 10752.02 samples/sec Loss 7.8018 LearningRate 0.0536 Epoch: 10 Global Step: 60970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:24:31,811-Speed 10568.58 samples/sec Loss 7.6544 LearningRate 0.0536 Epoch: 10 Global Step: 60980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:24:32,753-Speed 10884.59 samples/sec Loss 7.8510 LearningRate 0.0536 Epoch: 10 Global Step: 60990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:24:33,712-Speed 10686.35 samples/sec Loss 7.8367 LearningRate 0.0536 Epoch: 10 Global Step: 61000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:24:34,592-Speed 11641.68 samples/sec Loss 7.7846 LearningRate 0.0535 Epoch: 10 Global Step: 61010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:24:35,532-Speed 10895.92 samples/sec Loss 7.6895 LearningRate 0.0535 Epoch: 10 Global Step: 61020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:24:36,478-Speed 10833.87 samples/sec Loss 7.7719 LearningRate 0.0535 Epoch: 10 Global Step: 61030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:24:37,434-Speed 10724.86 samples/sec Loss 7.8598 LearningRate 0.0535 Epoch: 10 Global Step: 61040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:24:38,355-Speed 11123.26 samples/sec Loss 8.0377 LearningRate 0.0535 Epoch: 10 Global Step: 61050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:24:39,317-Speed 10644.28 samples/sec Loss 7.6257 LearningRate 0.0535 Epoch: 10 Global Step: 61060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:24:40,280-Speed 10647.87 samples/sec Loss 7.6573 LearningRate 0.0535 Epoch: 10 Global Step: 61070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:24:41,272-Speed 10326.03 samples/sec Loss 7.9815 
LearningRate 0.0535 Epoch: 10 Global Step: 61080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:24:42,183-Speed 11244.12 samples/sec Loss 7.8212 LearningRate 0.0535 Epoch: 10 Global Step: 61090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:24:43,160-Speed 10492.10 samples/sec Loss 7.7112 LearningRate 0.0535 Epoch: 10 Global Step: 61100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:24:44,086-Speed 11058.62 samples/sec Loss 7.6966 LearningRate 0.0535 Epoch: 10 Global Step: 61110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:24:45,048-Speed 10653.31 samples/sec Loss 7.8210 LearningRate 0.0535 Epoch: 10 Global Step: 61120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:24:45,944-Speed 11442.72 samples/sec Loss 7.8772 LearningRate 0.0535 Epoch: 10 Global Step: 61130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:24:46,907-Speed 10637.43 samples/sec Loss 7.7533 LearningRate 0.0535 Epoch: 10 Global Step: 61140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:24:47,813-Speed 11318.85 samples/sec Loss 7.7215 LearningRate 0.0535 Epoch: 10 Global Step: 61150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:24:48,762-Speed 10791.89 samples/sec Loss 7.8556 LearningRate 0.0534 Epoch: 10 Global Step: 61160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:24:49,695-Speed 10988.88 samples/sec Loss 7.7156 LearningRate 0.0534 Epoch: 10 Global Step: 61170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:24:50,604-Speed 11265.59 samples/sec Loss 7.6826 LearningRate 0.0534 Epoch: 10 Global Step: 61180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:24:51,567-Speed 10639.16 samples/sec Loss 7.8906 LearningRate 0.0534 Epoch: 10 Global Step: 61190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:24:52,496-Speed 11042.14 samples/sec Loss 7.6190 LearningRate 0.0534 Epoch: 10 Global 
Step: 61200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:24:53,416-Speed 11129.47 samples/sec Loss 7.7059 LearningRate 0.0534 Epoch: 10 Global Step: 61210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:24:54,365-Speed 10800.83 samples/sec Loss 7.5526 LearningRate 0.0534 Epoch: 10 Global Step: 61220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:24:55,298-Speed 10982.74 samples/sec Loss 7.8710 LearningRate 0.0534 Epoch: 10 Global Step: 61230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:24:56,293-Speed 10296.60 samples/sec Loss 7.8432 LearningRate 0.0534 Epoch: 10 Global Step: 61240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:24:57,214-Speed 11125.15 samples/sec Loss 7.6815 LearningRate 0.0534 Epoch: 10 Global Step: 61250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:24:58,159-Speed 10845.19 samples/sec Loss 7.8441 LearningRate 0.0534 Epoch: 10 Global Step: 61260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:24:59,081-Speed 11108.84 samples/sec Loss 7.8877 LearningRate 0.0534 Epoch: 10 Global Step: 61270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:25:00,057-Speed 10498.82 samples/sec Loss 7.6592 LearningRate 0.0534 Epoch: 10 Global Step: 61280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:25:01,028-Speed 10560.96 samples/sec Loss 7.9464 LearningRate 0.0534 Epoch: 10 Global Step: 61290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:25:01,990-Speed 10642.17 samples/sec Loss 7.8646 LearningRate 0.0534 Epoch: 10 Global Step: 61300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:25:02,935-Speed 10844.39 samples/sec Loss 7.6197 LearningRate 0.0534 Epoch: 10 Global Step: 61310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:25:03,935-Speed 10249.95 samples/sec Loss 7.7626 LearningRate 0.0533 Epoch: 10 Global Step: 61320 Fp16 Grad Scale: 131072 
Required: 6 hours Training: 2022-04-11 01:25:04,902-Speed 10596.68 samples/sec Loss 7.9019 LearningRate 0.0533 Epoch: 10 Global Step: 61330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:25:05,834-Speed 10994.61 samples/sec Loss 7.6695 LearningRate 0.0533 Epoch: 10 Global Step: 61340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:25:06,794-Speed 10678.02 samples/sec Loss 7.7019 LearningRate 0.0533 Epoch: 10 Global Step: 61350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:25:07,710-Speed 11183.07 samples/sec Loss 7.7738 LearningRate 0.0533 Epoch: 10 Global Step: 61360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:25:08,655-Speed 10850.84 samples/sec Loss 7.6203 LearningRate 0.0533 Epoch: 10 Global Step: 61370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:25:09,547-Speed 11481.39 samples/sec Loss 7.8230 LearningRate 0.0533 Epoch: 10 Global Step: 61380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:25:10,449-Speed 11355.14 samples/sec Loss 7.7914 LearningRate 0.0533 Epoch: 10 Global Step: 61390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:25:11,387-Speed 10922.34 samples/sec Loss 7.6394 LearningRate 0.0533 Epoch: 10 Global Step: 61400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:25:12,369-Speed 10441.05 samples/sec Loss 7.6692 LearningRate 0.0533 Epoch: 10 Global Step: 61410 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 01:25:13,279-Speed 11254.86 samples/sec Loss 7.4959 LearningRate 0.0533 Epoch: 10 Global Step: 61420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:25:14,210-Speed 11009.52 samples/sec Loss 7.8105 LearningRate 0.0533 Epoch: 10 Global Step: 61430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:25:15,156-Speed 10835.69 samples/sec Loss 7.7872 LearningRate 0.0533 Epoch: 10 Global Step: 61440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 
2022-04-11 01:25:16,086-Speed 11011.50 samples/sec Loss 7.7446 LearningRate 0.0533 Epoch: 10 Global Step: 61450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:25:17,033-Speed 10825.60 samples/sec Loss 7.9251 LearningRate 0.0533 Epoch: 10 Global Step: 61460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:25:17,962-Speed 11028.41 samples/sec Loss 7.8000 LearningRate 0.0533 Epoch: 10 Global Step: 61470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:25:18,885-Speed 11102.67 samples/sec Loss 7.7557 LearningRate 0.0532 Epoch: 10 Global Step: 61480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:25:19,789-Speed 11331.60 samples/sec Loss 7.7849 LearningRate 0.0532 Epoch: 10 Global Step: 61490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:25:20,697-Speed 11290.61 samples/sec Loss 7.6609 LearningRate 0.0532 Epoch: 10 Global Step: 61500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:25:21,604-Speed 11292.19 samples/sec Loss 7.8433 LearningRate 0.0532 Epoch: 10 Global Step: 61510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:25:22,588-Speed 10421.28 samples/sec Loss 7.6259 LearningRate 0.0532 Epoch: 10 Global Step: 61520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:25:23,560-Speed 10540.28 samples/sec Loss 7.7535 LearningRate 0.0532 Epoch: 10 Global Step: 61530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:25:24,508-Speed 10804.08 samples/sec Loss 7.8026 LearningRate 0.0532 Epoch: 10 Global Step: 61540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:25:25,439-Speed 11011.73 samples/sec Loss 7.7710 LearningRate 0.0532 Epoch: 10 Global Step: 61550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:25:26,395-Speed 10719.01 samples/sec Loss 7.6999 LearningRate 0.0532 Epoch: 10 Global Step: 61560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:25:27,325-Speed 11018.00 
samples/sec Loss 7.7084 LearningRate 0.0532 Epoch: 10 Global Step: 61570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:25:28,293-Speed 10578.19 samples/sec Loss 7.7389 LearningRate 0.0532 Epoch: 10 Global Step: 61580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:25:29,220-Speed 11064.72 samples/sec Loss 7.7048 LearningRate 0.0532 Epoch: 10 Global Step: 61590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:25:30,180-Speed 10667.02 samples/sec Loss 7.8326 LearningRate 0.0532 Epoch: 10 Global Step: 61600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:25:31,110-Speed 11021.91 samples/sec Loss 7.7170 LearningRate 0.0532 Epoch: 10 Global Step: 61610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:25:32,043-Speed 10987.72 samples/sec Loss 7.7860 LearningRate 0.0532 Epoch: 10 Global Step: 61620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:25:32,942-Speed 11392.14 samples/sec Loss 7.6455 LearningRate 0.0531 Epoch: 10 Global Step: 61630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:25:33,872-Speed 11019.02 samples/sec Loss 7.8498 LearningRate 0.0531 Epoch: 10 Global Step: 61640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:25:34,801-Speed 11024.31 samples/sec Loss 7.7587 LearningRate 0.0531 Epoch: 10 Global Step: 61650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:25:35,764-Speed 10645.59 samples/sec Loss 7.5458 LearningRate 0.0531 Epoch: 10 Global Step: 61660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:25:36,705-Speed 10893.38 samples/sec Loss 7.6499 LearningRate 0.0531 Epoch: 10 Global Step: 61670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:25:37,639-Speed 10963.84 samples/sec Loss 7.8182 LearningRate 0.0531 Epoch: 10 Global Step: 61680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:25:38,565-Speed 11072.38 samples/sec Loss 7.7400 
LearningRate 0.0531 Epoch: 10 Global Step: 61690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:25:39,511-Speed 10827.64 samples/sec Loss 7.7035 LearningRate 0.0531 Epoch: 10 Global Step: 61700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:25:40,432-Speed 11126.65 samples/sec Loss 7.7525 LearningRate 0.0531 Epoch: 10 Global Step: 61710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:25:41,340-Speed 11294.01 samples/sec Loss 7.6608 LearningRate 0.0531 Epoch: 10 Global Step: 61720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:25:42,326-Speed 10390.81 samples/sec Loss 7.6689 LearningRate 0.0531 Epoch: 10 Global Step: 61730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:25:43,294-Speed 10586.09 samples/sec Loss 7.6740 LearningRate 0.0531 Epoch: 10 Global Step: 61740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:25:44,248-Speed 10742.08 samples/sec Loss 7.8272 LearningRate 0.0531 Epoch: 10 Global Step: 61750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:25:45,189-Speed 10887.86 samples/sec Loss 7.8362 LearningRate 0.0531 Epoch: 10 Global Step: 61760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:25:46,125-Speed 10945.58 samples/sec Loss 7.6480 LearningRate 0.0531 Epoch: 10 Global Step: 61770 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 01:25:47,067-Speed 10877.36 samples/sec Loss 7.8500 LearningRate 0.0531 Epoch: 10 Global Step: 61780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:25:48,006-Speed 10912.83 samples/sec Loss 7.5612 LearningRate 0.0530 Epoch: 10 Global Step: 61790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:25:48,915-Speed 11266.87 samples/sec Loss 7.7129 LearningRate 0.0530 Epoch: 10 Global Step: 61800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:25:49,833-Speed 11171.36 samples/sec Loss 7.7914 LearningRate 0.0530 Epoch: 10 
Global Step: 61810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:25:50,792-Speed 10682.77 samples/sec Loss 7.5470 LearningRate 0.0530 Epoch: 10 Global Step: 61820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:25:51,738-Speed 10826.03 samples/sec Loss 7.5539 LearningRate 0.0530 Epoch: 10 Global Step: 61830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:25:52,696-Speed 10705.47 samples/sec Loss 7.6701 LearningRate 0.0530 Epoch: 10 Global Step: 61840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:25:53,651-Speed 10723.72 samples/sec Loss 7.7957 LearningRate 0.0530 Epoch: 10 Global Step: 61850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:25:54,579-Speed 11047.32 samples/sec Loss 7.7917 LearningRate 0.0530 Epoch: 10 Global Step: 61860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:25:55,499-Speed 11137.72 samples/sec Loss 7.7725 LearningRate 0.0530 Epoch: 10 Global Step: 61870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:25:56,454-Speed 10724.47 samples/sec Loss 7.6752 LearningRate 0.0530 Epoch: 10 Global Step: 61880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:25:57,415-Speed 10663.76 samples/sec Loss 7.8248 LearningRate 0.0530 Epoch: 10 Global Step: 61890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:25:58,385-Speed 10560.38 samples/sec Loss 7.6694 LearningRate 0.0530 Epoch: 10 Global Step: 61900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:25:59,342-Speed 10707.47 samples/sec Loss 7.7307 LearningRate 0.0530 Epoch: 10 Global Step: 61910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:26:00,279-Speed 10947.74 samples/sec Loss 7.7400 LearningRate 0.0530 Epoch: 10 Global Step: 61920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:26:01,196-Speed 11175.18 samples/sec Loss 7.8406 LearningRate 0.0530 Epoch: 10 Global Step: 61930 Fp16 Grad Scale: 
65536 Required: 6 hours Training: 2022-04-11 01:26:02,170-Speed 10512.41 samples/sec Loss 7.6743 LearningRate 0.0529 Epoch: 10 Global Step: 61940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:26:03,129-Speed 10686.95 samples/sec Loss 7.7991 LearningRate 0.0529 Epoch: 10 Global Step: 61950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:26:04,095-Speed 10611.17 samples/sec Loss 7.8538 LearningRate 0.0529 Epoch: 10 Global Step: 61960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:26:05,015-Speed 11140.03 samples/sec Loss 7.7561 LearningRate 0.0529 Epoch: 10 Global Step: 61970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:26:05,969-Speed 10735.59 samples/sec Loss 7.7748 LearningRate 0.0529 Epoch: 10 Global Step: 61980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:26:06,938-Speed 10576.22 samples/sec Loss 7.8290 LearningRate 0.0529 Epoch: 10 Global Step: 61990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:26:07,886-Speed 10809.95 samples/sec Loss 7.6859 LearningRate 0.0529 Epoch: 10 Global Step: 62000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:26:29,256-[lfw][62000]XNorm: 12.810641 Training: 2022-04-11 01:26:29,257-[lfw][62000]Accuracy-Flip: 0.99400+-0.00389 Training: 2022-04-11 01:26:29,257-[lfw][62000]Accuracy-Highest: 0.99550 Training: 2022-04-11 01:26:53,806-[cfp_fp][62000]XNorm: 10.703338 Training: 2022-04-11 01:26:53,806-[cfp_fp][62000]Accuracy-Flip: 0.94143+-0.01094 Training: 2022-04-11 01:26:53,806-[cfp_fp][62000]Accuracy-Highest: 0.94143 Training: 2022-04-11 01:27:15,050-[agedb_30][62000]XNorm: 12.494118 Training: 2022-04-11 01:27:15,050-[agedb_30][62000]Accuracy-Flip: 0.95650+-0.01127 Training: 2022-04-11 01:27:15,051-[agedb_30][62000]Accuracy-Highest: 0.96067 Training: 2022-04-11 01:27:15,979-Speed 150.38 samples/sec Loss 7.6686 LearningRate 0.0529 Epoch: 10 Global Step: 62010 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-04-11 01:27:16,925-Speed 10830.99 samples/sec Loss 7.7028 LearningRate 0.0529 Epoch: 10 Global Step: 62020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:27:17,847-Speed 11107.81 samples/sec Loss 7.6901 LearningRate 0.0529 Epoch: 10 Global Step: 62030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:27:18,761-Speed 11214.61 samples/sec Loss 7.7187 LearningRate 0.0529 Epoch: 10 Global Step: 62040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:27:19,669-Speed 11281.17 samples/sec Loss 7.6926 LearningRate 0.0529 Epoch: 10 Global Step: 62050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:27:20,603-Speed 10975.57 samples/sec Loss 7.7567 LearningRate 0.0529 Epoch: 10 Global Step: 62060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:27:21,564-Speed 10658.65 samples/sec Loss 7.8140 LearningRate 0.0529 Epoch: 10 Global Step: 62070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:27:22,484-Speed 11149.66 samples/sec Loss 7.7181 LearningRate 0.0529 Epoch: 10 Global Step: 62080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:27:23,398-Speed 11201.07 samples/sec Loss 7.6481 LearningRate 0.0529 Epoch: 10 Global Step: 62090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:27:24,332-Speed 10978.73 samples/sec Loss 7.8619 LearningRate 0.0528 Epoch: 10 Global Step: 62100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:27:25,266-Speed 10967.07 samples/sec Loss 7.6609 LearningRate 0.0528 Epoch: 10 Global Step: 62110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:27:26,191-Speed 11081.19 samples/sec Loss 7.7335 LearningRate 0.0528 Epoch: 10 Global Step: 62120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:27:27,104-Speed 11220.78 samples/sec Loss 7.7711 LearningRate 0.0528 Epoch: 10 Global Step: 62130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:27:28,054-Speed 
10786.93 samples/sec Loss 7.7664 LearningRate 0.0528 Epoch: 10 Global Step: 62140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:27:28,978-Speed 11085.54 samples/sec Loss 7.7646 LearningRate 0.0528 Epoch: 10 Global Step: 62150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:27:29,948-Speed 10565.17 samples/sec Loss 7.7199 LearningRate 0.0528 Epoch: 10 Global Step: 62160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:27:30,878-Speed 11016.41 samples/sec Loss 7.7890 LearningRate 0.0528 Epoch: 10 Global Step: 62170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:27:31,830-Speed 10767.36 samples/sec Loss 7.6342 LearningRate 0.0528 Epoch: 10 Global Step: 62180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:27:32,723-Speed 11474.38 samples/sec Loss 7.6372 LearningRate 0.0528 Epoch: 10 Global Step: 62190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:27:33,651-Speed 11048.13 samples/sec Loss 7.7554 LearningRate 0.0528 Epoch: 10 Global Step: 62200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:27:34,570-Speed 11138.89 samples/sec Loss 7.7359 LearningRate 0.0528 Epoch: 10 Global Step: 62210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:27:35,526-Speed 10724.77 samples/sec Loss 7.6778 LearningRate 0.0528 Epoch: 10 Global Step: 62220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:27:36,476-Speed 10778.37 samples/sec Loss 7.6658 LearningRate 0.0528 Epoch: 10 Global Step: 62230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:27:37,470-Speed 10309.07 samples/sec Loss 7.7081 LearningRate 0.0528 Epoch: 10 Global Step: 62240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:27:38,378-Speed 11288.96 samples/sec Loss 7.7147 LearningRate 0.0528 Epoch: 10 Global Step: 62250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:27:39,311-Speed 10988.62 samples/sec Loss 7.6876 
LearningRate 0.0527 Epoch: 10 Global Step: 62260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:27:40,244-Speed 10970.99 samples/sec Loss 7.7261 LearningRate 0.0527 Epoch: 10 Global Step: 62270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:27:41,172-Speed 11048.00 samples/sec Loss 7.6955 LearningRate 0.0527 Epoch: 10 Global Step: 62280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:27:42,068-Speed 11437.06 samples/sec Loss 7.6533 LearningRate 0.0527 Epoch: 10 Global Step: 62290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:27:43,018-Speed 10797.33 samples/sec Loss 7.9269 LearningRate 0.0527 Epoch: 10 Global Step: 62300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:27:43,929-Speed 11243.20 samples/sec Loss 7.7680 LearningRate 0.0527 Epoch: 10 Global Step: 62310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:27:44,869-Speed 10905.87 samples/sec Loss 7.6451 LearningRate 0.0527 Epoch: 10 Global Step: 62320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:27:45,803-Speed 10970.46 samples/sec Loss 7.7072 LearningRate 0.0527 Epoch: 10 Global Step: 62330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:27:46,706-Speed 11347.64 samples/sec Loss 7.8781 LearningRate 0.0527 Epoch: 10 Global Step: 62340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:27:47,645-Speed 10912.05 samples/sec Loss 7.6775 LearningRate 0.0527 Epoch: 10 Global Step: 62350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:27:48,614-Speed 10568.15 samples/sec Loss 7.6242 LearningRate 0.0527 Epoch: 10 Global Step: 62360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:27:49,551-Speed 10939.61 samples/sec Loss 7.7617 LearningRate 0.0527 Epoch: 10 Global Step: 62370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:27:50,489-Speed 10923.64 samples/sec Loss 7.7415 LearningRate 0.0527 Epoch: 10 
Global Step: 62380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:27:51,451-Speed 10657.31 samples/sec Loss 7.7091 LearningRate 0.0527 Epoch: 10 Global Step: 62390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:27:52,406-Speed 10746.95 samples/sec Loss 7.7933 LearningRate 0.0527 Epoch: 10 Global Step: 62400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:27:53,364-Speed 10701.81 samples/sec Loss 7.7392 LearningRate 0.0526 Epoch: 10 Global Step: 62410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:27:54,321-Speed 10706.14 samples/sec Loss 7.9238 LearningRate 0.0526 Epoch: 10 Global Step: 62420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:27:55,258-Speed 10936.89 samples/sec Loss 7.7233 LearningRate 0.0526 Epoch: 10 Global Step: 62430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:27:56,210-Speed 10763.60 samples/sec Loss 7.7383 LearningRate 0.0526 Epoch: 10 Global Step: 62440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:27:57,103-Speed 11474.23 samples/sec Loss 7.6493 LearningRate 0.0526 Epoch: 10 Global Step: 62450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:27:57,994-Speed 11502.54 samples/sec Loss 7.7303 LearningRate 0.0526 Epoch: 10 Global Step: 62460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:27:58,945-Speed 10778.63 samples/sec Loss 7.5914 LearningRate 0.0526 Epoch: 10 Global Step: 62470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:27:59,864-Speed 11142.51 samples/sec Loss 7.8330 LearningRate 0.0526 Epoch: 10 Global Step: 62480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:28:00,840-Speed 10504.25 samples/sec Loss 7.6861 LearningRate 0.0526 Epoch: 10 Global Step: 62490 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 01:28:01,779-Speed 10910.35 samples/sec Loss 7.7991 LearningRate 0.0526 Epoch: 10 Global Step: 62500 Fp16 Grad 
Scale: 65536 Required: 6 hours Training: 2022-04-11 01:28:02,724-Speed 10841.68 samples/sec Loss 7.7255 LearningRate 0.0526 Epoch: 10 Global Step: 62510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:28:03,684-Speed 10678.50 samples/sec Loss 7.7143 LearningRate 0.0526 Epoch: 10 Global Step: 62520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:28:04,603-Speed 11147.32 samples/sec Loss 7.7292 LearningRate 0.0526 Epoch: 10 Global Step: 62530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:28:05,608-Speed 10192.57 samples/sec Loss 7.6144 LearningRate 0.0526 Epoch: 10 Global Step: 62540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:28:16,271-Speed 960.48 samples/sec Loss 7.4710 LearningRate 0.0526 Epoch: 11 Global Step: 62550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:28:17,212-Speed 10893.66 samples/sec Loss 7.0714 LearningRate 0.0526 Epoch: 11 Global Step: 62560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:28:18,339-Speed 9091.10 samples/sec Loss 6.9098 LearningRate 0.0525 Epoch: 11 Global Step: 62570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:28:19,319-Speed 10452.08 samples/sec Loss 6.9615 LearningRate 0.0525 Epoch: 11 Global Step: 62580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:28:20,350-Speed 9938.24 samples/sec Loss 6.9625 LearningRate 0.0525 Epoch: 11 Global Step: 62590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:28:21,354-Speed 10212.93 samples/sec Loss 6.8482 LearningRate 0.0525 Epoch: 11 Global Step: 62600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:28:22,314-Speed 10668.79 samples/sec Loss 7.0356 LearningRate 0.0525 Epoch: 11 Global Step: 62610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:28:23,258-Speed 10862.35 samples/sec Loss 7.0699 LearningRate 0.0525 Epoch: 11 Global Step: 62620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 
2022-04-11 01:28:24,222-Speed 10622.02 samples/sec Loss 7.0694 LearningRate 0.0525 Epoch: 11 Global Step: 62630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:28:25,154-Speed 10999.80 samples/sec Loss 7.0959 LearningRate 0.0525 Epoch: 11 Global Step: 62640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:28:26,087-Speed 10985.27 samples/sec Loss 7.2248 LearningRate 0.0525 Epoch: 11 Global Step: 62650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:28:27,021-Speed 10968.63 samples/sec Loss 7.0934 LearningRate 0.0525 Epoch: 11 Global Step: 62660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:28:27,971-Speed 10791.22 samples/sec Loss 7.0672 LearningRate 0.0525 Epoch: 11 Global Step: 62670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:28:28,999-Speed 9962.82 samples/sec Loss 7.0540 LearningRate 0.0525 Epoch: 11 Global Step: 62680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:28:29,984-Speed 10405.94 samples/sec Loss 7.1534 LearningRate 0.0525 Epoch: 11 Global Step: 62690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:28:30,929-Speed 10844.83 samples/sec Loss 7.2399 LearningRate 0.0525 Epoch: 11 Global Step: 62700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:28:31,952-Speed 10027.19 samples/sec Loss 7.1151 LearningRate 0.0525 Epoch: 11 Global Step: 62710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:28:32,924-Speed 10539.88 samples/sec Loss 7.0894 LearningRate 0.0525 Epoch: 11 Global Step: 62720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:28:33,883-Speed 10689.18 samples/sec Loss 7.0310 LearningRate 0.0524 Epoch: 11 Global Step: 62730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:28:34,868-Speed 10402.49 samples/sec Loss 7.0302 LearningRate 0.0524 Epoch: 11 Global Step: 62740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:28:35,791-Speed 
11099.28 samples/sec Loss 7.0654 LearningRate 0.0524 Epoch: 11 Global Step: 62750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:28:36,764-Speed 10536.29 samples/sec Loss 7.1317 LearningRate 0.0524 Epoch: 11 Global Step: 62760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:28:37,676-Speed 11232.07 samples/sec Loss 7.2464 LearningRate 0.0524 Epoch: 11 Global Step: 62770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:28:38,620-Speed 10857.14 samples/sec Loss 7.1430 LearningRate 0.0524 Epoch: 11 Global Step: 62780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:28:39,566-Speed 10833.90 samples/sec Loss 7.0206 LearningRate 0.0524 Epoch: 11 Global Step: 62790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:28:40,527-Speed 10655.82 samples/sec Loss 7.1446 LearningRate 0.0524 Epoch: 11 Global Step: 62800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:28:41,474-Speed 10826.20 samples/sec Loss 7.1330 LearningRate 0.0524 Epoch: 11 Global Step: 62810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:28:42,386-Speed 11231.81 samples/sec Loss 7.0410 LearningRate 0.0524 Epoch: 11 Global Step: 62820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:28:43,303-Speed 11173.51 samples/sec Loss 7.1992 LearningRate 0.0524 Epoch: 11 Global Step: 62830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:28:44,239-Speed 10956.47 samples/sec Loss 7.3110 LearningRate 0.0524 Epoch: 11 Global Step: 62840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:28:45,169-Speed 11015.90 samples/sec Loss 7.2863 LearningRate 0.0524 Epoch: 11 Global Step: 62850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:28:46,093-Speed 11084.06 samples/sec Loss 7.2897 LearningRate 0.0524 Epoch: 11 Global Step: 62860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:28:47,009-Speed 11186.90 samples/sec Loss 7.1112 
LearningRate 0.0524 Epoch: 11 Global Step: 62870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:28:47,946-Speed 10959.79 samples/sec Loss 7.1541 LearningRate 0.0523 Epoch: 11 Global Step: 62880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:28:48,859-Speed 11216.14 samples/sec Loss 7.1673 LearningRate 0.0523 Epoch: 11 Global Step: 62890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:28:49,762-Speed 11354.95 samples/sec Loss 7.2895 LearningRate 0.0523 Epoch: 11 Global Step: 62900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:28:50,699-Speed 10939.71 samples/sec Loss 7.1212 LearningRate 0.0523 Epoch: 11 Global Step: 62910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:28:51,615-Speed 11188.79 samples/sec Loss 7.2773 LearningRate 0.0523 Epoch: 11 Global Step: 62920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:28:52,519-Speed 11330.79 samples/sec Loss 7.2497 LearningRate 0.0523 Epoch: 11 Global Step: 62930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:28:53,440-Speed 11131.61 samples/sec Loss 7.2160 LearningRate 0.0523 Epoch: 11 Global Step: 62940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:28:54,334-Speed 11450.94 samples/sec Loss 7.0550 LearningRate 0.0523 Epoch: 11 Global Step: 62950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:28:55,244-Speed 11261.39 samples/sec Loss 7.3312 LearningRate 0.0523 Epoch: 11 Global Step: 62960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:28:56,163-Speed 11151.98 samples/sec Loss 7.1548 LearningRate 0.0523 Epoch: 11 Global Step: 62970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:28:57,114-Speed 10778.77 samples/sec Loss 7.2975 LearningRate 0.0523 Epoch: 11 Global Step: 62980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:28:58,053-Speed 10917.33 samples/sec Loss 7.1835 LearningRate 0.0523 Epoch: 11 Global 
Step: 62990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:28:58,977-Speed 11082.52 samples/sec Loss 7.2089 LearningRate 0.0523 Epoch: 11 Global Step: 63000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:28:59,913-Speed 10949.22 samples/sec Loss 7.2731 LearningRate 0.0523 Epoch: 11 Global Step: 63010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:29:00,845-Speed 10998.18 samples/sec Loss 7.2142 LearningRate 0.0523 Epoch: 11 Global Step: 63020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:29:01,788-Speed 10868.74 samples/sec Loss 7.3565 LearningRate 0.0523 Epoch: 11 Global Step: 63030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:29:02,744-Speed 10717.78 samples/sec Loss 7.3990 LearningRate 0.0522 Epoch: 11 Global Step: 63040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:29:03,680-Speed 10945.66 samples/sec Loss 7.2369 LearningRate 0.0522 Epoch: 11 Global Step: 63050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:29:04,638-Speed 10705.87 samples/sec Loss 7.3109 LearningRate 0.0522 Epoch: 11 Global Step: 63060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:29:05,586-Speed 10805.48 samples/sec Loss 7.3146 LearningRate 0.0522 Epoch: 11 Global Step: 63070 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 01:29:06,540-Speed 10740.40 samples/sec Loss 7.3603 LearningRate 0.0522 Epoch: 11 Global Step: 63080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:29:07,483-Speed 10867.68 samples/sec Loss 7.3834 LearningRate 0.0522 Epoch: 11 Global Step: 63090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:29:08,424-Speed 10896.27 samples/sec Loss 7.3594 LearningRate 0.0522 Epoch: 11 Global Step: 63100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:29:09,350-Speed 11056.85 samples/sec Loss 7.2189 LearningRate 0.0522 Epoch: 11 Global Step: 63110 Fp16 Grad Scale: 
65536 Required: 6 hours Training: 2022-04-11 01:29:10,322-Speed 10548.83 samples/sec Loss 7.3935 LearningRate 0.0522 Epoch: 11 Global Step: 63120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:29:11,268-Speed 10824.01 samples/sec Loss 7.2879 LearningRate 0.0522 Epoch: 11 Global Step: 63130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:29:12,181-Speed 11222.25 samples/sec Loss 7.2790 LearningRate 0.0522 Epoch: 11 Global Step: 63140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:29:13,116-Speed 10959.17 samples/sec Loss 7.4384 LearningRate 0.0522 Epoch: 11 Global Step: 63150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:29:14,034-Speed 11169.11 samples/sec Loss 7.1719 LearningRate 0.0522 Epoch: 11 Global Step: 63160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:29:14,987-Speed 10748.38 samples/sec Loss 7.3390 LearningRate 0.0522 Epoch: 11 Global Step: 63170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:29:15,900-Speed 11225.07 samples/sec Loss 7.2818 LearningRate 0.0522 Epoch: 11 Global Step: 63180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:29:16,834-Speed 10976.36 samples/sec Loss 7.1669 LearningRate 0.0522 Epoch: 11 Global Step: 63190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:29:17,754-Speed 11132.73 samples/sec Loss 7.2566 LearningRate 0.0521 Epoch: 11 Global Step: 63200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:29:18,688-Speed 10969.48 samples/sec Loss 7.4692 LearningRate 0.0521 Epoch: 11 Global Step: 63210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:29:19,624-Speed 10949.98 samples/sec Loss 7.2618 LearningRate 0.0521 Epoch: 11 Global Step: 63220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:29:20,540-Speed 11183.73 samples/sec Loss 7.3661 LearningRate 0.0521 Epoch: 11 Global Step: 63230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 
2022-04-11 01:29:21,469-Speed 11030.05 samples/sec Loss 7.4270 LearningRate 0.0521 Epoch: 11 Global Step: 63240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:29:22,375-Speed 11316.80 samples/sec Loss 7.4228 LearningRate 0.0521 Epoch: 11 Global Step: 63250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:29:23,291-Speed 11188.95 samples/sec Loss 7.3808 LearningRate 0.0521 Epoch: 11 Global Step: 63260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:29:24,223-Speed 10986.29 samples/sec Loss 7.3524 LearningRate 0.0521 Epoch: 11 Global Step: 63270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:29:25,183-Speed 10677.85 samples/sec Loss 7.3867 LearningRate 0.0521 Epoch: 11 Global Step: 63280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:29:26,094-Speed 11244.40 samples/sec Loss 7.4343 LearningRate 0.0521 Epoch: 11 Global Step: 63290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:29:27,046-Speed 10772.24 samples/sec Loss 7.5339 LearningRate 0.0521 Epoch: 11 Global Step: 63300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:29:27,959-Speed 11219.46 samples/sec Loss 7.5161 LearningRate 0.0521 Epoch: 11 Global Step: 63310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:29:28,884-Speed 11076.25 samples/sec Loss 7.1923 LearningRate 0.0521 Epoch: 11 Global Step: 63320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:29:29,814-Speed 11019.29 samples/sec Loss 7.4272 LearningRate 0.0521 Epoch: 11 Global Step: 63330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:29:30,738-Speed 11093.81 samples/sec Loss 7.5172 LearningRate 0.0521 Epoch: 11 Global Step: 63340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:29:31,660-Speed 11111.44 samples/sec Loss 7.3182 LearningRate 0.0521 Epoch: 11 Global Step: 63350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:29:32,593-Speed 
10975.33 samples/sec Loss 7.2564 LearningRate 0.0520 Epoch: 11 Global Step: 63360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:29:33,543-Speed 10792.48 samples/sec Loss 7.3520 LearningRate 0.0520 Epoch: 11 Global Step: 63370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:29:34,474-Speed 11010.88 samples/sec Loss 7.1680 LearningRate 0.0520 Epoch: 11 Global Step: 63380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:29:35,419-Speed 10833.96 samples/sec Loss 7.2998 LearningRate 0.0520 Epoch: 11 Global Step: 63390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:29:36,354-Speed 10967.55 samples/sec Loss 7.3050 LearningRate 0.0520 Epoch: 11 Global Step: 63400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:29:37,283-Speed 11030.13 samples/sec Loss 7.3253 LearningRate 0.0520 Epoch: 11 Global Step: 63410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:29:38,249-Speed 10599.67 samples/sec Loss 7.4727 LearningRate 0.0520 Epoch: 11 Global Step: 63420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:29:39,198-Speed 10803.35 samples/sec Loss 7.2292 LearningRate 0.0520 Epoch: 11 Global Step: 63430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:29:40,125-Speed 11054.53 samples/sec Loss 7.4366 LearningRate 0.0520 Epoch: 11 Global Step: 63440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:29:41,029-Speed 11333.70 samples/sec Loss 7.4713 LearningRate 0.0520 Epoch: 11 Global Step: 63450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:29:41,965-Speed 10949.42 samples/sec Loss 7.3658 LearningRate 0.0520 Epoch: 11 Global Step: 63460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:29:42,900-Speed 10960.84 samples/sec Loss 7.3646 LearningRate 0.0520 Epoch: 11 Global Step: 63470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:29:43,850-Speed 10787.74 samples/sec Loss 7.3922 
LearningRate 0.0520 Epoch: 11 Global Step: 63480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:29:44,784-Speed 10968.40 samples/sec Loss 7.3342 LearningRate 0.0520 Epoch: 11 Global Step: 63490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:29:45,708-Speed 11091.27 samples/sec Loss 7.4642 LearningRate 0.0520 Epoch: 11 Global Step: 63500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:29:46,650-Speed 10875.23 samples/sec Loss 7.4814 LearningRate 0.0519 Epoch: 11 Global Step: 63510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:29:47,596-Speed 10830.59 samples/sec Loss 7.5113 LearningRate 0.0519 Epoch: 11 Global Step: 63520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:29:48,555-Speed 10692.23 samples/sec Loss 7.6304 LearningRate 0.0519 Epoch: 11 Global Step: 63530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:29:49,487-Speed 10991.43 samples/sec Loss 7.4917 LearningRate 0.0519 Epoch: 11 Global Step: 63540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:29:50,413-Speed 11065.22 samples/sec Loss 7.4270 LearningRate 0.0519 Epoch: 11 Global Step: 63550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:29:51,334-Speed 11127.48 samples/sec Loss 7.3392 LearningRate 0.0519 Epoch: 11 Global Step: 63560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:29:52,248-Speed 11211.80 samples/sec Loss 7.3515 LearningRate 0.0519 Epoch: 11 Global Step: 63570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:29:53,183-Speed 10954.03 samples/sec Loss 7.3432 LearningRate 0.0519 Epoch: 11 Global Step: 63580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:29:54,116-Speed 10986.16 samples/sec Loss 7.4478 LearningRate 0.0519 Epoch: 11 Global Step: 63590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:29:55,040-Speed 11088.02 samples/sec Loss 7.5091 LearningRate 0.0519 Epoch: 11 Global 
Step: 63600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:29:55,964-Speed 11087.66 samples/sec Loss 7.5098 LearningRate 0.0519 Epoch: 11 Global Step: 63610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:29:56,883-Speed 11145.36 samples/sec Loss 7.5153 LearningRate 0.0519 Epoch: 11 Global Step: 63620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:29:57,780-Speed 11436.13 samples/sec Loss 7.3464 LearningRate 0.0519 Epoch: 11 Global Step: 63630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:29:58,712-Speed 10993.60 samples/sec Loss 7.4436 LearningRate 0.0519 Epoch: 11 Global Step: 63640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:29:59,636-Speed 11089.45 samples/sec Loss 7.4352 LearningRate 0.0519 Epoch: 11 Global Step: 63650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:30:00,581-Speed 10843.33 samples/sec Loss 7.3973 LearningRate 0.0519 Epoch: 11 Global Step: 63660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:30:01,498-Speed 11175.58 samples/sec Loss 7.4046 LearningRate 0.0518 Epoch: 11 Global Step: 63670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:30:02,466-Speed 10587.85 samples/sec Loss 7.4583 LearningRate 0.0518 Epoch: 11 Global Step: 63680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:30:03,375-Speed 11273.92 samples/sec Loss 7.4396 LearningRate 0.0518 Epoch: 11 Global Step: 63690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:30:04,298-Speed 11102.18 samples/sec Loss 7.4935 LearningRate 0.0518 Epoch: 11 Global Step: 63700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:30:05,247-Speed 10795.51 samples/sec Loss 7.4600 LearningRate 0.0518 Epoch: 11 Global Step: 63710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:30:06,189-Speed 10883.94 samples/sec Loss 7.4365 LearningRate 0.0518 Epoch: 11 Global Step: 63720 Fp16 Grad Scale: 
131072 Required: 6 hours Training: 2022-04-11 01:30:07,103-Speed 11208.30 samples/sec Loss 7.5424 LearningRate 0.0518 Epoch: 11 Global Step: 63730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:30:08,059-Speed 10713.56 samples/sec Loss 7.4434 LearningRate 0.0518 Epoch: 11 Global Step: 63740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:30:08,997-Speed 10922.67 samples/sec Loss 7.4883 LearningRate 0.0518 Epoch: 11 Global Step: 63750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:30:09,960-Speed 10641.26 samples/sec Loss 7.5201 LearningRate 0.0518 Epoch: 11 Global Step: 63760 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 01:30:10,904-Speed 10860.44 samples/sec Loss 7.5462 LearningRate 0.0518 Epoch: 11 Global Step: 63770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:30:11,843-Speed 10904.88 samples/sec Loss 7.5161 LearningRate 0.0518 Epoch: 11 Global Step: 63780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:30:12,791-Speed 10810.32 samples/sec Loss 7.5206 LearningRate 0.0518 Epoch: 11 Global Step: 63790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:30:13,764-Speed 10530.34 samples/sec Loss 7.5033 LearningRate 0.0518 Epoch: 11 Global Step: 63800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:30:14,719-Speed 10739.29 samples/sec Loss 7.4327 LearningRate 0.0518 Epoch: 11 Global Step: 63810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:30:15,648-Speed 11020.23 samples/sec Loss 7.4094 LearningRate 0.0518 Epoch: 11 Global Step: 63820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:30:16,605-Speed 10720.70 samples/sec Loss 7.5832 LearningRate 0.0517 Epoch: 11 Global Step: 63830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:30:17,548-Speed 10857.62 samples/sec Loss 7.4852 LearningRate 0.0517 Epoch: 11 Global Step: 63840 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-04-11 01:30:18,512-Speed 10629.36 samples/sec Loss 7.3949 LearningRate 0.0517 Epoch: 11 Global Step: 63850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:30:19,533-Speed 10037.08 samples/sec Loss 7.4742 LearningRate 0.0517 Epoch: 11 Global Step: 63860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:30:20,474-Speed 10886.06 samples/sec Loss 7.4791 LearningRate 0.0517 Epoch: 11 Global Step: 63870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:30:21,385-Speed 11254.55 samples/sec Loss 7.3981 LearningRate 0.0517 Epoch: 11 Global Step: 63880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:30:22,300-Speed 11199.39 samples/sec Loss 7.3721 LearningRate 0.0517 Epoch: 11 Global Step: 63890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:30:23,215-Speed 11203.03 samples/sec Loss 7.4984 LearningRate 0.0517 Epoch: 11 Global Step: 63900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:30:24,136-Speed 11120.24 samples/sec Loss 7.4125 LearningRate 0.0517 Epoch: 11 Global Step: 63910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:30:25,038-Speed 11367.22 samples/sec Loss 7.3708 LearningRate 0.0517 Epoch: 11 Global Step: 63920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:30:25,937-Speed 11391.86 samples/sec Loss 7.3915 LearningRate 0.0517 Epoch: 11 Global Step: 63930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:30:26,840-Speed 11351.84 samples/sec Loss 7.5708 LearningRate 0.0517 Epoch: 11 Global Step: 63940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:30:27,782-Speed 10886.97 samples/sec Loss 7.3751 LearningRate 0.0517 Epoch: 11 Global Step: 63950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:30:28,763-Speed 10443.62 samples/sec Loss 7.4541 LearningRate 0.0517 Epoch: 11 Global Step: 63960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 
01:30:29,710-Speed 10820.50 samples/sec Loss 7.5596 LearningRate 0.0517 Epoch: 11 Global Step: 63970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:30:30,633-Speed 11100.64 samples/sec Loss 7.4785 LearningRate 0.0517 Epoch: 11 Global Step: 63980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:30:31,577-Speed 10854.83 samples/sec Loss 7.5666 LearningRate 0.0516 Epoch: 11 Global Step: 63990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:30:32,501-Speed 11088.68 samples/sec Loss 7.4646 LearningRate 0.0516 Epoch: 11 Global Step: 64000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:30:53,691-[lfw][64000]XNorm: 12.759060 Training: 2022-04-11 01:30:53,691-[lfw][64000]Accuracy-Flip: 0.99550+-0.00299 Training: 2022-04-11 01:30:53,692-[lfw][64000]Accuracy-Highest: 0.99550 Training: 2022-04-11 01:31:18,265-[cfp_fp][64000]XNorm: 10.728800 Training: 2022-04-11 01:31:18,266-[cfp_fp][64000]Accuracy-Flip: 0.93757+-0.01217 Training: 2022-04-11 01:31:18,266-[cfp_fp][64000]Accuracy-Highest: 0.94143 Training: 2022-04-11 01:31:39,390-[agedb_30][64000]XNorm: 12.434665 Training: 2022-04-11 01:31:39,390-[agedb_30][64000]Accuracy-Flip: 0.96083+-0.00895 Training: 2022-04-11 01:31:39,390-[agedb_30][64000]Accuracy-Highest: 0.96083 Training: 2022-04-11 01:31:40,305-Speed 151.02 samples/sec Loss 7.5041 LearningRate 0.0516 Epoch: 11 Global Step: 64010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:31:41,250-Speed 10840.14 samples/sec Loss 7.6343 LearningRate 0.0516 Epoch: 11 Global Step: 64020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:31:42,180-Speed 11017.57 samples/sec Loss 7.6333 LearningRate 0.0516 Epoch: 11 Global Step: 64030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:31:43,094-Speed 11217.53 samples/sec Loss 7.4931 LearningRate 0.0516 Epoch: 11 Global Step: 64040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:31:44,022-Speed 
11041.67 samples/sec Loss 7.7273 LearningRate 0.0516 Epoch: 11 Global Step: 64050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:31:44,935-Speed 11224.33 samples/sec Loss 7.4802 LearningRate 0.0516 Epoch: 11 Global Step: 64060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:31:45,873-Speed 10921.08 samples/sec Loss 7.5690 LearningRate 0.0516 Epoch: 11 Global Step: 64070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:31:46,785-Speed 11231.16 samples/sec Loss 7.4145 LearningRate 0.0516 Epoch: 11 Global Step: 64080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:31:47,712-Speed 11055.73 samples/sec Loss 7.4649 LearningRate 0.0516 Epoch: 11 Global Step: 64090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:31:48,633-Speed 11129.55 samples/sec Loss 7.3844 LearningRate 0.0516 Epoch: 11 Global Step: 64100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:31:49,543-Speed 11253.13 samples/sec Loss 7.5304 LearningRate 0.0516 Epoch: 11 Global Step: 64110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:31:50,471-Speed 11045.50 samples/sec Loss 7.6415 LearningRate 0.0516 Epoch: 11 Global Step: 64120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:31:51,442-Speed 10555.86 samples/sec Loss 7.5402 LearningRate 0.0516 Epoch: 11 Global Step: 64130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:31:52,390-Speed 10813.30 samples/sec Loss 7.5148 LearningRate 0.0516 Epoch: 11 Global Step: 64140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:31:53,363-Speed 10523.42 samples/sec Loss 7.6211 LearningRate 0.0515 Epoch: 11 Global Step: 64150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:31:54,307-Speed 10857.17 samples/sec Loss 7.3423 LearningRate 0.0515 Epoch: 11 Global Step: 64160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:31:55,216-Speed 11271.80 samples/sec Loss 7.5216 
LearningRate 0.0515 Epoch: 11 Global Step: 64170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:31:56,162-Speed 10832.84 samples/sec Loss 7.7615 LearningRate 0.0515 Epoch: 11 Global Step: 64180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:31:57,083-Speed 11122.74 samples/sec Loss 7.3763 LearningRate 0.0515 Epoch: 11 Global Step: 64190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:31:57,970-Speed 11557.02 samples/sec Loss 7.5234 LearningRate 0.0515 Epoch: 11 Global Step: 64200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:31:58,908-Speed 10927.13 samples/sec Loss 7.6244 LearningRate 0.0515 Epoch: 11 Global Step: 64210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:31:59,837-Speed 11026.83 samples/sec Loss 7.5668 LearningRate 0.0515 Epoch: 11 Global Step: 64220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:32:00,738-Speed 11370.88 samples/sec Loss 7.5804 LearningRate 0.0515 Epoch: 11 Global Step: 64230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:32:01,646-Speed 11289.68 samples/sec Loss 7.5487 LearningRate 0.0515 Epoch: 11 Global Step: 64240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:32:02,552-Speed 11318.14 samples/sec Loss 7.5051 LearningRate 0.0515 Epoch: 11 Global Step: 64250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:32:03,486-Speed 10965.65 samples/sec Loss 7.5389 LearningRate 0.0515 Epoch: 11 Global Step: 64260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:32:04,400-Speed 11213.21 samples/sec Loss 7.4789 LearningRate 0.0515 Epoch: 11 Global Step: 64270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:32:05,313-Speed 11225.02 samples/sec Loss 7.4830 LearningRate 0.0515 Epoch: 11 Global Step: 64280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:32:06,196-Speed 11610.08 samples/sec Loss 7.4236 LearningRate 0.0515 Epoch: 11 Global 
Step: 64290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:32:07,134-Speed 10916.20 samples/sec Loss 7.7455 LearningRate 0.0515 Epoch: 11 Global Step: 64300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:32:08,052-Speed 11167.56 samples/sec Loss 7.4787 LearningRate 0.0514 Epoch: 11 Global Step: 64310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:32:08,974-Speed 11106.05 samples/sec Loss 7.6948 LearningRate 0.0514 Epoch: 11 Global Step: 64320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:32:09,907-Speed 10990.01 samples/sec Loss 7.5668 LearningRate 0.0514 Epoch: 11 Global Step: 64330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:32:10,830-Speed 11098.99 samples/sec Loss 7.5334 LearningRate 0.0514 Epoch: 11 Global Step: 64340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:32:11,736-Speed 11313.97 samples/sec Loss 7.4920 LearningRate 0.0514 Epoch: 11 Global Step: 64350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:32:12,715-Speed 10457.91 samples/sec Loss 7.6044 LearningRate 0.0514 Epoch: 11 Global Step: 64360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:32:13,672-Speed 10709.90 samples/sec Loss 7.4808 LearningRate 0.0514 Epoch: 11 Global Step: 64370 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 01:32:14,605-Speed 10985.24 samples/sec Loss 7.4286 LearningRate 0.0514 Epoch: 11 Global Step: 64380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:32:15,508-Speed 11352.18 samples/sec Loss 7.4632 LearningRate 0.0514 Epoch: 11 Global Step: 64390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:32:16,426-Speed 11163.33 samples/sec Loss 7.4709 LearningRate 0.0514 Epoch: 11 Global Step: 64400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:32:17,370-Speed 10850.35 samples/sec Loss 7.6080 LearningRate 0.0514 Epoch: 11 Global Step: 64410 Fp16 Grad Scale: 
131072 Required: 6 hours Training: 2022-04-11 01:32:18,326-Speed 10713.55 samples/sec Loss 7.5046 LearningRate 0.0514 Epoch: 11 Global Step: 64420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:32:19,259-Speed 10987.72 samples/sec Loss 7.5101 LearningRate 0.0514 Epoch: 11 Global Step: 64430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:32:20,223-Speed 10623.79 samples/sec Loss 7.5516 LearningRate 0.0514 Epoch: 11 Global Step: 64440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:32:21,148-Speed 11090.52 samples/sec Loss 7.5807 LearningRate 0.0514 Epoch: 11 Global Step: 64450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:32:22,065-Speed 11170.11 samples/sec Loss 7.6269 LearningRate 0.0513 Epoch: 11 Global Step: 64460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:32:22,985-Speed 11131.81 samples/sec Loss 7.5151 LearningRate 0.0513 Epoch: 11 Global Step: 64470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:32:23,871-Speed 11571.63 samples/sec Loss 7.6091 LearningRate 0.0513 Epoch: 11 Global Step: 64480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:32:24,785-Speed 11209.07 samples/sec Loss 7.6413 LearningRate 0.0513 Epoch: 11 Global Step: 64490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:32:25,703-Speed 11162.36 samples/sec Loss 7.4404 LearningRate 0.0513 Epoch: 11 Global Step: 64500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:32:26,605-Speed 11365.45 samples/sec Loss 7.6490 LearningRate 0.0513 Epoch: 11 Global Step: 64510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:32:27,528-Speed 11106.17 samples/sec Loss 7.5301 LearningRate 0.0513 Epoch: 11 Global Step: 64520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:32:28,459-Speed 11003.52 samples/sec Loss 7.6156 LearningRate 0.0513 Epoch: 11 Global Step: 64530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 
2022-04-11 01:32:29,388-Speed 11036.20 samples/sec Loss 7.3777 LearningRate 0.0513 Epoch: 11 Global Step: 64540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:32:30,344-Speed 10719.52 samples/sec Loss 7.7395 LearningRate 0.0513 Epoch: 11 Global Step: 64550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:32:31,274-Speed 11009.75 samples/sec Loss 7.5635 LearningRate 0.0513 Epoch: 11 Global Step: 64560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:32:32,212-Speed 10923.94 samples/sec Loss 7.5933 LearningRate 0.0513 Epoch: 11 Global Step: 64570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:32:33,128-Speed 11188.44 samples/sec Loss 7.6412 LearningRate 0.0513 Epoch: 11 Global Step: 64580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:32:34,055-Speed 11060.42 samples/sec Loss 7.6016 LearningRate 0.0513 Epoch: 11 Global Step: 64590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:32:35,027-Speed 10546.30 samples/sec Loss 7.6477 LearningRate 0.0513 Epoch: 11 Global Step: 64600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:32:35,939-Speed 11226.47 samples/sec Loss 7.4618 LearningRate 0.0513 Epoch: 11 Global Step: 64610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:32:36,894-Speed 10732.79 samples/sec Loss 7.5107 LearningRate 0.0512 Epoch: 11 Global Step: 64620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:32:37,792-Speed 11409.91 samples/sec Loss 7.5571 LearningRate 0.0512 Epoch: 11 Global Step: 64630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:32:38,749-Speed 10710.24 samples/sec Loss 7.5224 LearningRate 0.0512 Epoch: 11 Global Step: 64640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:32:39,702-Speed 10748.64 samples/sec Loss 7.5288 LearningRate 0.0512 Epoch: 11 Global Step: 64650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:32:40,635-Speed 
10988.83 samples/sec Loss 7.4138 LearningRate 0.0512 Epoch: 11 Global Step: 64660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:32:41,605-Speed 10559.20 samples/sec Loss 7.4757 LearningRate 0.0512 Epoch: 11 Global Step: 64670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:32:42,568-Speed 10644.00 samples/sec Loss 7.6769 LearningRate 0.0512 Epoch: 11 Global Step: 64680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:32:43,487-Speed 11156.63 samples/sec Loss 7.6064 LearningRate 0.0512 Epoch: 11 Global Step: 64690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:32:44,375-Speed 11539.03 samples/sec Loss 7.4910 LearningRate 0.0512 Epoch: 11 Global Step: 64700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:32:45,267-Speed 11484.88 samples/sec Loss 7.6106 LearningRate 0.0512 Epoch: 11 Global Step: 64710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:32:46,187-Speed 11133.83 samples/sec Loss 7.5385 LearningRate 0.0512 Epoch: 11 Global Step: 64720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:32:47,100-Speed 11229.47 samples/sec Loss 7.4901 LearningRate 0.0512 Epoch: 11 Global Step: 64730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:32:48,021-Speed 11133.92 samples/sec Loss 7.5431 LearningRate 0.0512 Epoch: 11 Global Step: 64740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:32:48,960-Speed 10910.01 samples/sec Loss 7.4837 LearningRate 0.0512 Epoch: 11 Global Step: 64750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:32:49,904-Speed 10858.82 samples/sec Loss 7.5197 LearningRate 0.0512 Epoch: 11 Global Step: 64760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:32:50,852-Speed 10801.22 samples/sec Loss 7.5755 LearningRate 0.0512 Epoch: 11 Global Step: 64770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:32:51,788-Speed 10946.24 samples/sec Loss 7.3736 
LearningRate 0.0511 Epoch: 11 Global Step: 64780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:32:52,723-Speed 10959.88 samples/sec Loss 7.4815 LearningRate 0.0511 Epoch: 11 Global Step: 64790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:32:53,653-Speed 11014.54 samples/sec Loss 7.5142 LearningRate 0.0511 Epoch: 11 Global Step: 64800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:32:54,600-Speed 10825.16 samples/sec Loss 7.5805 LearningRate 0.0511 Epoch: 11 Global Step: 64810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:32:55,520-Speed 11134.53 samples/sec Loss 7.6310 LearningRate 0.0511 Epoch: 11 Global Step: 64820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:32:56,439-Speed 11150.08 samples/sec Loss 7.6890 LearningRate 0.0511 Epoch: 11 Global Step: 64830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:32:57,368-Speed 11029.66 samples/sec Loss 7.6209 LearningRate 0.0511 Epoch: 11 Global Step: 64840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:32:58,316-Speed 10812.81 samples/sec Loss 7.5869 LearningRate 0.0511 Epoch: 11 Global Step: 64850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:32:59,208-Speed 11493.39 samples/sec Loss 7.6631 LearningRate 0.0511 Epoch: 11 Global Step: 64860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:00,149-Speed 10886.69 samples/sec Loss 7.5820 LearningRate 0.0511 Epoch: 11 Global Step: 64870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:01,092-Speed 10860.17 samples/sec Loss 7.5968 LearningRate 0.0511 Epoch: 11 Global Step: 64880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:02,024-Speed 10999.39 samples/sec Loss 7.6046 LearningRate 0.0511 Epoch: 11 Global Step: 64890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:02,943-Speed 11150.50 samples/sec Loss 7.6900 LearningRate 0.0511 Epoch: 11 Global 
Step: 64900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:03,871-Speed 11035.16 samples/sec Loss 7.5921 LearningRate 0.0511 Epoch: 11 Global Step: 64910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:33:04,797-Speed 11069.73 samples/sec Loss 7.4775 LearningRate 0.0511 Epoch: 11 Global Step: 64920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:33:05,743-Speed 10827.88 samples/sec Loss 7.5915 LearningRate 0.0511 Epoch: 11 Global Step: 64930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:33:06,675-Speed 10996.14 samples/sec Loss 7.5828 LearningRate 0.0510 Epoch: 11 Global Step: 64940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:33:07,610-Speed 10964.16 samples/sec Loss 7.6096 LearningRate 0.0510 Epoch: 11 Global Step: 64950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:33:08,565-Speed 10727.24 samples/sec Loss 7.6055 LearningRate 0.0510 Epoch: 11 Global Step: 64960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:33:09,486-Speed 11131.12 samples/sec Loss 7.4957 LearningRate 0.0510 Epoch: 11 Global Step: 64970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:33:10,401-Speed 11193.77 samples/sec Loss 7.6535 LearningRate 0.0510 Epoch: 11 Global Step: 64980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:33:11,332-Speed 11010.12 samples/sec Loss 7.5460 LearningRate 0.0510 Epoch: 11 Global Step: 64990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:12,256-Speed 11093.39 samples/sec Loss 7.5807 LearningRate 0.0510 Epoch: 11 Global Step: 65000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:13,196-Speed 10897.61 samples/sec Loss 7.5632 LearningRate 0.0510 Epoch: 11 Global Step: 65010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:14,141-Speed 10839.55 samples/sec Loss 7.6306 LearningRate 0.0510 Epoch: 11 Global Step: 65020 Fp16 Grad Scale: 65536 
Required: 6 hours Training: 2022-04-11 01:33:15,064-Speed 11100.29 samples/sec Loss 7.5472 LearningRate 0.0510 Epoch: 11 Global Step: 65030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:16,027-Speed 10642.61 samples/sec Loss 7.6958 LearningRate 0.0510 Epoch: 11 Global Step: 65040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:16,959-Speed 11002.54 samples/sec Loss 7.5503 LearningRate 0.0510 Epoch: 11 Global Step: 65050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:17,892-Speed 10981.63 samples/sec Loss 7.5598 LearningRate 0.0510 Epoch: 11 Global Step: 65060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:18,880-Speed 10366.37 samples/sec Loss 7.6287 LearningRate 0.0510 Epoch: 11 Global Step: 65070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:19,797-Speed 11172.61 samples/sec Loss 7.5295 LearningRate 0.0510 Epoch: 11 Global Step: 65080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:20,748-Speed 10773.73 samples/sec Loss 7.4674 LearningRate 0.0510 Epoch: 11 Global Step: 65090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:33:21,655-Speed 11297.17 samples/sec Loss 7.7244 LearningRate 0.0509 Epoch: 11 Global Step: 65100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:22,590-Speed 10967.60 samples/sec Loss 7.6740 LearningRate 0.0509 Epoch: 11 Global Step: 65110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:23,506-Speed 11179.51 samples/sec Loss 7.5372 LearningRate 0.0509 Epoch: 11 Global Step: 65120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:24,444-Speed 10922.20 samples/sec Loss 7.5399 LearningRate 0.0509 Epoch: 11 Global Step: 65130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:25,367-Speed 11108.06 samples/sec Loss 7.4884 LearningRate 0.0509 Epoch: 11 Global Step: 65140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 
01:33:26,264-Speed 11418.60 samples/sec Loss 7.4958 LearningRate 0.0509 Epoch: 11 Global Step: 65150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:27,214-Speed 10784.21 samples/sec Loss 7.5838 LearningRate 0.0509 Epoch: 11 Global Step: 65160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:28,182-Speed 10591.98 samples/sec Loss 7.5184 LearningRate 0.0509 Epoch: 11 Global Step: 65170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:29,104-Speed 11117.25 samples/sec Loss 7.6790 LearningRate 0.0509 Epoch: 11 Global Step: 65180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:30,022-Speed 11160.05 samples/sec Loss 7.6633 LearningRate 0.0509 Epoch: 11 Global Step: 65190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:30,922-Speed 11391.95 samples/sec Loss 7.5903 LearningRate 0.0509 Epoch: 11 Global Step: 65200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:33:31,836-Speed 11201.39 samples/sec Loss 7.5339 LearningRate 0.0509 Epoch: 11 Global Step: 65210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:33:32,747-Speed 11257.09 samples/sec Loss 7.5191 LearningRate 0.0509 Epoch: 11 Global Step: 65220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:33,644-Speed 11411.88 samples/sec Loss 7.6814 LearningRate 0.0509 Epoch: 11 Global Step: 65230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:34,561-Speed 11180.84 samples/sec Loss 7.6537 LearningRate 0.0509 Epoch: 11 Global Step: 65240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:35,508-Speed 10825.69 samples/sec Loss 7.6691 LearningRate 0.0509 Epoch: 11 Global Step: 65250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:36,435-Speed 11055.67 samples/sec Loss 7.7242 LearningRate 0.0508 Epoch: 11 Global Step: 65260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:37,408-Speed 10530.44 samples/sec 
Loss 7.5801 LearningRate 0.0508 Epoch: 11 Global Step: 65270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:38,358-Speed 10780.31 samples/sec Loss 7.5237 LearningRate 0.0508 Epoch: 11 Global Step: 65280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:39,309-Speed 10771.92 samples/sec Loss 7.4895 LearningRate 0.0508 Epoch: 11 Global Step: 65290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:40,253-Speed 10854.75 samples/sec Loss 7.5922 LearningRate 0.0508 Epoch: 11 Global Step: 65300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:41,187-Speed 10976.83 samples/sec Loss 7.5705 LearningRate 0.0508 Epoch: 11 Global Step: 65310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:42,154-Speed 10587.93 samples/sec Loss 7.5530 LearningRate 0.0508 Epoch: 11 Global Step: 65320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:43,087-Speed 10989.96 samples/sec Loss 7.5797 LearningRate 0.0508 Epoch: 11 Global Step: 65330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:44,030-Speed 10864.84 samples/sec Loss 7.5182 LearningRate 0.0508 Epoch: 11 Global Step: 65340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:44,948-Speed 11159.53 samples/sec Loss 7.6969 LearningRate 0.0508 Epoch: 11 Global Step: 65350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:45,884-Speed 10950.11 samples/sec Loss 7.5000 LearningRate 0.0508 Epoch: 11 Global Step: 65360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:46,823-Speed 10922.93 samples/sec Loss 7.7367 LearningRate 0.0508 Epoch: 11 Global Step: 65370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:47,753-Speed 11017.65 samples/sec Loss 7.5875 LearningRate 0.0508 Epoch: 11 Global Step: 65380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:48,705-Speed 10760.62 samples/sec Loss 7.7888 LearningRate 0.0508 Epoch: 11 
Global Step: 65390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:49,658-Speed 10748.63 samples/sec Loss 7.5326 LearningRate 0.0508 Epoch: 11 Global Step: 65400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:50,614-Speed 10718.28 samples/sec Loss 7.6191 LearningRate 0.0508 Epoch: 11 Global Step: 65410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:51,530-Speed 11188.97 samples/sec Loss 7.6312 LearningRate 0.0507 Epoch: 11 Global Step: 65420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:33:52,468-Speed 10919.14 samples/sec Loss 7.5171 LearningRate 0.0507 Epoch: 11 Global Step: 65430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:33:53,424-Speed 10720.42 samples/sec Loss 7.5436 LearningRate 0.0507 Epoch: 11 Global Step: 65440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:54,334-Speed 11265.71 samples/sec Loss 7.5805 LearningRate 0.0507 Epoch: 11 Global Step: 65450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:55,252-Speed 11158.52 samples/sec Loss 7.5207 LearningRate 0.0507 Epoch: 11 Global Step: 65460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:56,200-Speed 10807.09 samples/sec Loss 7.6389 LearningRate 0.0507 Epoch: 11 Global Step: 65470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:57,097-Speed 11428.17 samples/sec Loss 7.5513 LearningRate 0.0507 Epoch: 11 Global Step: 65480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:58,030-Speed 10975.59 samples/sec Loss 7.4540 LearningRate 0.0507 Epoch: 11 Global Step: 65490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:58,973-Speed 10875.09 samples/sec Loss 7.4826 LearningRate 0.0507 Epoch: 11 Global Step: 65500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:33:59,863-Speed 11510.75 samples/sec Loss 7.5884 LearningRate 0.0507 Epoch: 11 Global Step: 65510 Fp16 Grad Scale: 65536 
Required: 6 hours Training: 2022-04-11 01:34:00,799-Speed 10941.01 samples/sec Loss 7.6005 LearningRate 0.0507 Epoch: 11 Global Step: 65520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:34:01,733-Speed 10982.69 samples/sec Loss 7.5755 LearningRate 0.0507 Epoch: 11 Global Step: 65530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:34:02,658-Speed 11077.57 samples/sec Loss 7.7021 LearningRate 0.0507 Epoch: 11 Global Step: 65540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:34:03,596-Speed 10924.73 samples/sec Loss 7.5621 LearningRate 0.0507 Epoch: 11 Global Step: 65550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:34:04,502-Speed 11310.25 samples/sec Loss 7.6020 LearningRate 0.0507 Epoch: 11 Global Step: 65560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:34:05,442-Speed 10906.02 samples/sec Loss 7.5503 LearningRate 0.0507 Epoch: 11 Global Step: 65570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:34:06,403-Speed 10653.69 samples/sec Loss 7.5141 LearningRate 0.0506 Epoch: 11 Global Step: 65580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:34:07,337-Speed 10969.77 samples/sec Loss 7.3913 LearningRate 0.0506 Epoch: 11 Global Step: 65590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:34:08,306-Speed 10575.47 samples/sec Loss 7.5632 LearningRate 0.0506 Epoch: 11 Global Step: 65600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:34:09,256-Speed 10792.24 samples/sec Loss 7.6537 LearningRate 0.0506 Epoch: 11 Global Step: 65610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:34:10,185-Speed 11022.73 samples/sec Loss 7.5815 LearningRate 0.0506 Epoch: 11 Global Step: 65620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:34:11,164-Speed 10465.77 samples/sec Loss 7.6051 LearningRate 0.0506 Epoch: 11 Global Step: 65630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 
01:34:12,123-Speed 10691.90 samples/sec Loss 7.6361 LearningRate 0.0506 Epoch: 11 Global Step: 65640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:34:13,083-Speed 10664.39 samples/sec Loss 7.6280 LearningRate 0.0506 Epoch: 11 Global Step: 65650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:34:14,021-Speed 10924.48 samples/sec Loss 7.5429 LearningRate 0.0506 Epoch: 11 Global Step: 65660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:34:15,008-Speed 10401.32 samples/sec Loss 7.7051 LearningRate 0.0506 Epoch: 11 Global Step: 65670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:34:15,961-Speed 10748.45 samples/sec Loss 7.5338 LearningRate 0.0506 Epoch: 11 Global Step: 65680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:34:16,914-Speed 10745.10 samples/sec Loss 7.6869 LearningRate 0.0506 Epoch: 11 Global Step: 65690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:34:17,856-Speed 10882.13 samples/sec Loss 7.7246 LearningRate 0.0506 Epoch: 11 Global Step: 65700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:34:18,801-Speed 10839.85 samples/sec Loss 7.6323 LearningRate 0.0506 Epoch: 11 Global Step: 65710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:34:19,774-Speed 10535.56 samples/sec Loss 7.5728 LearningRate 0.0506 Epoch: 11 Global Step: 65720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:34:20,724-Speed 10791.59 samples/sec Loss 7.5393 LearningRate 0.0506 Epoch: 11 Global Step: 65730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:34:21,654-Speed 11010.36 samples/sec Loss 7.6150 LearningRate 0.0505 Epoch: 11 Global Step: 65740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:34:22,565-Speed 11255.02 samples/sec Loss 7.5091 LearningRate 0.0505 Epoch: 11 Global Step: 65750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:34:23,495-Speed 11023.88 samples/sec 
Loss 7.5618 LearningRate 0.0505 Epoch: 11 Global Step: 65760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:34:24,415-Speed 11133.85 samples/sec Loss 7.6774 LearningRate 0.0505 Epoch: 11 Global Step: 65770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:34:25,343-Speed 11046.35 samples/sec Loss 7.6273 LearningRate 0.0505 Epoch: 11 Global Step: 65780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:34:26,240-Speed 11416.13 samples/sec Loss 7.7246 LearningRate 0.0505 Epoch: 11 Global Step: 65790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:34:27,163-Speed 11106.37 samples/sec Loss 7.6402 LearningRate 0.0505 Epoch: 11 Global Step: 65800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:34:28,063-Speed 11383.78 samples/sec Loss 7.5290 LearningRate 0.0505 Epoch: 11 Global Step: 65810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:34:28,995-Speed 10994.34 samples/sec Loss 7.6239 LearningRate 0.0505 Epoch: 11 Global Step: 65820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:34:29,923-Speed 11035.97 samples/sec Loss 7.6338 LearningRate 0.0505 Epoch: 11 Global Step: 65830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:34:30,840-Speed 11174.10 samples/sec Loss 7.6648 LearningRate 0.0505 Epoch: 11 Global Step: 65840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:34:31,775-Speed 10963.85 samples/sec Loss 7.6919 LearningRate 0.0505 Epoch: 11 Global Step: 65850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:34:32,733-Speed 10691.92 samples/sec Loss 7.6463 LearningRate 0.0505 Epoch: 11 Global Step: 65860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:34:33,659-Speed 11064.39 samples/sec Loss 7.4839 LearningRate 0.0505 Epoch: 11 Global Step: 65870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:34:34,607-Speed 10809.61 samples/sec Loss 7.3635 LearningRate 0.0505 Epoch: 11 
Global Step: 65880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:34:35,515-Speed 11296.83 samples/sec Loss 7.6712 LearningRate 0.0505 Epoch: 11 Global Step: 65890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:34:36,462-Speed 10813.40 samples/sec Loss 7.5485 LearningRate 0.0504 Epoch: 11 Global Step: 65900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:34:37,492-Speed 9946.37 samples/sec Loss 7.5325 LearningRate 0.0504 Epoch: 11 Global Step: 65910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:34:38,424-Speed 11002.25 samples/sec Loss 7.5025 LearningRate 0.0504 Epoch: 11 Global Step: 65920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:34:39,427-Speed 10217.09 samples/sec Loss 7.4221 LearningRate 0.0504 Epoch: 11 Global Step: 65930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:34:40,371-Speed 10851.46 samples/sec Loss 7.5916 LearningRate 0.0504 Epoch: 11 Global Step: 65940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:34:41,326-Speed 10727.68 samples/sec Loss 7.5711 LearningRate 0.0504 Epoch: 11 Global Step: 65950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:34:42,290-Speed 10626.21 samples/sec Loss 7.4978 LearningRate 0.0504 Epoch: 11 Global Step: 65960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:34:43,212-Speed 11110.56 samples/sec Loss 7.6380 LearningRate 0.0504 Epoch: 11 Global Step: 65970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:34:44,143-Speed 11018.78 samples/sec Loss 7.5578 LearningRate 0.0504 Epoch: 11 Global Step: 65980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:34:45,038-Speed 11450.61 samples/sec Loss 7.6203 LearningRate 0.0504 Epoch: 11 Global Step: 65990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:34:45,940-Speed 11351.85 samples/sec Loss 7.5053 LearningRate 0.0504 Epoch: 11 Global Step: 66000 Fp16 Grad Scale: 
131072 Required: 6 hours Training: 2022-04-11 01:35:07,479-[lfw][66000]XNorm: 12.598927 Training: 2022-04-11 01:35:07,480-[lfw][66000]Accuracy-Flip: 0.99500+-0.00298 Training: 2022-04-11 01:35:07,480-[lfw][66000]Accuracy-Highest: 0.99550 Training: 2022-04-11 01:35:32,235-[cfp_fp][66000]XNorm: 10.414550 Training: 2022-04-11 01:35:32,235-[cfp_fp][66000]Accuracy-Flip: 0.94114+-0.00896 Training: 2022-04-11 01:35:32,236-[cfp_fp][66000]Accuracy-Highest: 0.94143 Training: 2022-04-11 01:35:53,390-[agedb_30][66000]XNorm: 12.314395 Training: 2022-04-11 01:35:53,390-[agedb_30][66000]Accuracy-Flip: 0.95867+-0.00856 Training: 2022-04-11 01:35:53,390-[agedb_30][66000]Accuracy-Highest: 0.96083 Training: 2022-04-11 01:35:54,334-Speed 149.72 samples/sec Loss 7.4942 LearningRate 0.0504 Epoch: 11 Global Step: 66010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:35:55,271-Speed 10927.24 samples/sec Loss 7.5097 LearningRate 0.0504 Epoch: 11 Global Step: 66020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:35:56,209-Speed 10931.03 samples/sec Loss 7.6053 LearningRate 0.0504 Epoch: 11 Global Step: 66030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:35:57,099-Speed 11517.84 samples/sec Loss 7.5858 LearningRate 0.0504 Epoch: 11 Global Step: 66040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:35:58,047-Speed 10809.15 samples/sec Loss 7.5821 LearningRate 0.0504 Epoch: 11 Global Step: 66050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:35:58,952-Speed 11322.24 samples/sec Loss 7.4889 LearningRate 0.0503 Epoch: 11 Global Step: 66060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:35:59,904-Speed 10762.38 samples/sec Loss 7.4704 LearningRate 0.0503 Epoch: 11 Global Step: 66070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:36:00,815-Speed 11242.37 samples/sec Loss 7.4503 LearningRate 0.0503 Epoch: 11 Global Step: 66080 Fp16 Grad Scale: 65536 Required: 6 hours 
Training: 2022-04-11 01:36:01,752-Speed 10945.58 samples/sec Loss 7.6921 LearningRate 0.0503 Epoch: 11 Global Step: 66090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:36:02,678-Speed 11057.14 samples/sec Loss 7.4910 LearningRate 0.0503 Epoch: 11 Global Step: 66100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:36:03,583-Speed 11332.48 samples/sec Loss 7.6560 LearningRate 0.0503 Epoch: 11 Global Step: 66110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:36:04,516-Speed 10984.45 samples/sec Loss 7.6350 LearningRate 0.0503 Epoch: 11 Global Step: 66120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:36:05,470-Speed 10730.41 samples/sec Loss 7.6339 LearningRate 0.0503 Epoch: 11 Global Step: 66130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:36:06,408-Speed 10935.83 samples/sec Loss 7.5684 LearningRate 0.0503 Epoch: 11 Global Step: 66140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:36:07,351-Speed 10864.84 samples/sec Loss 7.4839 LearningRate 0.0503 Epoch: 11 Global Step: 66150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:36:08,269-Speed 11156.00 samples/sec Loss 7.6113 LearningRate 0.0503 Epoch: 11 Global Step: 66160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:36:09,179-Speed 11264.44 samples/sec Loss 7.7466 LearningRate 0.0503 Epoch: 11 Global Step: 66170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:36:10,083-Speed 11327.71 samples/sec Loss 7.6365 LearningRate 0.0503 Epoch: 11 Global Step: 66180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:36:10,975-Speed 11493.10 samples/sec Loss 7.4215 LearningRate 0.0503 Epoch: 11 Global Step: 66190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:36:11,952-Speed 10482.06 samples/sec Loss 7.5856 LearningRate 0.0503 Epoch: 11 Global Step: 66200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 
01:36:12,925-Speed 10536.99 samples/sec Loss 7.6946 LearningRate 0.0503 Epoch: 11 Global Step: 66210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:36:13,889-Speed 10630.74 samples/sec Loss 7.5734 LearningRate 0.0502 Epoch: 11 Global Step: 66220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:36:14,812-Speed 11096.19 samples/sec Loss 7.5410 LearningRate 0.0502 Epoch: 11 Global Step: 66230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:36:15,747-Speed 10961.46 samples/sec Loss 7.5855 LearningRate 0.0502 Epoch: 11 Global Step: 66240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:36:16,706-Speed 10687.84 samples/sec Loss 7.5793 LearningRate 0.0502 Epoch: 11 Global Step: 66250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:36:17,636-Speed 11015.24 samples/sec Loss 7.5807 LearningRate 0.0502 Epoch: 11 Global Step: 66260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:36:18,615-Speed 10462.18 samples/sec Loss 7.6696 LearningRate 0.0502 Epoch: 11 Global Step: 66270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:36:19,580-Speed 10618.73 samples/sec Loss 7.4143 LearningRate 0.0502 Epoch: 11 Global Step: 66280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:36:20,507-Speed 11055.81 samples/sec Loss 7.6543 LearningRate 0.0502 Epoch: 11 Global Step: 66290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:36:21,420-Speed 11230.08 samples/sec Loss 7.5382 LearningRate 0.0502 Epoch: 11 Global Step: 66300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:36:22,301-Speed 11628.02 samples/sec Loss 7.4452 LearningRate 0.0502 Epoch: 11 Global Step: 66310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:36:23,241-Speed 10894.18 samples/sec Loss 7.6274 LearningRate 0.0502 Epoch: 11 Global Step: 66320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:36:24,189-Speed 10816.89 
samples/sec Loss 7.5678 LearningRate 0.0502 Epoch: 11 Global Step: 66330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:36:25,117-Speed 11034.91 samples/sec Loss 7.5735 LearningRate 0.0502 Epoch: 11 Global Step: 66340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:36:26,044-Speed 11059.83 samples/sec Loss 7.4311 LearningRate 0.0502 Epoch: 11 Global Step: 66350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:36:26,983-Speed 10912.03 samples/sec Loss 7.4459 LearningRate 0.0502 Epoch: 11 Global Step: 66360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:36:27,867-Speed 11589.98 samples/sec Loss 7.5857 LearningRate 0.0502 Epoch: 11 Global Step: 66370 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 01:36:28,805-Speed 10931.49 samples/sec Loss 7.4892 LearningRate 0.0501 Epoch: 11 Global Step: 66380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:36:29,763-Speed 10698.54 samples/sec Loss 7.6167 LearningRate 0.0501 Epoch: 11 Global Step: 66390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:36:30,688-Speed 11066.13 samples/sec Loss 7.4549 LearningRate 0.0501 Epoch: 11 Global Step: 66400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:36:31,644-Speed 10721.87 samples/sec Loss 7.5116 LearningRate 0.0501 Epoch: 11 Global Step: 66410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:36:32,583-Speed 10911.66 samples/sec Loss 7.6527 LearningRate 0.0501 Epoch: 11 Global Step: 66420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:36:33,524-Speed 10887.53 samples/sec Loss 7.5473 LearningRate 0.0501 Epoch: 11 Global Step: 66430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:36:34,435-Speed 11247.35 samples/sec Loss 7.5977 LearningRate 0.0501 Epoch: 11 Global Step: 66440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:36:35,355-Speed 11140.15 samples/sec Loss 7.7203 
LearningRate 0.0501 Epoch: 11 Global Step: 66450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:36:36,285-Speed 11017.75 samples/sec Loss 7.5610 LearningRate 0.0501 Epoch: 11 Global Step: 66460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:36:37,198-Speed 11233.73 samples/sec Loss 7.6393 LearningRate 0.0501 Epoch: 11 Global Step: 66470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:36:38,103-Speed 11320.59 samples/sec Loss 7.4588 LearningRate 0.0501 Epoch: 11 Global Step: 66480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:36:39,054-Speed 10771.92 samples/sec Loss 7.4868 LearningRate 0.0501 Epoch: 11 Global Step: 66490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:36:40,036-Speed 10432.48 samples/sec Loss 7.6281 LearningRate 0.0501 Epoch: 11 Global Step: 66500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:36:40,960-Speed 11091.17 samples/sec Loss 7.7182 LearningRate 0.0501 Epoch: 11 Global Step: 66510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:36:41,933-Speed 10531.26 samples/sec Loss 7.7034 LearningRate 0.0501 Epoch: 11 Global Step: 66520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:36:42,904-Speed 10553.04 samples/sec Loss 7.5383 LearningRate 0.0501 Epoch: 11 Global Step: 66530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:36:43,843-Speed 10914.93 samples/sec Loss 7.4505 LearningRate 0.0500 Epoch: 11 Global Step: 66540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:36:44,769-Speed 11069.66 samples/sec Loss 7.5880 LearningRate 0.0500 Epoch: 11 Global Step: 66550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:36:45,665-Speed 11438.16 samples/sec Loss 7.3948 LearningRate 0.0500 Epoch: 11 Global Step: 66560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:36:46,558-Speed 11475.72 samples/sec Loss 7.4349 LearningRate 0.0500 Epoch: 11 Global 
Step: 66570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:36:47,490-Speed 10987.75 samples/sec Loss 7.4579 LearningRate 0.0500 Epoch: 11 Global Step: 66580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:36:48,433-Speed 10872.73 samples/sec Loss 7.5071 LearningRate 0.0500 Epoch: 11 Global Step: 66590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:36:49,341-Speed 11288.93 samples/sec Loss 7.5522 LearningRate 0.0500 Epoch: 11 Global Step: 66600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:36:50,271-Speed 11012.87 samples/sec Loss 7.7108 LearningRate 0.0500 Epoch: 11 Global Step: 66610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:36:51,206-Speed 10960.61 samples/sec Loss 7.6509 LearningRate 0.0500 Epoch: 11 Global Step: 66620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:36:52,149-Speed 10865.48 samples/sec Loss 7.4079 LearningRate 0.0500 Epoch: 11 Global Step: 66630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:36:53,055-Speed 11308.74 samples/sec Loss 7.5271 LearningRate 0.0500 Epoch: 11 Global Step: 66640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:36:53,956-Speed 11373.91 samples/sec Loss 7.5061 LearningRate 0.0500 Epoch: 11 Global Step: 66650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:36:54,879-Speed 11101.69 samples/sec Loss 7.5362 LearningRate 0.0500 Epoch: 11 Global Step: 66660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:36:55,799-Speed 11139.26 samples/sec Loss 7.5929 LearningRate 0.0500 Epoch: 11 Global Step: 66670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:36:56,739-Speed 10895.04 samples/sec Loss 7.5541 LearningRate 0.0500 Epoch: 11 Global Step: 66680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:36:57,658-Speed 11148.06 samples/sec Loss 7.5262 LearningRate 0.0500 Epoch: 11 Global Step: 66690 Fp16 Grad Scale: 
131072 Required: 6 hours Training: 2022-04-11 01:36:58,607-Speed 10797.51 samples/sec Loss 7.5550 LearningRate 0.0499 Epoch: 11 Global Step: 66700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:36:59,567-Speed 10681.57 samples/sec Loss 7.5983 LearningRate 0.0499 Epoch: 11 Global Step: 66710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:37:00,493-Speed 11060.07 samples/sec Loss 7.5566 LearningRate 0.0499 Epoch: 11 Global Step: 66720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:37:01,497-Speed 10201.54 samples/sec Loss 7.5183 LearningRate 0.0499 Epoch: 11 Global Step: 66730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:37:02,438-Speed 10891.09 samples/sec Loss 7.5461 LearningRate 0.0499 Epoch: 11 Global Step: 66740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:37:03,383-Speed 10844.36 samples/sec Loss 7.6321 LearningRate 0.0499 Epoch: 11 Global Step: 66750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:37:04,340-Speed 10712.55 samples/sec Loss 7.5371 LearningRate 0.0499 Epoch: 11 Global Step: 66760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:37:05,275-Speed 10956.09 samples/sec Loss 7.6604 LearningRate 0.0499 Epoch: 11 Global Step: 66770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:37:06,236-Speed 10655.00 samples/sec Loss 7.6925 LearningRate 0.0499 Epoch: 11 Global Step: 66780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:37:07,169-Speed 10986.46 samples/sec Loss 7.5587 LearningRate 0.0499 Epoch: 11 Global Step: 66790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:37:08,101-Speed 10997.40 samples/sec Loss 7.6497 LearningRate 0.0499 Epoch: 11 Global Step: 66800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:37:09,062-Speed 10658.85 samples/sec Loss 7.6521 LearningRate 0.0499 Epoch: 11 Global Step: 66810 Fp16 Grad Scale: 65536 Required: 6 hours 
Training: 2022-04-11 01:37:09,979-Speed 11182.15 samples/sec Loss 7.5414 LearningRate 0.0499 Epoch: 11 Global Step: 66820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:37:10,929-Speed 10783.10 samples/sec Loss 7.5652 LearningRate 0.0499 Epoch: 11 Global Step: 66830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:37:11,818-Speed 11537.24 samples/sec Loss 7.6238 LearningRate 0.0499 Epoch: 11 Global Step: 66840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:37:12,775-Speed 10704.80 samples/sec Loss 7.7327 LearningRate 0.0499 Epoch: 11 Global Step: 66850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:37:13,712-Speed 10939.60 samples/sec Loss 7.4714 LearningRate 0.0498 Epoch: 11 Global Step: 66860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:37:14,647-Speed 10961.50 samples/sec Loss 7.4683 LearningRate 0.0498 Epoch: 11 Global Step: 66870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:37:15,554-Speed 11291.47 samples/sec Loss 7.5965 LearningRate 0.0498 Epoch: 11 Global Step: 66880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:37:16,463-Speed 11270.03 samples/sec Loss 7.5833 LearningRate 0.0498 Epoch: 11 Global Step: 66890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:37:17,414-Speed 10772.34 samples/sec Loss 7.5449 LearningRate 0.0498 Epoch: 11 Global Step: 66900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:37:18,341-Speed 11067.92 samples/sec Loss 7.6073 LearningRate 0.0498 Epoch: 11 Global Step: 66910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:37:19,272-Speed 11010.99 samples/sec Loss 7.4504 LearningRate 0.0498 Epoch: 11 Global Step: 66920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:37:20,194-Speed 11112.01 samples/sec Loss 7.4926 LearningRate 0.0498 Epoch: 11 Global Step: 66930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:37:21,095-Speed 
11370.81 samples/sec Loss 7.3820 LearningRate 0.0498 Epoch: 11 Global Step: 66940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:37:21,993-Speed 11411.28 samples/sec Loss 7.6240 LearningRate 0.0498 Epoch: 11 Global Step: 66950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:37:22,926-Speed 10986.36 samples/sec Loss 7.6191 LearningRate 0.0498 Epoch: 11 Global Step: 66960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:37:23,837-Speed 11239.98 samples/sec Loss 7.5526 LearningRate 0.0498 Epoch: 11 Global Step: 66970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:37:24,786-Speed 10806.49 samples/sec Loss 7.6254 LearningRate 0.0498 Epoch: 11 Global Step: 66980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:37:25,733-Speed 10819.54 samples/sec Loss 7.6362 LearningRate 0.0498 Epoch: 11 Global Step: 66990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:37:26,675-Speed 10880.80 samples/sec Loss 7.5813 LearningRate 0.0498 Epoch: 11 Global Step: 67000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:37:27,600-Speed 11078.57 samples/sec Loss 7.5140 LearningRate 0.0498 Epoch: 11 Global Step: 67010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:37:28,515-Speed 11189.03 samples/sec Loss 7.6152 LearningRate 0.0497 Epoch: 11 Global Step: 67020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:37:29,466-Speed 10780.83 samples/sec Loss 7.5097 LearningRate 0.0497 Epoch: 11 Global Step: 67030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:37:30,382-Speed 11180.71 samples/sec Loss 7.7320 LearningRate 0.0497 Epoch: 11 Global Step: 67040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:37:31,377-Speed 10306.65 samples/sec Loss 7.5406 LearningRate 0.0497 Epoch: 11 Global Step: 67050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:37:32,316-Speed 10909.77 samples/sec Loss 7.5195 
LearningRate 0.0497 Epoch: 11 Global Step: 67060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:37:33,213-Speed 11421.04 samples/sec Loss 7.6959 LearningRate 0.0497 Epoch: 11 Global Step: 67070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:37:34,105-Speed 11493.88 samples/sec Loss 7.7823 LearningRate 0.0497 Epoch: 11 Global Step: 67080 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 01:37:35,018-Speed 11227.34 samples/sec Loss 7.5072 LearningRate 0.0497 Epoch: 11 Global Step: 67090 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 01:37:35,924-Speed 11305.37 samples/sec Loss 7.6328 LearningRate 0.0497 Epoch: 11 Global Step: 67100 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 01:37:36,819-Speed 11456.99 samples/sec Loss 7.6603 LearningRate 0.0497 Epoch: 11 Global Step: 67110 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 01:37:37,735-Speed 11178.10 samples/sec Loss 7.5928 LearningRate 0.0497 Epoch: 11 Global Step: 67120 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 01:37:38,635-Speed 11384.41 samples/sec Loss 7.5021 LearningRate 0.0497 Epoch: 11 Global Step: 67130 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 01:37:39,560-Speed 11075.36 samples/sec Loss 7.6284 LearningRate 0.0497 Epoch: 11 Global Step: 67140 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 01:37:40,530-Speed 10570.33 samples/sec Loss 7.4395 LearningRate 0.0497 Epoch: 11 Global Step: 67150 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 01:37:41,488-Speed 10689.31 samples/sec Loss 7.6107 LearningRate 0.0497 Epoch: 11 Global Step: 67160 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 01:37:42,396-Speed 11286.77 samples/sec Loss 7.5802 LearningRate 0.0497 Epoch: 11 Global Step: 67170 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 01:37:43,324-Speed 11045.39 samples/sec Loss 7.5721 LearningRate 0.0496 Epoch: 11 Global Step: 
67180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:37:44,240-Speed 11190.04 samples/sec Loss 7.5143 LearningRate 0.0496 Epoch: 11 Global Step: 67190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:37:45,194-Speed 10740.68 samples/sec Loss 7.5759 LearningRate 0.0496 Epoch: 11 Global Step: 67200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:37:46,135-Speed 10880.69 samples/sec Loss 7.6766 LearningRate 0.0496 Epoch: 11 Global Step: 67210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:37:47,089-Speed 10739.89 samples/sec Loss 7.3979 LearningRate 0.0496 Epoch: 11 Global Step: 67220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:37:48,003-Speed 11215.80 samples/sec Loss 7.5934 LearningRate 0.0496 Epoch: 11 Global Step: 67230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:37:48,929-Speed 11069.71 samples/sec Loss 7.5956 LearningRate 0.0496 Epoch: 11 Global Step: 67240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:37:49,875-Speed 10831.02 samples/sec Loss 7.7212 LearningRate 0.0496 Epoch: 11 Global Step: 67250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:37:50,872-Speed 10276.04 samples/sec Loss 7.6400 LearningRate 0.0496 Epoch: 11 Global Step: 67260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:37:51,787-Speed 11203.08 samples/sec Loss 7.6736 LearningRate 0.0496 Epoch: 11 Global Step: 67270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:37:52,703-Speed 11188.25 samples/sec Loss 7.5728 LearningRate 0.0496 Epoch: 11 Global Step: 67280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:37:53,637-Speed 10964.82 samples/sec Loss 7.6852 LearningRate 0.0496 Epoch: 11 Global Step: 67290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:37:54,569-Speed 10999.43 samples/sec Loss 7.4819 LearningRate 0.0496 Epoch: 11 Global Step: 67300 Fp16 Grad Scale: 131072 Required: 6 
hours Training: 2022-04-11 01:37:55,529-Speed 10670.64 samples/sec Loss 7.4682 LearningRate 0.0496 Epoch: 11 Global Step: 67310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:37:56,442-Speed 11226.87 samples/sec Loss 7.6788 LearningRate 0.0496 Epoch: 11 Global Step: 67320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:37:57,348-Speed 11306.49 samples/sec Loss 7.5446 LearningRate 0.0496 Epoch: 11 Global Step: 67330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:37:58,280-Speed 10992.37 samples/sec Loss 7.5829 LearningRate 0.0496 Epoch: 11 Global Step: 67340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:37:59,215-Speed 10967.65 samples/sec Loss 7.4919 LearningRate 0.0495 Epoch: 11 Global Step: 67350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:38:00,142-Speed 11044.56 samples/sec Loss 7.5257 LearningRate 0.0495 Epoch: 11 Global Step: 67360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:38:01,113-Speed 10554.97 samples/sec Loss 7.6611 LearningRate 0.0495 Epoch: 11 Global Step: 67370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:38:02,030-Speed 11177.03 samples/sec Loss 7.5559 LearningRate 0.0495 Epoch: 11 Global Step: 67380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:38:02,941-Speed 11241.58 samples/sec Loss 7.6108 LearningRate 0.0495 Epoch: 11 Global Step: 67390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:38:03,848-Speed 11298.49 samples/sec Loss 7.5951 LearningRate 0.0495 Epoch: 11 Global Step: 67400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:38:04,734-Speed 11575.80 samples/sec Loss 7.3703 LearningRate 0.0495 Epoch: 11 Global Step: 67410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:38:05,672-Speed 10931.70 samples/sec Loss 7.5474 LearningRate 0.0495 Epoch: 11 Global Step: 67420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 
01:38:06,598-Speed 11073.68 samples/sec Loss 7.4443 LearningRate 0.0495 Epoch: 11 Global Step: 67430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:38:07,485-Speed 11549.84 samples/sec Loss 7.5187 LearningRate 0.0495 Epoch: 11 Global Step: 67440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:38:08,389-Speed 11335.42 samples/sec Loss 7.7355 LearningRate 0.0495 Epoch: 11 Global Step: 67450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:38:09,309-Speed 11135.28 samples/sec Loss 7.5052 LearningRate 0.0495 Epoch: 11 Global Step: 67460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:38:10,243-Speed 10977.78 samples/sec Loss 7.6301 LearningRate 0.0495 Epoch: 11 Global Step: 67470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:38:11,139-Speed 11430.30 samples/sec Loss 7.6536 LearningRate 0.0495 Epoch: 11 Global Step: 67480 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 01:38:12,068-Speed 11027.83 samples/sec Loss 7.5004 LearningRate 0.0495 Epoch: 11 Global Step: 67490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:38:13,028-Speed 10675.72 samples/sec Loss 7.4095 LearningRate 0.0495 Epoch: 11 Global Step: 67500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:38:13,982-Speed 10740.42 samples/sec Loss 7.5479 LearningRate 0.0494 Epoch: 11 Global Step: 67510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:38:14,884-Speed 11367.82 samples/sec Loss 7.4274 LearningRate 0.0494 Epoch: 11 Global Step: 67520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:38:15,809-Speed 11074.82 samples/sec Loss 7.5961 LearningRate 0.0494 Epoch: 11 Global Step: 67530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:38:16,714-Speed 11322.16 samples/sec Loss 7.6740 LearningRate 0.0494 Epoch: 11 Global Step: 67540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:38:17,632-Speed 11162.76 
samples/sec Loss 7.5654 LearningRate 0.0494 Epoch: 11 Global Step: 67550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:38:18,562-Speed 11012.96 samples/sec Loss 7.5677 LearningRate 0.0494 Epoch: 11 Global Step: 67560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:38:19,486-Speed 11091.02 samples/sec Loss 7.6286 LearningRate 0.0494 Epoch: 11 Global Step: 67570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:38:20,441-Speed 10734.14 samples/sec Loss 7.5514 LearningRate 0.0494 Epoch: 11 Global Step: 67580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:38:21,346-Speed 11318.10 samples/sec Loss 7.6508 LearningRate 0.0494 Epoch: 11 Global Step: 67590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:38:22,281-Speed 10964.76 samples/sec Loss 7.5316 LearningRate 0.0494 Epoch: 11 Global Step: 67600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:38:23,201-Speed 11136.25 samples/sec Loss 7.5100 LearningRate 0.0494 Epoch: 11 Global Step: 67610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:38:24,134-Speed 10977.64 samples/sec Loss 7.5826 LearningRate 0.0494 Epoch: 11 Global Step: 67620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:38:25,031-Speed 11438.33 samples/sec Loss 7.6057 LearningRate 0.0494 Epoch: 11 Global Step: 67630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:38:25,987-Speed 10720.65 samples/sec Loss 7.6106 LearningRate 0.0494 Epoch: 11 Global Step: 67640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:38:26,873-Speed 11568.68 samples/sec Loss 7.5214 LearningRate 0.0494 Epoch: 11 Global Step: 67650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:38:27,805-Speed 10989.81 samples/sec Loss 7.5059 LearningRate 0.0494 Epoch: 11 Global Step: 67660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:38:28,738-Speed 10988.95 samples/sec Loss 7.5898 
LearningRate 0.0493 Epoch: 11 Global Step: 67670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:38:29,669-Speed 11008.78 samples/sec Loss 7.5311 LearningRate 0.0493 Epoch: 11 Global Step: 67680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:38:30,560-Speed 11494.87 samples/sec Loss 7.5934 LearningRate 0.0493 Epoch: 11 Global Step: 67690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:38:31,456-Speed 11438.17 samples/sec Loss 7.8086 LearningRate 0.0493 Epoch: 11 Global Step: 67700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:38:32,410-Speed 10739.46 samples/sec Loss 7.5234 LearningRate 0.0493 Epoch: 11 Global Step: 67710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:38:33,350-Speed 10895.11 samples/sec Loss 7.5376 LearningRate 0.0493 Epoch: 11 Global Step: 67720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:38:34,272-Speed 11114.65 samples/sec Loss 7.4949 LearningRate 0.0493 Epoch: 11 Global Step: 67730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:38:35,151-Speed 11660.37 samples/sec Loss 7.6443 LearningRate 0.0493 Epoch: 11 Global Step: 67740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:38:36,058-Speed 11300.53 samples/sec Loss 7.7305 LearningRate 0.0493 Epoch: 11 Global Step: 67750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:38:36,978-Speed 11135.52 samples/sec Loss 7.3535 LearningRate 0.0493 Epoch: 11 Global Step: 67760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:38:37,927-Speed 10797.01 samples/sec Loss 7.5907 LearningRate 0.0493 Epoch: 11 Global Step: 67770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:38:38,872-Speed 10842.82 samples/sec Loss 7.5119 LearningRate 0.0493 Epoch: 11 Global Step: 67780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:38:39,817-Speed 10842.65 samples/sec Loss 7.6917 LearningRate 0.0493 Epoch: 11 Global Step: 
67790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:38:40,764-Speed 10821.45 samples/sec Loss 7.3433 LearningRate 0.0493 Epoch: 11 Global Step: 67800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:38:41,688-Speed 11093.13 samples/sec Loss 7.4834 LearningRate 0.0493 Epoch: 11 Global Step: 67810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:38:42,611-Speed 11098.13 samples/sec Loss 7.8056 LearningRate 0.0493 Epoch: 11 Global Step: 67820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:38:43,582-Speed 10556.01 samples/sec Loss 7.5591 LearningRate 0.0492 Epoch: 11 Global Step: 67830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:38:44,498-Speed 11188.10 samples/sec Loss 7.5407 LearningRate 0.0492 Epoch: 11 Global Step: 67840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:38:45,427-Speed 11027.67 samples/sec Loss 7.5862 LearningRate 0.0492 Epoch: 11 Global Step: 67850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:38:46,369-Speed 10878.66 samples/sec Loss 7.7063 LearningRate 0.0492 Epoch: 11 Global Step: 67860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:38:47,287-Speed 11162.32 samples/sec Loss 7.6186 LearningRate 0.0492 Epoch: 11 Global Step: 67870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:38:48,197-Speed 11253.74 samples/sec Loss 7.6917 LearningRate 0.0492 Epoch: 11 Global Step: 67880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:38:49,110-Speed 11225.65 samples/sec Loss 7.6249 LearningRate 0.0492 Epoch: 11 Global Step: 67890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:38:50,038-Speed 11048.84 samples/sec Loss 7.4660 LearningRate 0.0492 Epoch: 11 Global Step: 67900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:38:50,966-Speed 11036.93 samples/sec Loss 7.7508 LearningRate 0.0492 Epoch: 11 Global Step: 67910 Fp16 Grad Scale: 131072 
Required: 6 hours Training: 2022-04-11 01:38:51,903-Speed 10937.95 samples/sec Loss 7.5952 LearningRate 0.0492 Epoch: 11 Global Step: 67920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:38:52,844-Speed 10896.83 samples/sec Loss 7.6286 LearningRate 0.0492 Epoch: 11 Global Step: 67930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:38:53,755-Speed 11243.64 samples/sec Loss 7.6236 LearningRate 0.0492 Epoch: 11 Global Step: 67940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:38:54,646-Speed 11499.05 samples/sec Loss 7.7193 LearningRate 0.0492 Epoch: 11 Global Step: 67950 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 01:38:55,557-Speed 11248.70 samples/sec Loss 7.5433 LearningRate 0.0492 Epoch: 11 Global Step: 67960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:38:56,494-Speed 10936.38 samples/sec Loss 7.7396 LearningRate 0.0492 Epoch: 11 Global Step: 67970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:38:57,411-Speed 11168.76 samples/sec Loss 7.5101 LearningRate 0.0492 Epoch: 11 Global Step: 67980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:38:58,346-Speed 10960.92 samples/sec Loss 7.6568 LearningRate 0.0491 Epoch: 11 Global Step: 67990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:38:59,283-Speed 10939.60 samples/sec Loss 7.6559 LearningRate 0.0491 Epoch: 11 Global Step: 68000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:39:20,472-[lfw][68000]XNorm: 12.744066 Training: 2022-04-11 01:39:20,472-[lfw][68000]Accuracy-Flip: 0.99483+-0.00369 Training: 2022-04-11 01:39:20,473-[lfw][68000]Accuracy-Highest: 0.99550 Training: 2022-04-11 01:39:45,005-[cfp_fp][68000]XNorm: 10.503359 Training: 2022-04-11 01:39:45,005-[cfp_fp][68000]Accuracy-Flip: 0.93886+-0.01144 Training: 2022-04-11 01:39:45,005-[cfp_fp][68000]Accuracy-Highest: 0.94143 Training: 2022-04-11 01:40:06,255-[agedb_30][68000]XNorm: 12.417656 
Training: 2022-04-11 01:40:06,255-[agedb_30][68000]Accuracy-Flip: 0.96083+-0.01091 Training: 2022-04-11 01:40:06,256-[agedb_30][68000]Accuracy-Highest: 0.96083 Training: 2022-04-11 01:40:07,184-Speed 150.81 samples/sec Loss 7.5850 LearningRate 0.0491 Epoch: 11 Global Step: 68010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:40:08,129-Speed 10847.05 samples/sec Loss 7.5229 LearningRate 0.0491 Epoch: 11 Global Step: 68020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:40:09,031-Speed 11359.37 samples/sec Loss 7.6159 LearningRate 0.0491 Epoch: 11 Global Step: 68030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:40:09,933-Speed 11361.97 samples/sec Loss 7.3866 LearningRate 0.0491 Epoch: 11 Global Step: 68040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:40:10,856-Speed 11108.43 samples/sec Loss 7.4460 LearningRate 0.0491 Epoch: 11 Global Step: 68050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:40:11,787-Speed 10998.88 samples/sec Loss 7.5514 LearningRate 0.0491 Epoch: 11 Global Step: 68060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:40:12,727-Speed 10907.95 samples/sec Loss 7.5328 LearningRate 0.0491 Epoch: 11 Global Step: 68070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:40:13,655-Speed 11048.32 samples/sec Loss 7.4846 LearningRate 0.0491 Epoch: 11 Global Step: 68080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:40:14,620-Speed 10614.93 samples/sec Loss 7.7249 LearningRate 0.0491 Epoch: 11 Global Step: 68090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:40:15,555-Speed 10965.19 samples/sec Loss 7.3637 LearningRate 0.0491 Epoch: 11 Global Step: 68100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:40:16,478-Speed 11103.26 samples/sec Loss 7.6290 LearningRate 0.0491 Epoch: 11 Global Step: 68110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:40:17,428-Speed 
10788.20 samples/sec Loss 7.5840 LearningRate 0.0491 Epoch: 11 Global Step: 68120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:40:18,389-Speed 10664.94 samples/sec Loss 7.5401 LearningRate 0.0491 Epoch: 11 Global Step: 68130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:40:19,334-Speed 10838.30 samples/sec Loss 7.4985 LearningRate 0.0491 Epoch: 11 Global Step: 68140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:40:20,263-Speed 11029.69 samples/sec Loss 7.5262 LearningRate 0.0491 Epoch: 11 Global Step: 68150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:40:21,152-Speed 11522.71 samples/sec Loss 7.6522 LearningRate 0.0490 Epoch: 11 Global Step: 68160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:40:22,067-Speed 11201.39 samples/sec Loss 7.3539 LearningRate 0.0490 Epoch: 11 Global Step: 68170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:40:22,972-Speed 11326.82 samples/sec Loss 7.5816 LearningRate 0.0490 Epoch: 11 Global Step: 68180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:40:23,897-Speed 11078.28 samples/sec Loss 7.4507 LearningRate 0.0490 Epoch: 11 Global Step: 68190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:40:24,794-Speed 11421.33 samples/sec Loss 7.5872 LearningRate 0.0490 Epoch: 11 Global Step: 68200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:40:25,710-Speed 11194.51 samples/sec Loss 7.7651 LearningRate 0.0490 Epoch: 11 Global Step: 68210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:40:26,638-Speed 11040.93 samples/sec Loss 7.5943 LearningRate 0.0490 Epoch: 11 Global Step: 68220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:40:27,578-Speed 10897.84 samples/sec Loss 7.6456 LearningRate 0.0490 Epoch: 11 Global Step: 68230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:40:40,135-Speed 815.55 samples/sec Loss 7.0023 
LearningRate 0.0490 Epoch: 12 Global Step: 68240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:40:41,556-Speed 7214.39 samples/sec Loss 6.7316 LearningRate 0.0490 Epoch: 12 Global Step: 68250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:40:42,619-Speed 9644.94 samples/sec Loss 6.8281 LearningRate 0.0490 Epoch: 12 Global Step: 68260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:40:43,644-Speed 10002.08 samples/sec Loss 6.7940 LearningRate 0.0490 Epoch: 12 Global Step: 68270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:40:44,637-Speed 10315.51 samples/sec Loss 6.8091 LearningRate 0.0490 Epoch: 12 Global Step: 68280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:40:45,631-Speed 10309.03 samples/sec Loss 6.7310 LearningRate 0.0490 Epoch: 12 Global Step: 68290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:40:46,613-Speed 10437.27 samples/sec Loss 6.8503 LearningRate 0.0490 Epoch: 12 Global Step: 68300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:40:47,563-Speed 10793.43 samples/sec Loss 6.8538 LearningRate 0.0490 Epoch: 12 Global Step: 68310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:40:48,507-Speed 10863.03 samples/sec Loss 6.9151 LearningRate 0.0489 Epoch: 12 Global Step: 68320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:40:49,439-Speed 11003.53 samples/sec Loss 6.8030 LearningRate 0.0489 Epoch: 12 Global Step: 68330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:40:50,410-Speed 10546.43 samples/sec Loss 6.8983 LearningRate 0.0489 Epoch: 12 Global Step: 68340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:40:51,367-Speed 10713.79 samples/sec Loss 6.8041 LearningRate 0.0489 Epoch: 12 Global Step: 68350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:40:52,362-Speed 10297.54 samples/sec Loss 6.8915 LearningRate 0.0489 Epoch: 12 Global 
Step: 68360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:40:53,317-Speed 10730.87 samples/sec Loss 7.0774 LearningRate 0.0489 Epoch: 12 Global Step: 68370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:40:54,301-Speed 10412.39 samples/sec Loss 6.8947 LearningRate 0.0489 Epoch: 12 Global Step: 68380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:40:55,244-Speed 10867.09 samples/sec Loss 7.0101 LearningRate 0.0489 Epoch: 12 Global Step: 68390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:40:56,173-Speed 11033.56 samples/sec Loss 6.8789 LearningRate 0.0489 Epoch: 12 Global Step: 68400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:40:57,108-Speed 10954.75 samples/sec Loss 7.0582 LearningRate 0.0489 Epoch: 12 Global Step: 68410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:40:57,992-Speed 11595.31 samples/sec Loss 6.9957 LearningRate 0.0489 Epoch: 12 Global Step: 68420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:40:58,977-Speed 10405.61 samples/sec Loss 6.8654 LearningRate 0.0489 Epoch: 12 Global Step: 68430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:40:59,901-Speed 11105.48 samples/sec Loss 7.0828 LearningRate 0.0489 Epoch: 12 Global Step: 68440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:41:00,823-Speed 11109.82 samples/sec Loss 6.8036 LearningRate 0.0489 Epoch: 12 Global Step: 68450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:41:01,771-Speed 10807.24 samples/sec Loss 6.9735 LearningRate 0.0489 Epoch: 12 Global Step: 68460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:41:02,755-Speed 10417.47 samples/sec Loss 6.9058 LearningRate 0.0489 Epoch: 12 Global Step: 68470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:41:03,688-Speed 10981.04 samples/sec Loss 6.9311 LearningRate 0.0488 Epoch: 12 Global Step: 68480 Fp16 Grad Scale: 65536 
Required: 6 hours Training: 2022-04-11 01:41:04,591-Speed 11346.92 samples/sec Loss 6.9477 LearningRate 0.0488 Epoch: 12 Global Step: 68490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:41:05,528-Speed 10942.23 samples/sec Loss 6.8471 LearningRate 0.0488 Epoch: 12 Global Step: 68500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:41:06,475-Speed 10818.34 samples/sec Loss 7.0427 LearningRate 0.0488 Epoch: 12 Global Step: 68510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:41:07,418-Speed 10865.90 samples/sec Loss 6.9038 LearningRate 0.0488 Epoch: 12 Global Step: 68520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:41:08,346-Speed 11046.86 samples/sec Loss 6.8757 LearningRate 0.0488 Epoch: 12 Global Step: 68530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:41:09,274-Speed 11052.40 samples/sec Loss 7.0006 LearningRate 0.0488 Epoch: 12 Global Step: 68540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:41:10,193-Speed 11142.61 samples/sec Loss 6.9761 LearningRate 0.0488 Epoch: 12 Global Step: 68550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:41:11,130-Speed 10935.34 samples/sec Loss 7.0812 LearningRate 0.0488 Epoch: 12 Global Step: 68560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:41:12,083-Speed 10760.29 samples/sec Loss 6.8872 LearningRate 0.0488 Epoch: 12 Global Step: 68570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:41:13,040-Speed 10706.08 samples/sec Loss 7.1051 LearningRate 0.0488 Epoch: 12 Global Step: 68580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:41:13,999-Speed 10687.71 samples/sec Loss 7.1054 LearningRate 0.0488 Epoch: 12 Global Step: 68590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:41:14,929-Speed 11020.19 samples/sec Loss 7.0530 LearningRate 0.0488 Epoch: 12 Global Step: 68600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 
01:41:15,856-Speed 11053.66 samples/sec Loss 7.0880 LearningRate 0.0488 Epoch: 12 Global Step: 68610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:41:16,792-Speed 10957.57 samples/sec Loss 7.0525 LearningRate 0.0488 Epoch: 12 Global Step: 68620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:41:17,742-Speed 10782.98 samples/sec Loss 7.0626 LearningRate 0.0488 Epoch: 12 Global Step: 68630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:41:18,663-Speed 11132.06 samples/sec Loss 7.2275 LearningRate 0.0487 Epoch: 12 Global Step: 68640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:41:19,610-Speed 10825.58 samples/sec Loss 6.9704 LearningRate 0.0487 Epoch: 12 Global Step: 68650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:41:20,502-Speed 11488.92 samples/sec Loss 7.1254 LearningRate 0.0487 Epoch: 12 Global Step: 68660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:41:21,461-Speed 10678.46 samples/sec Loss 7.2292 LearningRate 0.0487 Epoch: 12 Global Step: 68670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:41:22,375-Speed 11210.99 samples/sec Loss 7.1436 LearningRate 0.0487 Epoch: 12 Global Step: 68680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:41:23,295-Speed 11182.57 samples/sec Loss 7.1479 LearningRate 0.0487 Epoch: 12 Global Step: 68690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:41:24,360-Speed 9627.12 samples/sec Loss 7.2029 LearningRate 0.0487 Epoch: 12 Global Step: 68700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:41:25,292-Speed 10995.22 samples/sec Loss 7.1158 LearningRate 0.0487 Epoch: 12 Global Step: 68710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:41:26,262-Speed 10568.53 samples/sec Loss 7.1326 LearningRate 0.0487 Epoch: 12 Global Step: 68720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:41:27,256-Speed 10308.86 
samples/sec Loss 6.9221 LearningRate 0.0487 Epoch: 12 Global Step: 68730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:41:28,191-Speed 10959.96 samples/sec Loss 6.9870 LearningRate 0.0487 Epoch: 12 Global Step: 68740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:41:29,154-Speed 10649.17 samples/sec Loss 7.1461 LearningRate 0.0487 Epoch: 12 Global Step: 68750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:41:30,091-Speed 10935.50 samples/sec Loss 7.1812 LearningRate 0.0487 Epoch: 12 Global Step: 68760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:41:31,106-Speed 10101.10 samples/sec Loss 7.0755 LearningRate 0.0487 Epoch: 12 Global Step: 68770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:41:32,055-Speed 10799.61 samples/sec Loss 7.2296 LearningRate 0.0487 Epoch: 12 Global Step: 68780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:41:33,016-Speed 10655.19 samples/sec Loss 7.1094 LearningRate 0.0487 Epoch: 12 Global Step: 68790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:41:33,994-Speed 10481.65 samples/sec Loss 7.1425 LearningRate 0.0487 Epoch: 12 Global Step: 68800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:41:34,914-Speed 11149.50 samples/sec Loss 7.2629 LearningRate 0.0486 Epoch: 12 Global Step: 68810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:41:35,845-Speed 11008.30 samples/sec Loss 7.0920 LearningRate 0.0486 Epoch: 12 Global Step: 68820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:41:36,865-Speed 10041.38 samples/sec Loss 7.0802 LearningRate 0.0486 Epoch: 12 Global Step: 68830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:41:37,930-Speed 9627.94 samples/sec Loss 7.1904 LearningRate 0.0486 Epoch: 12 Global Step: 68840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:41:38,943-Speed 10112.34 samples/sec Loss 7.1748 LearningRate 
0.0486 Epoch: 12 Global Step: 68850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:41:39,931-Speed 10375.94 samples/sec Loss 7.1699 LearningRate 0.0486 Epoch: 12 Global Step: 68860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:41:40,854-Speed 11102.04 samples/sec Loss 7.2041 LearningRate 0.0486 Epoch: 12 Global Step: 68870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:41:41,785-Speed 11007.52 samples/sec Loss 7.1469 LearningRate 0.0486 Epoch: 12 Global Step: 68880 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:41:42,739-Speed 10744.07 samples/sec Loss 7.3218 LearningRate 0.0486 Epoch: 12 Global Step: 68890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:41:43,668-Speed 11036.39 samples/sec Loss 7.1145 LearningRate 0.0486 Epoch: 12 Global Step: 68900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:41:44,576-Speed 11283.25 samples/sec Loss 7.1203 LearningRate 0.0486 Epoch: 12 Global Step: 68910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:41:45,534-Speed 10698.97 samples/sec Loss 7.1999 LearningRate 0.0486 Epoch: 12 Global Step: 68920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:41:46,424-Speed 11517.81 samples/sec Loss 7.1636 LearningRate 0.0486 Epoch: 12 Global Step: 68930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:41:47,357-Speed 10986.80 samples/sec Loss 7.1521 LearningRate 0.0486 Epoch: 12 Global Step: 68940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:41:48,324-Speed 10603.02 samples/sec Loss 7.1397 LearningRate 0.0486 Epoch: 12 Global Step: 68950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:41:49,307-Speed 10437.05 samples/sec Loss 7.2496 LearningRate 0.0486 Epoch: 12 Global Step: 68960 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 01:41:50,234-Speed 11051.46 samples/sec Loss 7.1941 LearningRate 0.0485 Epoch: 12 Global Step: 
68970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:41:51,184-Speed 10789.53 samples/sec Loss 7.1805 LearningRate 0.0485 Epoch: 12 Global Step: 68980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:41:52,170-Speed 10394.17 samples/sec Loss 7.1923 LearningRate 0.0485 Epoch: 12 Global Step: 68990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:41:53,061-Speed 11500.15 samples/sec Loss 7.1391 LearningRate 0.0485 Epoch: 12 Global Step: 69000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:41:54,049-Speed 10368.96 samples/sec Loss 7.2734 LearningRate 0.0485 Epoch: 12 Global Step: 69010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:41:54,967-Speed 11168.01 samples/sec Loss 7.1812 LearningRate 0.0485 Epoch: 12 Global Step: 69020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:41:55,927-Speed 10678.74 samples/sec Loss 7.2036 LearningRate 0.0485 Epoch: 12 Global Step: 69030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:41:56,882-Speed 10730.91 samples/sec Loss 7.2635 LearningRate 0.0485 Epoch: 12 Global Step: 69040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:41:57,796-Speed 11221.79 samples/sec Loss 7.2801 LearningRate 0.0485 Epoch: 12 Global Step: 69050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:41:58,702-Speed 11310.69 samples/sec Loss 7.2247 LearningRate 0.0485 Epoch: 12 Global Step: 69060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:41:59,615-Speed 11231.75 samples/sec Loss 7.1319 LearningRate 0.0485 Epoch: 12 Global Step: 69070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:00,562-Speed 10823.24 samples/sec Loss 7.2618 LearningRate 0.0485 Epoch: 12 Global Step: 69080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:01,505-Speed 10870.00 samples/sec Loss 7.3331 LearningRate 0.0485 Epoch: 12 Global Step: 69090 Fp16 Grad Scale: 131072 
Required: 6 hours Training: 2022-04-11 01:42:02,442-Speed 10942.69 samples/sec Loss 7.2459 LearningRate 0.0485 Epoch: 12 Global Step: 69100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:03,371-Speed 11033.02 samples/sec Loss 7.2872 LearningRate 0.0485 Epoch: 12 Global Step: 69110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:04,348-Speed 10487.26 samples/sec Loss 7.3490 LearningRate 0.0485 Epoch: 12 Global Step: 69120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:05,328-Speed 10457.78 samples/sec Loss 7.3355 LearningRate 0.0484 Epoch: 12 Global Step: 69130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:06,220-Speed 11484.08 samples/sec Loss 7.2433 LearningRate 0.0484 Epoch: 12 Global Step: 69140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:07,161-Speed 10903.28 samples/sec Loss 7.1489 LearningRate 0.0484 Epoch: 12 Global Step: 69150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:08,108-Speed 10815.16 samples/sec Loss 7.1884 LearningRate 0.0484 Epoch: 12 Global Step: 69160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:09,044-Speed 10956.83 samples/sec Loss 7.2622 LearningRate 0.0484 Epoch: 12 Global Step: 69170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:09,946-Speed 11370.71 samples/sec Loss 7.2304 LearningRate 0.0484 Epoch: 12 Global Step: 69180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:10,859-Speed 11217.21 samples/sec Loss 7.2163 LearningRate 0.0484 Epoch: 12 Global Step: 69190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:11,823-Speed 10636.44 samples/sec Loss 7.1463 LearningRate 0.0484 Epoch: 12 Global Step: 69200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:12,797-Speed 10525.37 samples/sec Loss 7.3494 LearningRate 0.0484 Epoch: 12 Global Step: 69210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 
2022-04-11 01:42:13,719-Speed 11112.16 samples/sec Loss 7.2873 LearningRate 0.0484 Epoch: 12 Global Step: 69220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:14,676-Speed 10712.19 samples/sec Loss 7.2709 LearningRate 0.0484 Epoch: 12 Global Step: 69230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:15,630-Speed 10743.36 samples/sec Loss 7.2453 LearningRate 0.0484 Epoch: 12 Global Step: 69240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:16,550-Speed 11135.66 samples/sec Loss 7.2567 LearningRate 0.0484 Epoch: 12 Global Step: 69250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:17,553-Speed 10217.79 samples/sec Loss 7.1453 LearningRate 0.0484 Epoch: 12 Global Step: 69260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:18,487-Speed 10974.72 samples/sec Loss 7.2514 LearningRate 0.0484 Epoch: 12 Global Step: 69270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:19,417-Speed 11021.07 samples/sec Loss 7.2183 LearningRate 0.0484 Epoch: 12 Global Step: 69280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:20,357-Speed 10911.60 samples/sec Loss 7.2387 LearningRate 0.0484 Epoch: 12 Global Step: 69290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:21,299-Speed 10878.53 samples/sec Loss 7.3146 LearningRate 0.0483 Epoch: 12 Global Step: 69300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:22,199-Speed 11391.81 samples/sec Loss 7.3936 LearningRate 0.0483 Epoch: 12 Global Step: 69310 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:23,119-Speed 11133.29 samples/sec Loss 7.3771 LearningRate 0.0483 Epoch: 12 Global Step: 69320 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:24,024-Speed 11329.06 samples/sec Loss 7.3486 LearningRate 0.0483 Epoch: 12 Global Step: 69330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:24,913-Speed 
11521.33 samples/sec Loss 7.1934 LearningRate 0.0483 Epoch: 12 Global Step: 69340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:25,921-Speed 10167.77 samples/sec Loss 7.2839 LearningRate 0.0483 Epoch: 12 Global Step: 69350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:26,862-Speed 10887.22 samples/sec Loss 7.2705 LearningRate 0.0483 Epoch: 12 Global Step: 69360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:27,829-Speed 10605.60 samples/sec Loss 7.2334 LearningRate 0.0483 Epoch: 12 Global Step: 69370 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 01:42:28,756-Speed 11053.34 samples/sec Loss 7.3356 LearningRate 0.0483 Epoch: 12 Global Step: 69380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:29,721-Speed 10619.99 samples/sec Loss 7.3755 LearningRate 0.0483 Epoch: 12 Global Step: 69390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:30,644-Speed 11123.94 samples/sec Loss 7.2066 LearningRate 0.0483 Epoch: 12 Global Step: 69400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:31,540-Speed 11438.00 samples/sec Loss 7.2933 LearningRate 0.0483 Epoch: 12 Global Step: 69410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:32,462-Speed 11105.92 samples/sec Loss 7.4340 LearningRate 0.0483 Epoch: 12 Global Step: 69420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:33,360-Speed 11410.83 samples/sec Loss 7.3587 LearningRate 0.0483 Epoch: 12 Global Step: 69430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:34,302-Speed 10887.07 samples/sec Loss 7.2817 LearningRate 0.0483 Epoch: 12 Global Step: 69440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:42:35,203-Speed 11373.53 samples/sec Loss 7.4012 LearningRate 0.0483 Epoch: 12 Global Step: 69450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:42:36,113-Speed 11268.92 samples/sec Loss 7.2431 
LearningRate 0.0482 Epoch: 12 Global Step: 69460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:42:37,045-Speed 10986.00 samples/sec Loss 7.1949 LearningRate 0.0482 Epoch: 12 Global Step: 69470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:42:37,958-Speed 11229.57 samples/sec Loss 7.3469 LearningRate 0.0482 Epoch: 12 Global Step: 69480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:42:38,914-Speed 10717.36 samples/sec Loss 7.3350 LearningRate 0.0482 Epoch: 12 Global Step: 69490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:42:39,874-Speed 10669.72 samples/sec Loss 7.1372 LearningRate 0.0482 Epoch: 12 Global Step: 69500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:42:40,795-Speed 11125.95 samples/sec Loss 7.2956 LearningRate 0.0482 Epoch: 12 Global Step: 69510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:42:41,721-Speed 11072.61 samples/sec Loss 7.3031 LearningRate 0.0482 Epoch: 12 Global Step: 69520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:42:42,634-Speed 11224.68 samples/sec Loss 7.1754 LearningRate 0.0482 Epoch: 12 Global Step: 69530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:42:43,540-Speed 11319.44 samples/sec Loss 7.4200 LearningRate 0.0482 Epoch: 12 Global Step: 69540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:44,504-Speed 10626.90 samples/sec Loss 7.4002 LearningRate 0.0482 Epoch: 12 Global Step: 69550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:45,422-Speed 11168.99 samples/sec Loss 7.3427 LearningRate 0.0482 Epoch: 12 Global Step: 69560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:46,321-Speed 11386.96 samples/sec Loss 7.2640 LearningRate 0.0482 Epoch: 12 Global Step: 69570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:47,223-Speed 11375.13 samples/sec Loss 7.4182 LearningRate 0.0482 Epoch: 12 Global 
Step: 69580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:48,158-Speed 10955.16 samples/sec Loss 7.3279 LearningRate 0.0482 Epoch: 12 Global Step: 69590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:49,079-Speed 11127.95 samples/sec Loss 7.5472 LearningRate 0.0482 Epoch: 12 Global Step: 69600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:50,059-Speed 10456.87 samples/sec Loss 7.4286 LearningRate 0.0482 Epoch: 12 Global Step: 69610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:50,970-Speed 11250.08 samples/sec Loss 7.4896 LearningRate 0.0481 Epoch: 12 Global Step: 69620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:51,880-Speed 11273.36 samples/sec Loss 7.2451 LearningRate 0.0481 Epoch: 12 Global Step: 69630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:52,807-Speed 11051.57 samples/sec Loss 7.4166 LearningRate 0.0481 Epoch: 12 Global Step: 69640 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 01:42:53,764-Speed 10709.81 samples/sec Loss 7.2973 LearningRate 0.0481 Epoch: 12 Global Step: 69650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:54,682-Speed 11168.00 samples/sec Loss 7.4387 LearningRate 0.0481 Epoch: 12 Global Step: 69660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:55,586-Speed 11338.29 samples/sec Loss 7.3516 LearningRate 0.0481 Epoch: 12 Global Step: 69670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:56,489-Speed 11350.16 samples/sec Loss 7.3781 LearningRate 0.0481 Epoch: 12 Global Step: 69680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:57,400-Speed 11243.36 samples/sec Loss 7.2887 LearningRate 0.0481 Epoch: 12 Global Step: 69690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:42:58,360-Speed 10674.53 samples/sec Loss 7.3460 LearningRate 0.0481 Epoch: 12 Global Step: 69700 Fp16 Grad Scale: 
65536 Required: 6 hours Training: 2022-04-11 01:42:59,249-Speed 11539.13 samples/sec Loss 7.2244 LearningRate 0.0481 Epoch: 12 Global Step: 69710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:43:00,171-Speed 11113.30 samples/sec Loss 7.3414 LearningRate 0.0481 Epoch: 12 Global Step: 69720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:43:01,104-Speed 10974.69 samples/sec Loss 7.3108 LearningRate 0.0481 Epoch: 12 Global Step: 69730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:43:02,030-Speed 11068.09 samples/sec Loss 7.3127 LearningRate 0.0481 Epoch: 12 Global Step: 69740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:43:02,983-Speed 10756.63 samples/sec Loss 7.1896 LearningRate 0.0481 Epoch: 12 Global Step: 69750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:43:03,884-Speed 11378.12 samples/sec Loss 7.4369 LearningRate 0.0481 Epoch: 12 Global Step: 69760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:43:04,819-Speed 10953.62 samples/sec Loss 7.4270 LearningRate 0.0481 Epoch: 12 Global Step: 69770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:43:05,755-Speed 10955.81 samples/sec Loss 7.3623 LearningRate 0.0481 Epoch: 12 Global Step: 69780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:43:06,707-Speed 10765.74 samples/sec Loss 7.3656 LearningRate 0.0480 Epoch: 12 Global Step: 69790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:43:07,671-Speed 10635.95 samples/sec Loss 7.3980 LearningRate 0.0480 Epoch: 12 Global Step: 69800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:43:08,583-Speed 11240.57 samples/sec Loss 7.3098 LearningRate 0.0480 Epoch: 12 Global Step: 69810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:43:09,483-Speed 11381.20 samples/sec Loss 7.2838 LearningRate 0.0480 Epoch: 12 Global Step: 69820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 
2022-04-11 01:43:10,408-Speed 11082.90 samples/sec Loss 7.3651 LearningRate 0.0480 Epoch: 12 Global Step: 69830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:43:11,364-Speed 10727.26 samples/sec Loss 7.3343 LearningRate 0.0480 Epoch: 12 Global Step: 69840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:43:12,271-Speed 11300.29 samples/sec Loss 7.2592 LearningRate 0.0480 Epoch: 12 Global Step: 69850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:43:13,186-Speed 11194.97 samples/sec Loss 7.3585 LearningRate 0.0480 Epoch: 12 Global Step: 69860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:43:14,100-Speed 11216.88 samples/sec Loss 7.4071 LearningRate 0.0480 Epoch: 12 Global Step: 69870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:43:15,041-Speed 10885.41 samples/sec Loss 7.2838 LearningRate 0.0480 Epoch: 12 Global Step: 69880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:43:16,009-Speed 10591.67 samples/sec Loss 7.2598 LearningRate 0.0480 Epoch: 12 Global Step: 69890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:43:16,926-Speed 11174.79 samples/sec Loss 7.4281 LearningRate 0.0480 Epoch: 12 Global Step: 69900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:43:17,854-Speed 11049.80 samples/sec Loss 7.3035 LearningRate 0.0480 Epoch: 12 Global Step: 69910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:43:18,822-Speed 10586.08 samples/sec Loss 7.2690 LearningRate 0.0480 Epoch: 12 Global Step: 69920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:43:19,811-Speed 10359.08 samples/sec Loss 7.4552 LearningRate 0.0480 Epoch: 12 Global Step: 69930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:43:20,700-Speed 11536.39 samples/sec Loss 7.3311 LearningRate 0.0480 Epoch: 12 Global Step: 69940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:43:21,624-Speed 11091.67 
samples/sec Loss 7.2826 LearningRate 0.0479 Epoch: 12 Global Step: 69950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:43:22,536-Speed 11237.70 samples/sec Loss 7.3440 LearningRate 0.0479 Epoch: 12 Global Step: 69960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:43:23,540-Speed 10208.79 samples/sec Loss 7.1762 LearningRate 0.0479 Epoch: 12 Global Step: 69970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:43:24,450-Speed 11265.22 samples/sec Loss 7.3657 LearningRate 0.0479 Epoch: 12 Global Step: 69980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:43:25,356-Speed 11312.88 samples/sec Loss 7.3263 LearningRate 0.0479 Epoch: 12 Global Step: 69990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:43:26,317-Speed 10659.24 samples/sec Loss 7.2781 LearningRate 0.0479 Epoch: 12 Global Step: 70000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:43:48,241-[lfw][70000]XNorm: 12.431487 Training: 2022-04-11 01:43:48,242-[lfw][70000]Accuracy-Flip: 0.99583+-0.00318 Training: 2022-04-11 01:43:48,242-[lfw][70000]Accuracy-Highest: 0.99583 Training: 2022-04-11 01:44:13,136-[cfp_fp][70000]XNorm: 10.403229 Training: 2022-04-11 01:44:13,137-[cfp_fp][70000]Accuracy-Flip: 0.93957+-0.01231 Training: 2022-04-11 01:44:13,137-[cfp_fp][70000]Accuracy-Highest: 0.94143 Training: 2022-04-11 01:44:34,486-[agedb_30][70000]XNorm: 12.104964 Training: 2022-04-11 01:44:34,487-[agedb_30][70000]Accuracy-Flip: 0.96017+-0.00797 Training: 2022-04-11 01:44:34,487-[agedb_30][70000]Accuracy-Highest: 0.96083 Training: 2022-04-11 01:44:35,418-Speed 148.19 samples/sec Loss 7.5193 LearningRate 0.0479 Epoch: 12 Global Step: 70010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:44:36,337-Speed 11150.30 samples/sec Loss 7.3420 LearningRate 0.0479 Epoch: 12 Global Step: 70020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:44:37,245-Speed 11284.94 samples/sec Loss 7.3717 
LearningRate 0.0479 Epoch: 12 Global Step: 70030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:44:38,229-Speed 10424.82 samples/sec Loss 7.4078 LearningRate 0.0479 Epoch: 12 Global Step: 70040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:44:39,180-Speed 10774.83 samples/sec Loss 7.3322 LearningRate 0.0479 Epoch: 12 Global Step: 70050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:44:40,143-Speed 10642.98 samples/sec Loss 7.4172 LearningRate 0.0479 Epoch: 12 Global Step: 70060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:44:41,150-Speed 10169.63 samples/sec Loss 7.3011 LearningRate 0.0479 Epoch: 12 Global Step: 70070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:44:42,131-Speed 10455.87 samples/sec Loss 7.2421 LearningRate 0.0479 Epoch: 12 Global Step: 70080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:44:43,031-Speed 11386.65 samples/sec Loss 7.3718 LearningRate 0.0479 Epoch: 12 Global Step: 70090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:44:43,993-Speed 10657.74 samples/sec Loss 7.4676 LearningRate 0.0479 Epoch: 12 Global Step: 70100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:44:44,998-Speed 10193.50 samples/sec Loss 7.3884 LearningRate 0.0479 Epoch: 12 Global Step: 70110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:44:45,947-Speed 10803.29 samples/sec Loss 7.5021 LearningRate 0.0478 Epoch: 12 Global Step: 70120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:44:46,865-Speed 11163.49 samples/sec Loss 7.3530 LearningRate 0.0478 Epoch: 12 Global Step: 70130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:44:47,802-Speed 10937.74 samples/sec Loss 7.2915 LearningRate 0.0478 Epoch: 12 Global Step: 70140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:44:48,735-Speed 10993.49 samples/sec Loss 7.3253 LearningRate 0.0478 Epoch: 12 
Global Step: 70150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:44:49,644-Speed 11274.46 samples/sec Loss 7.3203 LearningRate 0.0478 Epoch: 12 Global Step: 70160 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 01:44:50,584-Speed 10895.34 samples/sec Loss 7.3491 LearningRate 0.0478 Epoch: 12 Global Step: 70170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:44:51,535-Speed 10783.95 samples/sec Loss 7.5154 LearningRate 0.0478 Epoch: 12 Global Step: 70180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:44:52,469-Speed 10974.12 samples/sec Loss 7.4253 LearningRate 0.0478 Epoch: 12 Global Step: 70190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:44:53,397-Speed 11037.02 samples/sec Loss 7.4568 LearningRate 0.0478 Epoch: 12 Global Step: 70200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:44:54,326-Speed 11038.54 samples/sec Loss 7.4064 LearningRate 0.0478 Epoch: 12 Global Step: 70210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:44:55,261-Speed 10956.94 samples/sec Loss 7.4861 LearningRate 0.0478 Epoch: 12 Global Step: 70220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:44:56,172-Speed 11250.07 samples/sec Loss 7.3217 LearningRate 0.0478 Epoch: 12 Global Step: 70230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:44:57,077-Speed 11334.96 samples/sec Loss 7.4231 LearningRate 0.0478 Epoch: 12 Global Step: 70240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:44:57,993-Speed 11186.86 samples/sec Loss 7.3234 LearningRate 0.0478 Epoch: 12 Global Step: 70250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:44:58,925-Speed 10988.34 samples/sec Loss 7.2741 LearningRate 0.0478 Epoch: 12 Global Step: 70260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:44:59,843-Speed 11169.61 samples/sec Loss 7.4391 LearningRate 0.0478 Epoch: 12 Global Step: 70270 Fp16 Grad 
Scale: 65536 Required: 6 hours Training: 2022-04-11 01:45:00,769-Speed 11066.27 samples/sec Loss 7.3886 LearningRate 0.0477 Epoch: 12 Global Step: 70280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:45:01,708-Speed 10913.64 samples/sec Loss 7.3537 LearningRate 0.0477 Epoch: 12 Global Step: 70290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:45:02,634-Speed 11061.19 samples/sec Loss 7.3932 LearningRate 0.0477 Epoch: 12 Global Step: 70300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:45:03,562-Speed 11049.43 samples/sec Loss 7.3080 LearningRate 0.0477 Epoch: 12 Global Step: 70310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:45:04,479-Speed 11165.78 samples/sec Loss 7.3965 LearningRate 0.0477 Epoch: 12 Global Step: 70320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:45:05,447-Speed 10583.75 samples/sec Loss 7.4315 LearningRate 0.0477 Epoch: 12 Global Step: 70330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:45:06,419-Speed 10553.42 samples/sec Loss 7.5200 LearningRate 0.0477 Epoch: 12 Global Step: 70340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:45:07,334-Speed 11198.10 samples/sec Loss 7.3498 LearningRate 0.0477 Epoch: 12 Global Step: 70350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:45:08,229-Speed 11448.97 samples/sec Loss 7.5060 LearningRate 0.0477 Epoch: 12 Global Step: 70360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:45:09,129-Speed 11389.99 samples/sec Loss 7.4275 LearningRate 0.0477 Epoch: 12 Global Step: 70370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:45:10,020-Speed 11498.58 samples/sec Loss 7.1954 LearningRate 0.0477 Epoch: 12 Global Step: 70380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:45:10,943-Speed 11105.77 samples/sec Loss 7.4106 LearningRate 0.0477 Epoch: 12 Global Step: 70390 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-04-11 01:45:11,835-Speed 11481.09 samples/sec Loss 7.2911 LearningRate 0.0477 Epoch: 12 Global Step: 70400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:45:12,781-Speed 10832.42 samples/sec Loss 7.3500 LearningRate 0.0477 Epoch: 12 Global Step: 70410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:45:13,686-Speed 11336.17 samples/sec Loss 7.3225 LearningRate 0.0477 Epoch: 12 Global Step: 70420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:45:14,585-Speed 11396.77 samples/sec Loss 7.2485 LearningRate 0.0477 Epoch: 12 Global Step: 70430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:45:15,517-Speed 10998.25 samples/sec Loss 7.4180 LearningRate 0.0477 Epoch: 12 Global Step: 70440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:45:16,471-Speed 10737.80 samples/sec Loss 7.2287 LearningRate 0.0476 Epoch: 12 Global Step: 70450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:45:17,399-Speed 11054.44 samples/sec Loss 7.4755 LearningRate 0.0476 Epoch: 12 Global Step: 70460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:45:18,281-Speed 11608.20 samples/sec Loss 7.3209 LearningRate 0.0476 Epoch: 12 Global Step: 70470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:45:19,237-Speed 10721.36 samples/sec Loss 7.3333 LearningRate 0.0476 Epoch: 12 Global Step: 70480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:45:20,182-Speed 10852.93 samples/sec Loss 7.2911 LearningRate 0.0476 Epoch: 12 Global Step: 70490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:45:21,083-Speed 11384.73 samples/sec Loss 7.5928 LearningRate 0.0476 Epoch: 12 Global Step: 70500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:45:21,978-Speed 11442.79 samples/sec Loss 7.2746 LearningRate 0.0476 Epoch: 12 Global Step: 70510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 
01:45:22,889-Speed 11249.52 samples/sec Loss 7.3551 LearningRate 0.0476 Epoch: 12 Global Step: 70520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:45:23,791-Speed 11370.10 samples/sec Loss 7.4008 LearningRate 0.0476 Epoch: 12 Global Step: 70530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:45:24,727-Speed 10953.26 samples/sec Loss 7.4900 LearningRate 0.0476 Epoch: 12 Global Step: 70540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:45:25,615-Speed 11542.04 samples/sec Loss 7.4911 LearningRate 0.0476 Epoch: 12 Global Step: 70550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:45:26,537-Speed 11104.85 samples/sec Loss 7.3377 LearningRate 0.0476 Epoch: 12 Global Step: 70560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:45:27,513-Speed 10501.21 samples/sec Loss 7.3757 LearningRate 0.0476 Epoch: 12 Global Step: 70570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:45:28,471-Speed 10699.49 samples/sec Loss 7.4343 LearningRate 0.0476 Epoch: 12 Global Step: 70580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:45:29,412-Speed 10900.38 samples/sec Loss 7.3383 LearningRate 0.0476 Epoch: 12 Global Step: 70590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:45:30,322-Speed 11259.66 samples/sec Loss 7.2408 LearningRate 0.0476 Epoch: 12 Global Step: 70600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:45:31,329-Speed 10174.44 samples/sec Loss 7.5407 LearningRate 0.0475 Epoch: 12 Global Step: 70610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:45:32,311-Speed 10436.29 samples/sec Loss 7.4820 LearningRate 0.0475 Epoch: 12 Global Step: 70620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:45:33,196-Speed 11575.48 samples/sec Loss 7.4074 LearningRate 0.0475 Epoch: 12 Global Step: 70630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:45:34,094-Speed 11413.64 samples/sec 
Loss 7.3755 LearningRate 0.0475 Epoch: 12 Global Step: 70640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:45:35,032-Speed 10923.71 samples/sec Loss 7.4176 LearningRate 0.0475 Epoch: 12 Global Step: 70650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:45:35,971-Speed 10923.09 samples/sec Loss 7.3761 LearningRate 0.0475 Epoch: 12 Global Step: 70660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:45:36,883-Speed 11236.84 samples/sec Loss 7.3368 LearningRate 0.0475 Epoch: 12 Global Step: 70670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:45:37,802-Speed 11145.49 samples/sec Loss 7.4191 LearningRate 0.0475 Epoch: 12 Global Step: 70680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:45:38,697-Speed 11458.16 samples/sec Loss 7.3757 LearningRate 0.0475 Epoch: 12 Global Step: 70690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:45:39,629-Speed 10996.41 samples/sec Loss 7.4060 LearningRate 0.0475 Epoch: 12 Global Step: 70700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:45:40,626-Speed 10271.55 samples/sec Loss 7.4531 LearningRate 0.0475 Epoch: 12 Global Step: 70710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:45:41,553-Speed 11067.44 samples/sec Loss 7.4235 LearningRate 0.0475 Epoch: 12 Global Step: 70720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:45:42,444-Speed 11503.24 samples/sec Loss 7.4307 LearningRate 0.0475 Epoch: 12 Global Step: 70730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:45:43,372-Speed 11039.51 samples/sec Loss 7.3892 LearningRate 0.0475 Epoch: 12 Global Step: 70740 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 01:45:44,364-Speed 10330.31 samples/sec Loss 7.3100 LearningRate 0.0475 Epoch: 12 Global Step: 70750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:45:45,342-Speed 10488.36 samples/sec Loss 7.3172 LearningRate 0.0475 
Epoch: 12 Global Step: 70760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:45:46,306-Speed 10621.38 samples/sec Loss 7.3063 LearningRate 0.0475 Epoch: 12 Global Step: 70770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:45:47,231-Speed 11080.98 samples/sec Loss 7.5291 LearningRate 0.0474 Epoch: 12 Global Step: 70780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:45:48,153-Speed 11114.41 samples/sec Loss 7.3938 LearningRate 0.0474 Epoch: 12 Global Step: 70790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:45:49,128-Speed 10521.50 samples/sec Loss 7.2639 LearningRate 0.0474 Epoch: 12 Global Step: 70800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:45:50,029-Speed 11373.31 samples/sec Loss 7.3935 LearningRate 0.0474 Epoch: 12 Global Step: 70810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:45:50,934-Speed 11331.40 samples/sec Loss 7.4683 LearningRate 0.0474 Epoch: 12 Global Step: 70820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:45:51,860-Speed 11071.32 samples/sec Loss 7.4040 LearningRate 0.0474 Epoch: 12 Global Step: 70830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:45:52,846-Speed 10387.76 samples/sec Loss 7.3959 LearningRate 0.0474 Epoch: 12 Global Step: 70840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:45:53,753-Speed 11306.98 samples/sec Loss 7.3123 LearningRate 0.0474 Epoch: 12 Global Step: 70850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:45:54,666-Speed 11229.27 samples/sec Loss 7.3960 LearningRate 0.0474 Epoch: 12 Global Step: 70860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:45:55,548-Speed 11607.88 samples/sec Loss 7.4178 LearningRate 0.0474 Epoch: 12 Global Step: 70870 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:45:56,468-Speed 11143.26 samples/sec Loss 7.4037 LearningRate 0.0474 Epoch: 12 Global Step: 70880 Fp16 Grad 
Scale: 131072 Required: 6 hours Training: 2022-04-11 01:45:57,380-Speed 11234.43 samples/sec Loss 7.2774 LearningRate 0.0474 Epoch: 12 Global Step: 70890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:45:58,277-Speed 11423.23 samples/sec Loss 7.3994 LearningRate 0.0474 Epoch: 12 Global Step: 70900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:45:59,188-Speed 11253.08 samples/sec Loss 7.4446 LearningRate 0.0474 Epoch: 12 Global Step: 70910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:46:00,092-Speed 11339.10 samples/sec Loss 7.5482 LearningRate 0.0474 Epoch: 12 Global Step: 70920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:46:01,038-Speed 10837.53 samples/sec Loss 7.3492 LearningRate 0.0474 Epoch: 12 Global Step: 70930 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:46:01,980-Speed 10874.15 samples/sec Loss 7.4958 LearningRate 0.0473 Epoch: 12 Global Step: 70940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:46:02,938-Speed 10699.05 samples/sec Loss 7.5197 LearningRate 0.0473 Epoch: 12 Global Step: 70950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:46:03,884-Speed 10838.58 samples/sec Loss 7.4271 LearningRate 0.0473 Epoch: 12 Global Step: 70960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:46:04,837-Speed 10751.23 samples/sec Loss 7.4199 LearningRate 0.0473 Epoch: 12 Global Step: 70970 Fp16 Grad Scale: 262144 Required: 6 hours Training: 2022-04-11 01:46:05,778-Speed 10888.56 samples/sec Loss 7.3123 LearningRate 0.0473 Epoch: 12 Global Step: 70980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:46:06,766-Speed 10374.79 samples/sec Loss 7.2634 LearningRate 0.0473 Epoch: 12 Global Step: 70990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:46:07,661-Speed 11450.60 samples/sec Loss 7.2625 LearningRate 0.0473 Epoch: 12 Global Step: 71000 Fp16 Grad Scale: 32768 Required: 6 hours 
Training: 2022-04-11 01:46:08,584-Speed 11105.52 samples/sec Loss 7.3513 LearningRate 0.0473 Epoch: 12 Global Step: 71010 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 01:46:09,475-Speed 11498.95 samples/sec Loss 7.6177 LearningRate 0.0473 Epoch: 12 Global Step: 71020 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 01:46:10,371-Speed 11435.54 samples/sec Loss 7.3009 LearningRate 0.0473 Epoch: 12 Global Step: 71030 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 01:46:11,287-Speed 11199.94 samples/sec Loss 7.4058 LearningRate 0.0473 Epoch: 12 Global Step: 71040 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 01:46:12,181-Speed 11463.54 samples/sec Loss 7.5589 LearningRate 0.0473 Epoch: 12 Global Step: 71050 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 01:46:13,112-Speed 11003.67 samples/sec Loss 7.3305 LearningRate 0.0473 Epoch: 12 Global Step: 71060 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 01:46:14,016-Speed 11339.32 samples/sec Loss 7.5554 LearningRate 0.0473 Epoch: 12 Global Step: 71070 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 01:46:14,935-Speed 11147.85 samples/sec Loss 7.4179 LearningRate 0.0473 Epoch: 12 Global Step: 71080 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 01:46:15,888-Speed 10751.93 samples/sec Loss 7.4091 LearningRate 0.0473 Epoch: 12 Global Step: 71090 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 01:46:16,839-Speed 10772.20 samples/sec Loss 7.4409 LearningRate 0.0473 Epoch: 12 Global Step: 71100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:46:17,783-Speed 10853.32 samples/sec Loss 7.3196 LearningRate 0.0472 Epoch: 12 Global Step: 71110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:46:18,766-Speed 10429.05 samples/sec Loss 7.2724 LearningRate 0.0472 Epoch: 12 Global Step: 71120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:46:19,694-Speed 
11054.46 samples/sec Loss 7.3906 LearningRate 0.0472 Epoch: 12 Global Step: 71130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:46:20,646-Speed 10757.75 samples/sec Loss 7.6575 LearningRate 0.0472 Epoch: 12 Global Step: 71140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:46:21,584-Speed 10930.59 samples/sec Loss 7.4327 LearningRate 0.0472 Epoch: 12 Global Step: 71150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:46:22,609-Speed 9998.77 samples/sec Loss 7.3136 LearningRate 0.0472 Epoch: 12 Global Step: 71160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:46:23,550-Speed 10894.96 samples/sec Loss 7.4279 LearningRate 0.0472 Epoch: 12 Global Step: 71170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:46:24,502-Speed 10769.06 samples/sec Loss 7.4350 LearningRate 0.0472 Epoch: 12 Global Step: 71180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:46:25,403-Speed 11375.76 samples/sec Loss 7.2965 LearningRate 0.0472 Epoch: 12 Global Step: 71190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:46:26,340-Speed 10929.49 samples/sec Loss 7.4545 LearningRate 0.0472 Epoch: 12 Global Step: 71200 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:46:27,300-Speed 10678.63 samples/sec Loss 7.4283 LearningRate 0.0472 Epoch: 12 Global Step: 71210 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:46:28,246-Speed 10834.32 samples/sec Loss 7.1869 LearningRate 0.0472 Epoch: 12 Global Step: 71220 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:46:29,183-Speed 10944.24 samples/sec Loss 7.4808 LearningRate 0.0472 Epoch: 12 Global Step: 71230 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:46:30,082-Speed 11397.22 samples/sec Loss 7.3493 LearningRate 0.0472 Epoch: 12 Global Step: 71240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:46:31,031-Speed 10801.50 samples/sec Loss 7.4804 
LearningRate 0.0472 Epoch: 12 Global Step: 71250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:46:31,979-Speed 10807.62 samples/sec Loss 7.4490 LearningRate 0.0472 Epoch: 12 Global Step: 71260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:46:32,885-Speed 11314.62 samples/sec Loss 7.3676 LearningRate 0.0471 Epoch: 12 Global Step: 71270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:46:33,784-Speed 11402.98 samples/sec Loss 7.5677 LearningRate 0.0471 Epoch: 12 Global Step: 71280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:46:34,721-Speed 10933.86 samples/sec Loss 7.5038 LearningRate 0.0471 Epoch: 12 Global Step: 71290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:46:35,687-Speed 10609.15 samples/sec Loss 7.5001 LearningRate 0.0471 Epoch: 12 Global Step: 71300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:46:36,589-Speed 11370.17 samples/sec Loss 7.4071 LearningRate 0.0471 Epoch: 12 Global Step: 71310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:46:37,563-Speed 10516.70 samples/sec Loss 7.3594 LearningRate 0.0471 Epoch: 12 Global Step: 71320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:46:38,456-Speed 11474.94 samples/sec Loss 7.4525 LearningRate 0.0471 Epoch: 12 Global Step: 71330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:46:39,421-Speed 10618.81 samples/sec Loss 7.3816 LearningRate 0.0471 Epoch: 12 Global Step: 71340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:46:40,378-Speed 10707.07 samples/sec Loss 7.4301 LearningRate 0.0471 Epoch: 12 Global Step: 71350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:46:41,323-Speed 10848.18 samples/sec Loss 7.4170 LearningRate 0.0471 Epoch: 12 Global Step: 71360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:46:42,271-Speed 10809.56 samples/sec Loss 7.3566 LearningRate 0.0471 Epoch: 12 Global 
Step: 71370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:46:43,173-Speed 11360.11 samples/sec Loss 7.6145 LearningRate 0.0471 Epoch: 12 Global Step: 71380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:46:44,197-Speed 10009.93 samples/sec Loss 7.3816 LearningRate 0.0471 Epoch: 12 Global Step: 71390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:46:45,099-Speed 11361.77 samples/sec Loss 7.3409 LearningRate 0.0471 Epoch: 12 Global Step: 71400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:46:46,038-Speed 10914.47 samples/sec Loss 7.3953 LearningRate 0.0471 Epoch: 12 Global Step: 71410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:46:46,929-Speed 11509.71 samples/sec Loss 7.4040 LearningRate 0.0471 Epoch: 12 Global Step: 71420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:46:47,908-Speed 10462.88 samples/sec Loss 7.3640 LearningRate 0.0471 Epoch: 12 Global Step: 71430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:46:48,832-Speed 11107.37 samples/sec Loss 7.3043 LearningRate 0.0470 Epoch: 12 Global Step: 71440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:46:49,751-Speed 11156.15 samples/sec Loss 7.3794 LearningRate 0.0470 Epoch: 12 Global Step: 71450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:46:50,638-Speed 11552.20 samples/sec Loss 7.3083 LearningRate 0.0470 Epoch: 12 Global Step: 71460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:46:51,588-Speed 10789.70 samples/sec Loss 7.4241 LearningRate 0.0470 Epoch: 12 Global Step: 71470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:46:52,522-Speed 10970.73 samples/sec Loss 7.5663 LearningRate 0.0470 Epoch: 12 Global Step: 71480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:46:53,402-Speed 11656.19 samples/sec Loss 7.3568 LearningRate 0.0470 Epoch: 12 Global Step: 71490 Fp16 Grad Scale: 65536 
Required: 6 hours Training: 2022-04-11 01:46:54,288-Speed 11562.73 samples/sec Loss 7.3805 LearningRate 0.0470 Epoch: 12 Global Step: 71500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:46:55,183-Speed 11463.76 samples/sec Loss 7.3738 LearningRate 0.0470 Epoch: 12 Global Step: 71510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:46:56,087-Speed 11345.12 samples/sec Loss 7.3755 LearningRate 0.0470 Epoch: 12 Global Step: 71520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:46:57,032-Speed 10834.33 samples/sec Loss 7.4634 LearningRate 0.0470 Epoch: 12 Global Step: 71530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:46:57,933-Speed 11376.43 samples/sec Loss 7.4168 LearningRate 0.0470 Epoch: 12 Global Step: 71540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:46:58,848-Speed 11208.23 samples/sec Loss 7.2951 LearningRate 0.0470 Epoch: 12 Global Step: 71550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:46:59,767-Speed 11152.39 samples/sec Loss 7.4305 LearningRate 0.0470 Epoch: 12 Global Step: 71560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:47:00,696-Speed 11029.06 samples/sec Loss 7.2707 LearningRate 0.0470 Epoch: 12 Global Step: 71570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:47:01,615-Speed 11156.00 samples/sec Loss 7.2918 LearningRate 0.0470 Epoch: 12 Global Step: 71580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:47:02,545-Speed 11033.40 samples/sec Loss 7.5335 LearningRate 0.0470 Epoch: 12 Global Step: 71590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:47:03,460-Speed 11188.76 samples/sec Loss 7.3353 LearningRate 0.0469 Epoch: 12 Global Step: 71600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:47:04,467-Speed 10178.52 samples/sec Loss 7.4499 LearningRate 0.0469 Epoch: 12 Global Step: 71610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 
01:47:05,450-Speed 10427.48 samples/sec Loss 7.4431 LearningRate 0.0469 Epoch: 12 Global Step: 71620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:47:06,426-Speed 10498.45 samples/sec Loss 7.3994 LearningRate 0.0469 Epoch: 12 Global Step: 71630 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:47:07,384-Speed 10692.54 samples/sec Loss 7.2120 LearningRate 0.0469 Epoch: 12 Global Step: 71640 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:47:08,310-Speed 11076.97 samples/sec Loss 7.5003 LearningRate 0.0469 Epoch: 12 Global Step: 71650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:47:09,219-Speed 11266.05 samples/sec Loss 7.4104 LearningRate 0.0469 Epoch: 12 Global Step: 71660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:47:10,112-Speed 11477.32 samples/sec Loss 7.4931 LearningRate 0.0469 Epoch: 12 Global Step: 71670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:47:11,061-Speed 10803.60 samples/sec Loss 7.5028 LearningRate 0.0469 Epoch: 12 Global Step: 71680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:47:11,964-Speed 11351.37 samples/sec Loss 7.4649 LearningRate 0.0469 Epoch: 12 Global Step: 71690 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:47:12,941-Speed 10488.83 samples/sec Loss 7.4204 LearningRate 0.0469 Epoch: 12 Global Step: 71700 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:47:13,836-Speed 11457.04 samples/sec Loss 7.4599 LearningRate 0.0469 Epoch: 12 Global Step: 71710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:47:14,750-Speed 11213.20 samples/sec Loss 7.5415 LearningRate 0.0469 Epoch: 12 Global Step: 71720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:47:15,632-Speed 11610.88 samples/sec Loss 7.2651 LearningRate 0.0469 Epoch: 12 Global Step: 71730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:47:16,636-Speed 10211.80 
samples/sec Loss 7.5450 LearningRate 0.0469 Epoch: 12 Global Step: 71740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:47:17,599-Speed 10643.38 samples/sec Loss 7.3584 LearningRate 0.0469 Epoch: 12 Global Step: 71750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:47:18,513-Speed 11211.43 samples/sec Loss 7.3750 LearningRate 0.0469 Epoch: 12 Global Step: 71760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:47:19,414-Speed 11380.42 samples/sec Loss 7.4140 LearningRate 0.0468 Epoch: 12 Global Step: 71770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:47:20,352-Speed 10922.28 samples/sec Loss 7.4615 LearningRate 0.0468 Epoch: 12 Global Step: 71780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:47:21,284-Speed 10997.76 samples/sec Loss 7.4083 LearningRate 0.0468 Epoch: 12 Global Step: 71790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:47:22,189-Speed 11328.37 samples/sec Loss 7.3791 LearningRate 0.0468 Epoch: 12 Global Step: 71800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:47:23,079-Speed 11511.64 samples/sec Loss 7.3366 LearningRate 0.0468 Epoch: 12 Global Step: 71810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:47:24,031-Speed 10770.90 samples/sec Loss 7.5044 LearningRate 0.0468 Epoch: 12 Global Step: 71820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:47:24,953-Speed 11110.50 samples/sec Loss 7.2762 LearningRate 0.0468 Epoch: 12 Global Step: 71830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:47:25,842-Speed 11539.88 samples/sec Loss 7.3648 LearningRate 0.0468 Epoch: 12 Global Step: 71840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:47:26,759-Speed 11167.96 samples/sec Loss 7.4056 LearningRate 0.0468 Epoch: 12 Global Step: 71850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:47:27,737-Speed 10476.46 samples/sec Loss 7.4605 LearningRate 
0.0468 Epoch: 12 Global Step: 71860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:47:28,638-Speed 11384.26 samples/sec Loss 7.3802 LearningRate 0.0468 Epoch: 12 Global Step: 71870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:47:29,565-Speed 11053.13 samples/sec Loss 7.5304 LearningRate 0.0468 Epoch: 12 Global Step: 71880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:47:30,463-Speed 11416.49 samples/sec Loss 7.3976 LearningRate 0.0468 Epoch: 12 Global Step: 71890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:47:31,408-Speed 10842.90 samples/sec Loss 7.5382 LearningRate 0.0468 Epoch: 12 Global Step: 71900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:47:32,324-Speed 11190.81 samples/sec Loss 7.4449 LearningRate 0.0468 Epoch: 12 Global Step: 71910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:47:33,202-Speed 11667.54 samples/sec Loss 7.4974 LearningRate 0.0468 Epoch: 12 Global Step: 71920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:47:34,093-Speed 11508.97 samples/sec Loss 7.3663 LearningRate 0.0468 Epoch: 12 Global Step: 71930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:47:35,010-Speed 11168.40 samples/sec Loss 7.4174 LearningRate 0.0467 Epoch: 12 Global Step: 71940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:47:35,919-Speed 11281.15 samples/sec Loss 7.2797 LearningRate 0.0467 Epoch: 12 Global Step: 71950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:47:36,813-Speed 11478.90 samples/sec Loss 7.4285 LearningRate 0.0467 Epoch: 12 Global Step: 71960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:47:37,764-Speed 10774.99 samples/sec Loss 7.4135 LearningRate 0.0467 Epoch: 12 Global Step: 71970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:47:38,664-Speed 11390.86 samples/sec Loss 7.3672 LearningRate 0.0467 Epoch: 12 Global Step: 71980 
Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:47:39,715-Speed 9748.78 samples/sec Loss 7.3688 LearningRate 0.0467 Epoch: 12 Global Step: 71990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:47:40,631-Speed 11187.09 samples/sec Loss 7.2583 LearningRate 0.0467 Epoch: 12 Global Step: 72000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:48:02,535-[lfw][72000]XNorm: 12.120635 Training: 2022-04-11 01:48:02,536-[lfw][72000]Accuracy-Flip: 0.99567+-0.00300 Training: 2022-04-11 01:48:02,536-[lfw][72000]Accuracy-Highest: 0.99583 Training: 2022-04-11 01:48:27,167-[cfp_fp][72000]XNorm: 10.148816 Training: 2022-04-11 01:48:27,167-[cfp_fp][72000]Accuracy-Flip: 0.94129+-0.01259 Training: 2022-04-11 01:48:27,168-[cfp_fp][72000]Accuracy-Highest: 0.94143 Training: 2022-04-11 01:48:48,408-[agedb_30][72000]XNorm: 11.889964 Training: 2022-04-11 01:48:48,408-[agedb_30][72000]Accuracy-Flip: 0.95917+-0.01003 Training: 2022-04-11 01:48:48,409-[agedb_30][72000]Accuracy-Highest: 0.96083 Training: 2022-04-11 01:48:49,313-Speed 149.10 samples/sec Loss 7.4650 LearningRate 0.0467 Epoch: 12 Global Step: 72010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:48:50,265-Speed 10766.87 samples/sec Loss 7.4021 LearningRate 0.0467 Epoch: 12 Global Step: 72020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:48:51,205-Speed 10898.45 samples/sec Loss 7.3646 LearningRate 0.0467 Epoch: 12 Global Step: 72030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:48:52,099-Speed 11464.82 samples/sec Loss 7.3568 LearningRate 0.0467 Epoch: 12 Global Step: 72040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:48:53,013-Speed 11231.84 samples/sec Loss 7.3979 LearningRate 0.0467 Epoch: 12 Global Step: 72050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:48:53,983-Speed 10563.58 samples/sec Loss 7.4269 LearningRate 0.0467 Epoch: 12 Global Step: 72060 Fp16 Grad Scale: 65536 Required: 6 
hours Training: 2022-04-11 01:48:54,887-Speed 11334.74 samples/sec Loss 7.4000 LearningRate 0.0467 Epoch: 12 Global Step: 72070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:48:55,800-Speed 11232.73 samples/sec Loss 7.3213 LearningRate 0.0467 Epoch: 12 Global Step: 72080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:48:56,683-Speed 11603.86 samples/sec Loss 7.4243 LearningRate 0.0467 Epoch: 12 Global Step: 72090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:48:57,646-Speed 10643.54 samples/sec Loss 7.5003 LearningRate 0.0466 Epoch: 12 Global Step: 72100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:48:58,588-Speed 10885.50 samples/sec Loss 7.3436 LearningRate 0.0466 Epoch: 12 Global Step: 72110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:48:59,530-Speed 10878.42 samples/sec Loss 7.3141 LearningRate 0.0466 Epoch: 12 Global Step: 72120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:49:00,486-Speed 10719.48 samples/sec Loss 7.4116 LearningRate 0.0466 Epoch: 12 Global Step: 72130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:49:01,440-Speed 10734.78 samples/sec Loss 7.5634 LearningRate 0.0466 Epoch: 12 Global Step: 72140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:49:02,390-Speed 10789.95 samples/sec Loss 7.4580 LearningRate 0.0466 Epoch: 12 Global Step: 72150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:49:03,358-Speed 10592.27 samples/sec Loss 7.4809 LearningRate 0.0466 Epoch: 12 Global Step: 72160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:49:04,289-Speed 11001.68 samples/sec Loss 7.3438 LearningRate 0.0466 Epoch: 12 Global Step: 72170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:49:05,212-Speed 11102.33 samples/sec Loss 7.4504 LearningRate 0.0466 Epoch: 12 Global Step: 72180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 
01:49:06,212-Speed 10248.30 samples/sec Loss 7.4409 LearningRate 0.0466 Epoch: 12 Global Step: 72190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:49:07,148-Speed 10956.74 samples/sec Loss 7.3350 LearningRate 0.0466 Epoch: 12 Global Step: 72200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:49:08,058-Speed 11260.41 samples/sec Loss 7.4031 LearningRate 0.0466 Epoch: 12 Global Step: 72210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:49:08,958-Speed 11382.72 samples/sec Loss 7.5496 LearningRate 0.0466 Epoch: 12 Global Step: 72220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:49:09,902-Speed 10866.50 samples/sec Loss 7.6090 LearningRate 0.0466 Epoch: 12 Global Step: 72230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:49:10,817-Speed 11193.66 samples/sec Loss 7.2347 LearningRate 0.0466 Epoch: 12 Global Step: 72240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:49:11,719-Speed 11365.69 samples/sec Loss 7.4977 LearningRate 0.0466 Epoch: 12 Global Step: 72250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:49:12,642-Speed 11112.96 samples/sec Loss 7.4484 LearningRate 0.0466 Epoch: 12 Global Step: 72260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:49:13,621-Speed 10463.83 samples/sec Loss 7.4158 LearningRate 0.0465 Epoch: 12 Global Step: 72270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:49:14,566-Speed 10841.48 samples/sec Loss 7.3429 LearningRate 0.0465 Epoch: 12 Global Step: 72280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:49:15,462-Speed 11444.14 samples/sec Loss 7.3598 LearningRate 0.0465 Epoch: 12 Global Step: 72290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:49:16,416-Speed 10737.56 samples/sec Loss 7.3713 LearningRate 0.0465 Epoch: 12 Global Step: 72300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:49:17,354-Speed 10933.15 samples/sec 
Loss 7.3909 LearningRate 0.0465 Epoch: 12 Global Step: 72310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:49:18,307-Speed 10752.40 samples/sec Loss 7.4260 LearningRate 0.0465 Epoch: 12 Global Step: 72320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:49:19,229-Speed 11118.43 samples/sec Loss 7.4457 LearningRate 0.0465 Epoch: 12 Global Step: 72330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:49:20,129-Speed 11379.48 samples/sec Loss 7.4448 LearningRate 0.0465 Epoch: 12 Global Step: 72340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:49:21,065-Speed 10945.12 samples/sec Loss 7.4106 LearningRate 0.0465 Epoch: 12 Global Step: 72350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:49:22,046-Speed 10451.64 samples/sec Loss 7.2596 LearningRate 0.0465 Epoch: 12 Global Step: 72360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:49:22,975-Speed 11034.61 samples/sec Loss 7.3337 LearningRate 0.0465 Epoch: 12 Global Step: 72370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:49:23,892-Speed 11179.47 samples/sec Loss 7.4549 LearningRate 0.0465 Epoch: 12 Global Step: 72380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 01:49:24,841-Speed 10799.10 samples/sec Loss 7.4106 LearningRate 0.0465 Epoch: 12 Global Step: 72390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:49:25,754-Speed 11227.96 samples/sec Loss 7.4300 LearningRate 0.0465 Epoch: 12 Global Step: 72400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:49:26,691-Speed 10931.97 samples/sec Loss 7.3308 LearningRate 0.0465 Epoch: 12 Global Step: 72410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:49:27,601-Speed 11271.04 samples/sec Loss 7.3829 LearningRate 0.0465 Epoch: 12 Global Step: 72420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:49:28,534-Speed 10976.47 samples/sec Loss 7.3216 LearningRate 0.0465 Epoch: 
12 Global Step: 72430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 01:49:29,509-Speed 10519.02 samples/sec Loss 7.3619 LearningRate 0.0464 Epoch: 12 Global Step: 72440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:49:30,486-Speed 10492.60 samples/sec Loss 7.4164 LearningRate 0.0464 Epoch: 12 Global Step: 72450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:49:31,433-Speed 10814.87 samples/sec Loss 7.5116 LearningRate 0.0464 Epoch: 12 Global Step: 72460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:49:32,334-Speed 11377.47 samples/sec Loss 7.2730 LearningRate 0.0464 Epoch: 12 Global Step: 72470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:49:33,290-Speed 10711.18 samples/sec Loss 7.4119 LearningRate 0.0464 Epoch: 12 Global Step: 72480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:49:34,172-Speed 11634.10 samples/sec Loss 7.2958 LearningRate 0.0464 Epoch: 12 Global Step: 72490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:49:35,077-Speed 11313.90 samples/sec Loss 7.3359 LearningRate 0.0464 Epoch: 12 Global Step: 72500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:49:35,988-Speed 11255.87 samples/sec Loss 7.4582 LearningRate 0.0464 Epoch: 12 Global Step: 72510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:49:36,904-Speed 11185.68 samples/sec Loss 7.4661 LearningRate 0.0464 Epoch: 12 Global Step: 72520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:49:37,851-Speed 10828.29 samples/sec Loss 7.2304 LearningRate 0.0464 Epoch: 12 Global Step: 72530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:49:38,761-Speed 11256.72 samples/sec Loss 7.4987 LearningRate 0.0464 Epoch: 12 Global Step: 72540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:49:39,708-Speed 10831.08 samples/sec Loss 7.3499 LearningRate 0.0464 Epoch: 12 Global Step: 72550 Fp16 Grad 
Scale: 131072 Required: 5 hours Training: 2022-04-11 01:49:40,597-Speed 11523.73 samples/sec Loss 7.4409 LearningRate 0.0464 Epoch: 12 Global Step: 72560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:49:41,546-Speed 10806.76 samples/sec Loss 7.3798 LearningRate 0.0464 Epoch: 12 Global Step: 72570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:49:42,523-Speed 10491.66 samples/sec Loss 7.3594 LearningRate 0.0464 Epoch: 12 Global Step: 72580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:49:43,431-Speed 11289.88 samples/sec Loss 7.4971 LearningRate 0.0464 Epoch: 12 Global Step: 72590 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 01:49:44,336-Speed 11317.55 samples/sec Loss 7.2997 LearningRate 0.0463 Epoch: 12 Global Step: 72600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:49:45,281-Speed 10847.78 samples/sec Loss 7.3677 LearningRate 0.0463 Epoch: 12 Global Step: 72610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:49:46,199-Speed 11168.41 samples/sec Loss 7.4363 LearningRate 0.0463 Epoch: 12 Global Step: 72620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:49:47,085-Speed 11567.85 samples/sec Loss 7.3918 LearningRate 0.0463 Epoch: 12 Global Step: 72630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:49:47,996-Speed 11240.43 samples/sec Loss 7.3136 LearningRate 0.0463 Epoch: 12 Global Step: 72640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:49:48,925-Speed 11031.09 samples/sec Loss 7.4676 LearningRate 0.0463 Epoch: 12 Global Step: 72650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:49:49,870-Speed 10846.28 samples/sec Loss 7.3921 LearningRate 0.0463 Epoch: 12 Global Step: 72660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:49:50,750-Speed 11642.12 samples/sec Loss 7.3223 LearningRate 0.0463 Epoch: 12 Global Step: 72670 Fp16 Grad Scale: 65536 Required: 5 hours 
Training: 2022-04-11 01:49:51,666-Speed 11198.83 samples/sec Loss 7.2560 LearningRate 0.0463 Epoch: 12 Global Step: 72680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:49:52,575-Speed 11272.95 samples/sec Loss 7.4355 LearningRate 0.0463 Epoch: 12 Global Step: 72690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:49:53,527-Speed 10761.21 samples/sec Loss 7.4916 LearningRate 0.0463 Epoch: 12 Global Step: 72700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:49:54,451-Speed 11092.73 samples/sec Loss 7.3174 LearningRate 0.0463 Epoch: 12 Global Step: 72710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:49:55,388-Speed 10942.96 samples/sec Loss 7.2991 LearningRate 0.0463 Epoch: 12 Global Step: 72720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:49:56,335-Speed 10819.03 samples/sec Loss 7.3462 LearningRate 0.0463 Epoch: 12 Global Step: 72730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:49:57,232-Speed 11429.63 samples/sec Loss 7.3972 LearningRate 0.0463 Epoch: 12 Global Step: 72740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:49:58,132-Speed 11388.82 samples/sec Loss 7.3767 LearningRate 0.0463 Epoch: 12 Global Step: 72750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:49:59,039-Speed 11291.08 samples/sec Loss 7.5584 LearningRate 0.0463 Epoch: 12 Global Step: 72760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:49:59,963-Speed 11093.66 samples/sec Loss 7.3041 LearningRate 0.0462 Epoch: 12 Global Step: 72770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:50:00,955-Speed 10331.50 samples/sec Loss 7.3970 LearningRate 0.0462 Epoch: 12 Global Step: 72780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:50:01,886-Speed 11006.34 samples/sec Loss 7.5235 LearningRate 0.0462 Epoch: 12 Global Step: 72790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:50:02,803-Speed 
11184.55 samples/sec Loss 7.4202 LearningRate 0.0462 Epoch: 12 Global Step: 72800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:50:03,697-Speed 11462.29 samples/sec Loss 7.3729 LearningRate 0.0462 Epoch: 12 Global Step: 72810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:50:04,650-Speed 10752.00 samples/sec Loss 7.4312 LearningRate 0.0462 Epoch: 12 Global Step: 72820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:50:05,602-Speed 10758.44 samples/sec Loss 7.3664 LearningRate 0.0462 Epoch: 12 Global Step: 72830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:50:06,481-Speed 11669.81 samples/sec Loss 7.4397 LearningRate 0.0462 Epoch: 12 Global Step: 72840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:50:07,381-Speed 11387.59 samples/sec Loss 7.2875 LearningRate 0.0462 Epoch: 12 Global Step: 72850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:50:08,341-Speed 10669.15 samples/sec Loss 7.3745 LearningRate 0.0462 Epoch: 12 Global Step: 72860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:50:09,246-Speed 11323.22 samples/sec Loss 7.3378 LearningRate 0.0462 Epoch: 12 Global Step: 72870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:50:10,139-Speed 11485.21 samples/sec Loss 7.4510 LearningRate 0.0462 Epoch: 12 Global Step: 72880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:50:11,020-Speed 11649.50 samples/sec Loss 7.3972 LearningRate 0.0462 Epoch: 12 Global Step: 72890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:50:11,979-Speed 10681.47 samples/sec Loss 7.3362 LearningRate 0.0462 Epoch: 12 Global Step: 72900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:50:13,020-Speed 9848.76 samples/sec Loss 7.5546 LearningRate 0.0462 Epoch: 12 Global Step: 72910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:50:13,952-Speed 11000.62 samples/sec Loss 7.5524 
LearningRate 0.0462 Epoch: 12 Global Step: 72920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:50:14,896-Speed 10854.40 samples/sec Loss 7.3446 LearningRate 0.0462 Epoch: 12 Global Step: 72930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:50:15,839-Speed 10871.95 samples/sec Loss 7.5039 LearningRate 0.0461 Epoch: 12 Global Step: 72940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:50:16,811-Speed 10536.88 samples/sec Loss 7.3536 LearningRate 0.0461 Epoch: 12 Global Step: 72950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:50:17,754-Speed 10870.31 samples/sec Loss 7.4118 LearningRate 0.0461 Epoch: 12 Global Step: 72960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:50:18,681-Speed 11057.81 samples/sec Loss 7.4701 LearningRate 0.0461 Epoch: 12 Global Step: 72970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:50:19,593-Speed 11242.75 samples/sec Loss 7.4020 LearningRate 0.0461 Epoch: 12 Global Step: 72980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:50:20,505-Speed 11232.63 samples/sec Loss 7.4885 LearningRate 0.0461 Epoch: 12 Global Step: 72990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:50:21,446-Speed 10890.24 samples/sec Loss 7.4136 LearningRate 0.0461 Epoch: 12 Global Step: 73000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:50:22,379-Speed 10999.79 samples/sec Loss 7.5461 LearningRate 0.0461 Epoch: 12 Global Step: 73010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:50:23,272-Speed 11477.18 samples/sec Loss 7.4773 LearningRate 0.0461 Epoch: 12 Global Step: 73020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:50:24,189-Speed 11175.85 samples/sec Loss 7.4603 LearningRate 0.0461 Epoch: 12 Global Step: 73030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:50:25,152-Speed 10646.27 samples/sec Loss 7.3831 LearningRate 0.0461 Epoch: 12 Global 
Step: 73040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:50:26,039-Speed 11555.96 samples/sec Loss 7.3464 LearningRate 0.0461 Epoch: 12 Global Step: 73050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:50:26,927-Speed 11537.24 samples/sec Loss 7.4437 LearningRate 0.0461 Epoch: 12 Global Step: 73060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:50:27,830-Speed 11351.67 samples/sec Loss 7.5764 LearningRate 0.0461 Epoch: 12 Global Step: 73070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:50:28,785-Speed 10723.00 samples/sec Loss 7.4105 LearningRate 0.0461 Epoch: 12 Global Step: 73080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:50:29,730-Speed 10853.21 samples/sec Loss 7.4277 LearningRate 0.0461 Epoch: 12 Global Step: 73090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:50:30,617-Speed 11554.21 samples/sec Loss 7.4466 LearningRate 0.0460 Epoch: 12 Global Step: 73100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:50:31,519-Speed 11358.19 samples/sec Loss 7.3428 LearningRate 0.0460 Epoch: 12 Global Step: 73110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:50:32,465-Speed 10839.38 samples/sec Loss 7.5002 LearningRate 0.0460 Epoch: 12 Global Step: 73120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:50:33,365-Speed 11388.94 samples/sec Loss 7.3546 LearningRate 0.0460 Epoch: 12 Global Step: 73130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:50:34,292-Speed 11050.95 samples/sec Loss 7.4854 LearningRate 0.0460 Epoch: 12 Global Step: 73140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:50:35,219-Speed 11059.32 samples/sec Loss 7.4611 LearningRate 0.0460 Epoch: 12 Global Step: 73150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:50:36,161-Speed 10877.02 samples/sec Loss 7.3777 LearningRate 0.0460 Epoch: 12 Global Step: 73160 Fp16 Grad Scale: 65536 
Required: 5 hours Training: 2022-04-11 01:50:37,145-Speed 10420.75 samples/sec Loss 7.3173 LearningRate 0.0460 Epoch: 12 Global Step: 73170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:50:38,115-Speed 10558.50 samples/sec Loss 7.3791 LearningRate 0.0460 Epoch: 12 Global Step: 73180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:50:39,056-Speed 10899.47 samples/sec Loss 7.3756 LearningRate 0.0460 Epoch: 12 Global Step: 73190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:50:39,988-Speed 10989.83 samples/sec Loss 7.4915 LearningRate 0.0460 Epoch: 12 Global Step: 73200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:50:40,948-Speed 10680.75 samples/sec Loss 7.4894 LearningRate 0.0460 Epoch: 12 Global Step: 73210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:50:41,881-Speed 10982.19 samples/sec Loss 7.2577 LearningRate 0.0460 Epoch: 12 Global Step: 73220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:50:42,932-Speed 9753.38 samples/sec Loss 7.3897 LearningRate 0.0460 Epoch: 12 Global Step: 73230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:50:43,849-Speed 11179.64 samples/sec Loss 7.2499 LearningRate 0.0460 Epoch: 12 Global Step: 73240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:50:44,746-Speed 11433.03 samples/sec Loss 7.3497 LearningRate 0.0460 Epoch: 12 Global Step: 73250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:50:45,653-Speed 11291.05 samples/sec Loss 7.4257 LearningRate 0.0460 Epoch: 12 Global Step: 73260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:50:46,582-Speed 11030.51 samples/sec Loss 7.4129 LearningRate 0.0459 Epoch: 12 Global Step: 73270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:50:47,502-Speed 11147.27 samples/sec Loss 7.4520 LearningRate 0.0459 Epoch: 12 Global Step: 73280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 
01:50:48,411-Speed 11270.67 samples/sec Loss 7.5224 LearningRate 0.0459 Epoch: 12 Global Step: 73290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:50:49,339-Speed 11048.90 samples/sec Loss 7.4403 LearningRate 0.0459 Epoch: 12 Global Step: 73300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:50:50,246-Speed 11292.65 samples/sec Loss 7.4137 LearningRate 0.0459 Epoch: 12 Global Step: 73310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:50:51,181-Speed 10958.46 samples/sec Loss 7.4246 LearningRate 0.0459 Epoch: 12 Global Step: 73320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:50:52,087-Speed 11312.01 samples/sec Loss 7.3772 LearningRate 0.0459 Epoch: 12 Global Step: 73330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:50:52,981-Speed 11464.78 samples/sec Loss 7.5598 LearningRate 0.0459 Epoch: 12 Global Step: 73340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:50:53,927-Speed 10833.87 samples/sec Loss 7.3411 LearningRate 0.0459 Epoch: 12 Global Step: 73350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:50:54,855-Speed 11042.65 samples/sec Loss 7.3605 LearningRate 0.0459 Epoch: 12 Global Step: 73360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:50:55,777-Speed 11113.33 samples/sec Loss 7.3516 LearningRate 0.0459 Epoch: 12 Global Step: 73370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:50:56,729-Speed 10771.19 samples/sec Loss 7.4207 LearningRate 0.0459 Epoch: 12 Global Step: 73380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:50:57,612-Speed 11602.25 samples/sec Loss 7.4084 LearningRate 0.0459 Epoch: 12 Global Step: 73390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:50:58,552-Speed 10899.81 samples/sec Loss 7.3880 LearningRate 0.0459 Epoch: 12 Global Step: 73400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:50:59,508-Speed 10726.01 samples/sec 
Loss 7.3674 LearningRate 0.0459 Epoch: 12 Global Step: 73410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:51:00,405-Speed 11429.13 samples/sec Loss 7.2848 LearningRate 0.0459 Epoch: 12 Global Step: 73420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:51:01,353-Speed 10810.62 samples/sec Loss 7.3166 LearningRate 0.0459 Epoch: 12 Global Step: 73430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:51:02,322-Speed 10580.64 samples/sec Loss 7.3655 LearningRate 0.0458 Epoch: 12 Global Step: 73440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:51:03,277-Speed 10724.16 samples/sec Loss 7.3141 LearningRate 0.0458 Epoch: 12 Global Step: 73450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:51:04,219-Speed 10887.00 samples/sec Loss 7.3088 LearningRate 0.0458 Epoch: 12 Global Step: 73460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:51:05,185-Speed 10608.81 samples/sec Loss 7.3560 LearningRate 0.0458 Epoch: 12 Global Step: 73470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:51:06,102-Speed 11174.30 samples/sec Loss 7.2585 LearningRate 0.0458 Epoch: 12 Global Step: 73480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:51:07,021-Speed 11155.50 samples/sec Loss 7.3105 LearningRate 0.0458 Epoch: 12 Global Step: 73490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:51:07,942-Speed 11117.32 samples/sec Loss 7.4587 LearningRate 0.0458 Epoch: 12 Global Step: 73500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:51:08,921-Speed 10473.84 samples/sec Loss 7.1649 LearningRate 0.0458 Epoch: 12 Global Step: 73510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:51:09,846-Speed 11077.81 samples/sec Loss 7.4530 LearningRate 0.0458 Epoch: 12 Global Step: 73520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:51:10,754-Speed 11291.19 samples/sec Loss 7.2450 LearningRate 0.0458 Epoch: 
12 Global Step: 73530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:51:11,648-Speed 11463.69 samples/sec Loss 7.3888 LearningRate 0.0458 Epoch: 12 Global Step: 73540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:51:12,593-Speed 10844.47 samples/sec Loss 7.4072 LearningRate 0.0458 Epoch: 12 Global Step: 73550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:51:13,647-Speed 9725.68 samples/sec Loss 7.2856 LearningRate 0.0458 Epoch: 12 Global Step: 73560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:51:14,535-Speed 11536.21 samples/sec Loss 7.4111 LearningRate 0.0458 Epoch: 12 Global Step: 73570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:51:15,438-Speed 11355.89 samples/sec Loss 7.3161 LearningRate 0.0458 Epoch: 12 Global Step: 73580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:51:16,340-Speed 11355.75 samples/sec Loss 7.5373 LearningRate 0.0458 Epoch: 12 Global Step: 73590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:51:17,253-Speed 11225.18 samples/sec Loss 7.2510 LearningRate 0.0458 Epoch: 12 Global Step: 73600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:51:18,231-Speed 10477.86 samples/sec Loss 7.2590 LearningRate 0.0457 Epoch: 12 Global Step: 73610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:51:19,166-Speed 10963.66 samples/sec Loss 7.3047 LearningRate 0.0457 Epoch: 12 Global Step: 73620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:51:20,046-Speed 11647.51 samples/sec Loss 7.2808 LearningRate 0.0457 Epoch: 12 Global Step: 73630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:51:20,938-Speed 11496.42 samples/sec Loss 7.3696 LearningRate 0.0457 Epoch: 12 Global Step: 73640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:51:21,852-Speed 11203.57 samples/sec Loss 7.4495 LearningRate 0.0457 Epoch: 12 Global Step: 73650 Fp16 Grad 
Scale: 65536 Required: 5 hours Training: 2022-04-11 01:51:22,785-Speed 10989.77 samples/sec Loss 7.2120 LearningRate 0.0457 Epoch: 12 Global Step: 73660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:51:23,677-Speed 11492.84 samples/sec Loss 7.3478 LearningRate 0.0457 Epoch: 12 Global Step: 73670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:51:24,567-Speed 11511.65 samples/sec Loss 7.4186 LearningRate 0.0457 Epoch: 12 Global Step: 73680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:51:25,490-Speed 11102.89 samples/sec Loss 7.2457 LearningRate 0.0457 Epoch: 12 Global Step: 73690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:51:26,467-Speed 10487.49 samples/sec Loss 7.2331 LearningRate 0.0457 Epoch: 12 Global Step: 73700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:51:27,398-Speed 11013.86 samples/sec Loss 7.2591 LearningRate 0.0457 Epoch: 12 Global Step: 73710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:51:28,285-Speed 11542.03 samples/sec Loss 7.4011 LearningRate 0.0457 Epoch: 12 Global Step: 73720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:51:29,192-Speed 11313.33 samples/sec Loss 7.3170 LearningRate 0.0457 Epoch: 12 Global Step: 73730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:51:30,086-Speed 11453.41 samples/sec Loss 7.3944 LearningRate 0.0457 Epoch: 12 Global Step: 73740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:51:31,003-Speed 11171.52 samples/sec Loss 7.3213 LearningRate 0.0457 Epoch: 12 Global Step: 73750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:51:31,950-Speed 10825.51 samples/sec Loss 7.4160 LearningRate 0.0457 Epoch: 12 Global Step: 73760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:51:32,871-Speed 11135.64 samples/sec Loss 7.5069 LearningRate 0.0457 Epoch: 12 Global Step: 73770 Fp16 Grad Scale: 32768 Required: 5 hours 
Training: 2022-04-11 01:51:33,756-Speed 11576.68 samples/sec Loss 7.4184 LearningRate 0.0456 Epoch: 12 Global Step: 73780 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 01:51:34,642-Speed 11566.51 samples/sec Loss 7.2784 LearningRate 0.0456 Epoch: 12 Global Step: 73790 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 01:51:35,626-Speed 10417.43 samples/sec Loss 7.2730 LearningRate 0.0456 Epoch: 12 Global Step: 73800 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 01:51:36,635-Speed 10150.04 samples/sec Loss 7.3060 LearningRate 0.0456 Epoch: 12 Global Step: 73810 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 01:51:37,567-Speed 10994.20 samples/sec Loss 7.4636 LearningRate 0.0456 Epoch: 12 Global Step: 73820 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 01:51:38,503-Speed 10950.96 samples/sec Loss 7.3720 LearningRate 0.0456 Epoch: 12 Global Step: 73830 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 01:51:39,442-Speed 10915.32 samples/sec Loss 7.1688 LearningRate 0.0456 Epoch: 12 Global Step: 73840 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 01:51:40,361-Speed 11146.96 samples/sec Loss 7.3690 LearningRate 0.0456 Epoch: 12 Global Step: 73850 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 01:51:41,297-Speed 10953.61 samples/sec Loss 7.4155 LearningRate 0.0456 Epoch: 12 Global Step: 73860 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 01:51:42,206-Speed 11279.48 samples/sec Loss 7.3430 LearningRate 0.0456 Epoch: 12 Global Step: 73870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:51:43,139-Speed 10983.59 samples/sec Loss 7.4786 LearningRate 0.0456 Epoch: 12 Global Step: 73880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:51:44,047-Speed 11285.39 samples/sec Loss 7.2719 LearningRate 0.0456 Epoch: 12 Global Step: 73890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:51:44,991-Speed 
10851.50 samples/sec Loss 7.1857 LearningRate 0.0456 Epoch: 12 Global Step: 73900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:51:46,057-Speed 9618.27 samples/sec Loss 7.2478 LearningRate 0.0456 Epoch: 12 Global Step: 73910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:51:57,320-Speed 909.24 samples/sec Loss 7.0722 LearningRate 0.0456 Epoch: 13 Global Step: 73920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:51:58,703-Speed 7409.76 samples/sec Loss 6.6000 LearningRate 0.0456 Epoch: 13 Global Step: 73930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:51:59,614-Speed 11255.08 samples/sec Loss 6.5612 LearningRate 0.0455 Epoch: 13 Global Step: 73940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:52:00,725-Speed 9228.78 samples/sec Loss 6.5788 LearningRate 0.0455 Epoch: 13 Global Step: 73950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:52:01,654-Speed 11025.25 samples/sec Loss 6.6689 LearningRate 0.0455 Epoch: 13 Global Step: 73960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:52:02,698-Speed 9812.52 samples/sec Loss 6.7307 LearningRate 0.0455 Epoch: 13 Global Step: 73970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:52:03,771-Speed 9559.50 samples/sec Loss 6.5955 LearningRate 0.0455 Epoch: 13 Global Step: 73980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:52:04,682-Speed 11247.46 samples/sec Loss 6.5991 LearningRate 0.0455 Epoch: 13 Global Step: 73990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:52:05,609-Speed 11064.20 samples/sec Loss 6.7246 LearningRate 0.0455 Epoch: 13 Global Step: 74000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:52:27,333-[lfw][74000]XNorm: 12.201466 Training: 2022-04-11 01:52:27,334-[lfw][74000]Accuracy-Flip: 0.99400+-0.00327 Training: 2022-04-11 01:52:27,334-[lfw][74000]Accuracy-Highest: 0.99583 Training: 2022-04-11 
01:52:51,956-[cfp_fp][74000]XNorm: 10.159296 Training: 2022-04-11 01:52:51,957-[cfp_fp][74000]Accuracy-Flip: 0.94271+-0.00964 Training: 2022-04-11 01:52:51,957-[cfp_fp][74000]Accuracy-Highest: 0.94271 Training: 2022-04-11 01:53:13,200-[agedb_30][74000]XNorm: 11.974242 Training: 2022-04-11 01:53:13,201-[agedb_30][74000]Accuracy-Flip: 0.95917+-0.01143 Training: 2022-04-11 01:53:13,201-[agedb_30][74000]Accuracy-Highest: 0.96083 Training: 2022-04-11 01:53:14,132-Speed 149.44 samples/sec Loss 6.6051 LearningRate 0.0455 Epoch: 13 Global Step: 74010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:53:15,074-Speed 10879.97 samples/sec Loss 6.6900 LearningRate 0.0455 Epoch: 13 Global Step: 74020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:53:16,030-Speed 10721.75 samples/sec Loss 6.6866 LearningRate 0.0455 Epoch: 13 Global Step: 74030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:53:16,988-Speed 10696.78 samples/sec Loss 6.7247 LearningRate 0.0455 Epoch: 13 Global Step: 74040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:53:17,918-Speed 11024.23 samples/sec Loss 6.6719 LearningRate 0.0455 Epoch: 13 Global Step: 74050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:53:18,879-Speed 10658.03 samples/sec Loss 6.7150 LearningRate 0.0455 Epoch: 13 Global Step: 74060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:53:19,821-Speed 10881.27 samples/sec Loss 6.7159 LearningRate 0.0455 Epoch: 13 Global Step: 74070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:53:20,758-Speed 10941.03 samples/sec Loss 6.7723 LearningRate 0.0455 Epoch: 13 Global Step: 74080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:53:21,693-Speed 10962.02 samples/sec Loss 6.7267 LearningRate 0.0455 Epoch: 13 Global Step: 74090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:53:22,657-Speed 10631.57 samples/sec Loss 6.8719 LearningRate 0.0455 Epoch: 
13 Global Step: 74100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:53:23,622-Speed 10612.15 samples/sec Loss 6.7069 LearningRate 0.0454 Epoch: 13 Global Step: 74110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:53:24,540-Speed 11167.45 samples/sec Loss 6.7632 LearningRate 0.0454 Epoch: 13 Global Step: 74120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:53:25,494-Speed 10743.04 samples/sec Loss 6.8302 LearningRate 0.0454 Epoch: 13 Global Step: 74130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:53:26,423-Speed 11033.11 samples/sec Loss 6.6904 LearningRate 0.0454 Epoch: 13 Global Step: 74140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:53:27,373-Speed 10781.85 samples/sec Loss 6.8311 LearningRate 0.0454 Epoch: 13 Global Step: 74150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:53:28,379-Speed 10186.98 samples/sec Loss 6.7814 LearningRate 0.0454 Epoch: 13 Global Step: 74160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:53:29,289-Speed 11270.31 samples/sec Loss 6.9046 LearningRate 0.0454 Epoch: 13 Global Step: 74170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:53:30,223-Speed 10967.52 samples/sec Loss 6.8113 LearningRate 0.0454 Epoch: 13 Global Step: 74180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:53:31,193-Speed 10565.58 samples/sec Loss 6.7800 LearningRate 0.0454 Epoch: 13 Global Step: 74190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:53:32,353-Speed 8831.93 samples/sec Loss 6.8792 LearningRate 0.0454 Epoch: 13 Global Step: 74200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:53:33,370-Speed 10083.05 samples/sec Loss 6.8233 LearningRate 0.0454 Epoch: 13 Global Step: 74210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:53:34,320-Speed 10793.26 samples/sec Loss 6.7616 LearningRate 0.0454 Epoch: 13 Global Step: 74220 Fp16 Grad Scale: 65536 
Required: 5 hours Training: 2022-04-11 01:53:35,256-Speed 10945.51 samples/sec Loss 6.7655 LearningRate 0.0454 Epoch: 13 Global Step: 74230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:53:36,205-Speed 10805.90 samples/sec Loss 6.8253 LearningRate 0.0454 Epoch: 13 Global Step: 74240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:53:37,123-Speed 11151.82 samples/sec Loss 6.8629 LearningRate 0.0454 Epoch: 13 Global Step: 74250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:53:38,044-Speed 11131.75 samples/sec Loss 6.7662 LearningRate 0.0454 Epoch: 13 Global Step: 74260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:53:38,933-Speed 11529.34 samples/sec Loss 6.7984 LearningRate 0.0454 Epoch: 13 Global Step: 74270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:53:39,872-Speed 10913.34 samples/sec Loss 6.9306 LearningRate 0.0453 Epoch: 13 Global Step: 74280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:53:40,844-Speed 10550.60 samples/sec Loss 6.9190 LearningRate 0.0453 Epoch: 13 Global Step: 74290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:53:41,752-Speed 11278.78 samples/sec Loss 6.9832 LearningRate 0.0453 Epoch: 13 Global Step: 74300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:53:42,687-Speed 10957.38 samples/sec Loss 6.8252 LearningRate 0.0453 Epoch: 13 Global Step: 74310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:53:43,676-Speed 10362.60 samples/sec Loss 6.9980 LearningRate 0.0453 Epoch: 13 Global Step: 74320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:53:44,605-Speed 11042.37 samples/sec Loss 6.9278 LearningRate 0.0453 Epoch: 13 Global Step: 74330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:53:45,530-Speed 11079.10 samples/sec Loss 6.8737 LearningRate 0.0453 Epoch: 13 Global Step: 74340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 
01:53:46,447-Speed 11174.21 samples/sec Loss 6.9754 LearningRate 0.0453 Epoch: 13 Global Step: 74350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:53:47,408-Speed 10658.26 samples/sec Loss 6.8433 LearningRate 0.0453 Epoch: 13 Global Step: 74360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:53:48,324-Speed 11184.29 samples/sec Loss 6.9590 LearningRate 0.0453 Epoch: 13 Global Step: 74370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:53:49,239-Speed 11207.25 samples/sec Loss 6.8915 LearningRate 0.0453 Epoch: 13 Global Step: 74380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:53:50,160-Speed 11114.35 samples/sec Loss 6.9753 LearningRate 0.0453 Epoch: 13 Global Step: 74390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:53:51,104-Speed 10865.16 samples/sec Loss 6.9301 LearningRate 0.0453 Epoch: 13 Global Step: 74400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:53:52,060-Speed 10715.62 samples/sec Loss 6.9850 LearningRate 0.0453 Epoch: 13 Global Step: 74410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:53:53,018-Speed 10700.63 samples/sec Loss 6.8882 LearningRate 0.0453 Epoch: 13 Global Step: 74420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:53:53,983-Speed 10620.02 samples/sec Loss 6.8075 LearningRate 0.0453 Epoch: 13 Global Step: 74430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:53:54,904-Speed 11124.63 samples/sec Loss 6.9788 LearningRate 0.0453 Epoch: 13 Global Step: 74440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:53:55,820-Speed 11183.86 samples/sec Loss 6.9348 LearningRate 0.0452 Epoch: 13 Global Step: 74450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:53:56,749-Speed 11032.62 samples/sec Loss 6.9397 LearningRate 0.0452 Epoch: 13 Global Step: 74460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:53:57,692-Speed 10869.00 
samples/sec Loss 6.9487 LearningRate 0.0452 Epoch: 13 Global Step: 74470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:53:58,677-Speed 10397.65 samples/sec Loss 6.9044 LearningRate 0.0452 Epoch: 13 Global Step: 74480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:53:59,619-Speed 10878.06 samples/sec Loss 7.1183 LearningRate 0.0452 Epoch: 13 Global Step: 74490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:54:00,550-Speed 11012.05 samples/sec Loss 6.8902 LearningRate 0.0452 Epoch: 13 Global Step: 74500 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 01:54:01,456-Speed 11313.05 samples/sec Loss 6.9234 LearningRate 0.0452 Epoch: 13 Global Step: 74510 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 01:54:02,356-Speed 11385.24 samples/sec Loss 6.8647 LearningRate 0.0452 Epoch: 13 Global Step: 74520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:54:03,261-Speed 11327.25 samples/sec Loss 6.9986 LearningRate 0.0452 Epoch: 13 Global Step: 74530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:04,202-Speed 10884.05 samples/sec Loss 7.0337 LearningRate 0.0452 Epoch: 13 Global Step: 74540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:05,116-Speed 11217.63 samples/sec Loss 6.9308 LearningRate 0.0452 Epoch: 13 Global Step: 74550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:06,004-Speed 11530.93 samples/sec Loss 7.0069 LearningRate 0.0452 Epoch: 13 Global Step: 74560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:06,946-Speed 10878.12 samples/sec Loss 6.9474 LearningRate 0.0452 Epoch: 13 Global Step: 74570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:07,910-Speed 10633.31 samples/sec Loss 6.9873 LearningRate 0.0452 Epoch: 13 Global Step: 74580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:08,838-Speed 11048.17 samples/sec Loss 7.0464 LearningRate 
0.0452 Epoch: 13 Global Step: 74590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:09,743-Speed 11322.09 samples/sec Loss 7.0402 LearningRate 0.0452 Epoch: 13 Global Step: 74600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:10,683-Speed 10906.35 samples/sec Loss 7.0996 LearningRate 0.0452 Epoch: 13 Global Step: 74610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:11,631-Speed 10799.31 samples/sec Loss 7.0220 LearningRate 0.0451 Epoch: 13 Global Step: 74620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:12,584-Speed 10756.94 samples/sec Loss 7.0939 LearningRate 0.0451 Epoch: 13 Global Step: 74630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:13,575-Speed 10337.33 samples/sec Loss 7.0914 LearningRate 0.0451 Epoch: 13 Global Step: 74640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:14,505-Speed 11033.60 samples/sec Loss 6.9729 LearningRate 0.0451 Epoch: 13 Global Step: 74650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:15,454-Speed 10790.48 samples/sec Loss 7.0410 LearningRate 0.0451 Epoch: 13 Global Step: 74660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:16,368-Speed 11216.70 samples/sec Loss 6.9982 LearningRate 0.0451 Epoch: 13 Global Step: 74670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:17,346-Speed 10477.23 samples/sec Loss 6.8061 LearningRate 0.0451 Epoch: 13 Global Step: 74680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:18,297-Speed 10782.11 samples/sec Loss 7.0707 LearningRate 0.0451 Epoch: 13 Global Step: 74690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:19,242-Speed 10842.56 samples/sec Loss 6.9565 LearningRate 0.0451 Epoch: 13 Global Step: 74700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:20,192-Speed 10788.91 samples/sec Loss 7.0544 LearningRate 0.0451 Epoch: 13 Global Step: 74710 Fp16 
Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:21,086-Speed 11467.44 samples/sec Loss 7.1160 LearningRate 0.0451 Epoch: 13 Global Step: 74720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:22,020-Speed 10974.25 samples/sec Loss 7.0711 LearningRate 0.0451 Epoch: 13 Global Step: 74730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:54:22,955-Speed 10960.49 samples/sec Loss 7.1088 LearningRate 0.0451 Epoch: 13 Global Step: 74740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:54:23,881-Speed 11065.48 samples/sec Loss 7.1009 LearningRate 0.0451 Epoch: 13 Global Step: 74750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:54:24,777-Speed 11431.48 samples/sec Loss 7.1104 LearningRate 0.0451 Epoch: 13 Global Step: 74760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:54:25,731-Speed 10741.71 samples/sec Loss 7.0169 LearningRate 0.0451 Epoch: 13 Global Step: 74770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:26,675-Speed 10856.78 samples/sec Loss 7.0799 LearningRate 0.0451 Epoch: 13 Global Step: 74780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:27,589-Speed 11221.06 samples/sec Loss 7.1611 LearningRate 0.0450 Epoch: 13 Global Step: 74790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:28,520-Speed 11006.27 samples/sec Loss 6.9648 LearningRate 0.0450 Epoch: 13 Global Step: 74800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:29,460-Speed 10905.79 samples/sec Loss 7.0859 LearningRate 0.0450 Epoch: 13 Global Step: 74810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:30,407-Speed 10821.17 samples/sec Loss 7.0126 LearningRate 0.0450 Epoch: 13 Global Step: 74820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:31,351-Speed 10864.64 samples/sec Loss 7.1212 LearningRate 0.0450 Epoch: 13 Global Step: 74830 Fp16 Grad Scale: 65536 Required: 5 hours 
Training: 2022-04-11 01:54:32,300-Speed 10800.12 samples/sec Loss 7.0226 LearningRate 0.0450 Epoch: 13 Global Step: 74840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:33,193-Speed 11476.19 samples/sec Loss 7.1777 LearningRate 0.0450 Epoch: 13 Global Step: 74850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:34,111-Speed 11154.73 samples/sec Loss 6.8527 LearningRate 0.0450 Epoch: 13 Global Step: 74860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:35,079-Speed 10582.79 samples/sec Loss 7.0500 LearningRate 0.0450 Epoch: 13 Global Step: 74870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:54:36,006-Speed 11060.50 samples/sec Loss 7.0323 LearningRate 0.0450 Epoch: 13 Global Step: 74880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:54:36,961-Speed 10731.52 samples/sec Loss 7.1065 LearningRate 0.0450 Epoch: 13 Global Step: 74890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:54:37,869-Speed 11287.43 samples/sec Loss 7.2294 LearningRate 0.0450 Epoch: 13 Global Step: 74900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:38,810-Speed 10889.05 samples/sec Loss 6.9761 LearningRate 0.0450 Epoch: 13 Global Step: 74910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:39,785-Speed 10511.58 samples/sec Loss 7.1754 LearningRate 0.0450 Epoch: 13 Global Step: 74920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:40,693-Speed 11292.21 samples/sec Loss 7.0576 LearningRate 0.0450 Epoch: 13 Global Step: 74930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:41,593-Speed 11384.60 samples/sec Loss 7.0741 LearningRate 0.0450 Epoch: 13 Global Step: 74940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:42,525-Speed 10991.20 samples/sec Loss 7.1473 LearningRate 0.0450 Epoch: 13 Global Step: 74950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:43,507-Speed 
10440.71 samples/sec Loss 7.1100 LearningRate 0.0449 Epoch: 13 Global Step: 74960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:44,460-Speed 10748.74 samples/sec Loss 6.9200 LearningRate 0.0449 Epoch: 13 Global Step: 74970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:45,356-Speed 11447.38 samples/sec Loss 7.0478 LearningRate 0.0449 Epoch: 13 Global Step: 74980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:46,297-Speed 10891.50 samples/sec Loss 7.1473 LearningRate 0.0449 Epoch: 13 Global Step: 74990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:47,210-Speed 11212.69 samples/sec Loss 7.1716 LearningRate 0.0449 Epoch: 13 Global Step: 75000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:54:48,172-Speed 10662.12 samples/sec Loss 7.1705 LearningRate 0.0449 Epoch: 13 Global Step: 75010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:54:49,116-Speed 10849.66 samples/sec Loss 6.9650 LearningRate 0.0449 Epoch: 13 Global Step: 75020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:50,021-Speed 11322.12 samples/sec Loss 6.9934 LearningRate 0.0449 Epoch: 13 Global Step: 75030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:50,961-Speed 10911.41 samples/sec Loss 7.1474 LearningRate 0.0449 Epoch: 13 Global Step: 75040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:51,948-Speed 10377.63 samples/sec Loss 7.1351 LearningRate 0.0449 Epoch: 13 Global Step: 75050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:52,934-Speed 10390.04 samples/sec Loss 7.1109 LearningRate 0.0449 Epoch: 13 Global Step: 75060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:53,874-Speed 10904.74 samples/sec Loss 7.3074 LearningRate 0.0449 Epoch: 13 Global Step: 75070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:54,796-Speed 11112.84 samples/sec Loss 7.0918 
LearningRate 0.0449 Epoch: 13 Global Step: 75080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:55,723-Speed 11054.15 samples/sec Loss 7.0170 LearningRate 0.0449 Epoch: 13 Global Step: 75090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:56,660-Speed 10940.11 samples/sec Loss 7.1179 LearningRate 0.0449 Epoch: 13 Global Step: 75100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:57,612-Speed 10761.22 samples/sec Loss 7.1087 LearningRate 0.0449 Epoch: 13 Global Step: 75110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:54:58,540-Speed 11047.92 samples/sec Loss 7.1550 LearningRate 0.0449 Epoch: 13 Global Step: 75120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:54:59,479-Speed 10916.17 samples/sec Loss 6.9321 LearningRate 0.0448 Epoch: 13 Global Step: 75130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:00,418-Speed 10908.92 samples/sec Loss 7.1372 LearningRate 0.0448 Epoch: 13 Global Step: 75140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:01,386-Speed 10585.87 samples/sec Loss 7.2093 LearningRate 0.0448 Epoch: 13 Global Step: 75150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:55:02,367-Speed 10444.61 samples/sec Loss 7.1302 LearningRate 0.0448 Epoch: 13 Global Step: 75160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:55:03,411-Speed 9819.66 samples/sec Loss 7.2332 LearningRate 0.0448 Epoch: 13 Global Step: 75170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:55:04,368-Speed 10709.00 samples/sec Loss 7.2432 LearningRate 0.0448 Epoch: 13 Global Step: 75180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:55:05,313-Speed 10843.24 samples/sec Loss 7.2262 LearningRate 0.0448 Epoch: 13 Global Step: 75190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:55:06,226-Speed 11227.09 samples/sec Loss 7.0931 LearningRate 0.0448 Epoch: 13 Global 
Step: 75200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:55:07,178-Speed 10756.96 samples/sec Loss 7.1639 LearningRate 0.0448 Epoch: 13 Global Step: 75210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:55:08,071-Speed 11516.23 samples/sec Loss 6.9514 LearningRate 0.0448 Epoch: 13 Global Step: 75220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:55:08,973-Speed 11369.74 samples/sec Loss 7.0899 LearningRate 0.0448 Epoch: 13 Global Step: 75230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:55:09,860-Speed 11547.57 samples/sec Loss 7.0726 LearningRate 0.0448 Epoch: 13 Global Step: 75240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:55:10,745-Speed 11582.03 samples/sec Loss 7.1964 LearningRate 0.0448 Epoch: 13 Global Step: 75250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:11,717-Speed 10541.66 samples/sec Loss 7.1524 LearningRate 0.0448 Epoch: 13 Global Step: 75260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:12,652-Speed 10966.91 samples/sec Loss 7.0727 LearningRate 0.0448 Epoch: 13 Global Step: 75270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:13,576-Speed 11087.10 samples/sec Loss 7.1514 LearningRate 0.0448 Epoch: 13 Global Step: 75280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:14,486-Speed 11264.69 samples/sec Loss 7.0086 LearningRate 0.0448 Epoch: 13 Global Step: 75290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:15,395-Speed 11265.31 samples/sec Loss 7.1558 LearningRate 0.0447 Epoch: 13 Global Step: 75300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:16,347-Speed 10773.11 samples/sec Loss 7.1776 LearningRate 0.0447 Epoch: 13 Global Step: 75310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:17,248-Speed 11377.89 samples/sec Loss 7.2167 LearningRate 0.0447 Epoch: 13 Global Step: 75320 Fp16 Grad Scale: 131072 
Required: 5 hours Training: 2022-04-11 01:55:18,195-Speed 10827.28 samples/sec Loss 7.0735 LearningRate 0.0447 Epoch: 13 Global Step: 75330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:19,115-Speed 11132.86 samples/sec Loss 7.3143 LearningRate 0.0447 Epoch: 13 Global Step: 75340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:20,017-Speed 11363.93 samples/sec Loss 7.1214 LearningRate 0.0447 Epoch: 13 Global Step: 75350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:20,934-Speed 11181.23 samples/sec Loss 7.0380 LearningRate 0.0447 Epoch: 13 Global Step: 75360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:21,823-Speed 11523.97 samples/sec Loss 7.1642 LearningRate 0.0447 Epoch: 13 Global Step: 75370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:22,708-Speed 11584.99 samples/sec Loss 6.9942 LearningRate 0.0447 Epoch: 13 Global Step: 75380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:55:23,633-Speed 11071.29 samples/sec Loss 7.1529 LearningRate 0.0447 Epoch: 13 Global Step: 75390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:55:24,569-Speed 10956.65 samples/sec Loss 7.1374 LearningRate 0.0447 Epoch: 13 Global Step: 75400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:55:25,517-Speed 10802.89 samples/sec Loss 7.0422 LearningRate 0.0447 Epoch: 13 Global Step: 75410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:55:26,442-Speed 11084.50 samples/sec Loss 7.1060 LearningRate 0.0447 Epoch: 13 Global Step: 75420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:55:27,349-Speed 11296.35 samples/sec Loss 7.1116 LearningRate 0.0447 Epoch: 13 Global Step: 75430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:55:28,287-Speed 10920.09 samples/sec Loss 7.1481 LearningRate 0.0447 Epoch: 13 Global Step: 75440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 
01:55:29,223-Speed 10945.43 samples/sec Loss 7.1005 LearningRate 0.0447 Epoch: 13 Global Step: 75450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:55:30,153-Speed 11031.31 samples/sec Loss 7.1408 LearningRate 0.0447 Epoch: 13 Global Step: 75460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:55:31,056-Speed 11347.77 samples/sec Loss 7.1520 LearningRate 0.0446 Epoch: 13 Global Step: 75470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:55:31,995-Speed 10908.36 samples/sec Loss 7.2143 LearningRate 0.0446 Epoch: 13 Global Step: 75480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:32,906-Speed 11259.61 samples/sec Loss 7.0049 LearningRate 0.0446 Epoch: 13 Global Step: 75490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:33,848-Speed 10869.31 samples/sec Loss 7.0935 LearningRate 0.0446 Epoch: 13 Global Step: 75500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:34,769-Speed 11131.95 samples/sec Loss 7.2422 LearningRate 0.0446 Epoch: 13 Global Step: 75510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:35,707-Speed 10926.63 samples/sec Loss 7.1964 LearningRate 0.0446 Epoch: 13 Global Step: 75520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:36,634-Speed 11053.59 samples/sec Loss 7.3603 LearningRate 0.0446 Epoch: 13 Global Step: 75530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:37,554-Speed 11141.34 samples/sec Loss 7.0427 LearningRate 0.0446 Epoch: 13 Global Step: 75540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:38,515-Speed 10671.25 samples/sec Loss 7.2198 LearningRate 0.0446 Epoch: 13 Global Step: 75550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:39,423-Speed 11281.61 samples/sec Loss 7.0739 LearningRate 0.0446 Epoch: 13 Global Step: 75560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:40,322-Speed 11401.54 
samples/sec Loss 7.1532 LearningRate 0.0446 Epoch: 13 Global Step: 75570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:41,210-Speed 11532.37 samples/sec Loss 7.1802 LearningRate 0.0446 Epoch: 13 Global Step: 75580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:42,173-Speed 10647.39 samples/sec Loss 7.2280 LearningRate 0.0446 Epoch: 13 Global Step: 75590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:43,143-Speed 10573.43 samples/sec Loss 7.2036 LearningRate 0.0446 Epoch: 13 Global Step: 75600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:44,081-Speed 10915.57 samples/sec Loss 7.3147 LearningRate 0.0446 Epoch: 13 Global Step: 75610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:45,007-Speed 11070.71 samples/sec Loss 7.2122 LearningRate 0.0446 Epoch: 13 Global Step: 75620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:45,925-Speed 11156.02 samples/sec Loss 7.0640 LearningRate 0.0446 Epoch: 13 Global Step: 75630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:46,865-Speed 10900.72 samples/sec Loss 7.1179 LearningRate 0.0445 Epoch: 13 Global Step: 75640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:47,791-Speed 11074.05 samples/sec Loss 7.2552 LearningRate 0.0445 Epoch: 13 Global Step: 75650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:48,708-Speed 11173.01 samples/sec Loss 7.1887 LearningRate 0.0445 Epoch: 13 Global Step: 75660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:49,648-Speed 10908.18 samples/sec Loss 7.2088 LearningRate 0.0445 Epoch: 13 Global Step: 75670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:50,609-Speed 10659.76 samples/sec Loss 7.1119 LearningRate 0.0445 Epoch: 13 Global Step: 75680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:51,560-Speed 10777.57 samples/sec Loss 7.0067 
LearningRate 0.0445 Epoch: 13 Global Step: 75690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:52,558-Speed 10265.53 samples/sec Loss 7.2543 LearningRate 0.0445 Epoch: 13 Global Step: 75700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:53,468-Speed 11264.11 samples/sec Loss 7.1226 LearningRate 0.0445 Epoch: 13 Global Step: 75710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:54,376-Speed 11293.48 samples/sec Loss 7.2539 LearningRate 0.0445 Epoch: 13 Global Step: 75720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:55,275-Speed 11394.22 samples/sec Loss 7.2649 LearningRate 0.0445 Epoch: 13 Global Step: 75730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:56,244-Speed 10573.08 samples/sec Loss 7.2613 LearningRate 0.0445 Epoch: 13 Global Step: 75740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:57,174-Speed 11026.20 samples/sec Loss 7.2526 LearningRate 0.0445 Epoch: 13 Global Step: 75750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:58,068-Speed 11464.17 samples/sec Loss 7.2525 LearningRate 0.0445 Epoch: 13 Global Step: 75760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:58,978-Speed 11254.26 samples/sec Loss 7.2228 LearningRate 0.0445 Epoch: 13 Global Step: 75770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:55:59,940-Speed 10656.83 samples/sec Loss 7.3208 LearningRate 0.0445 Epoch: 13 Global Step: 75780 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 01:56:00,933-Speed 10316.93 samples/sec Loss 7.2375 LearningRate 0.0445 Epoch: 13 Global Step: 75790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:56:01,868-Speed 10971.46 samples/sec Loss 7.2409 LearningRate 0.0445 Epoch: 13 Global Step: 75800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:56:02,799-Speed 11000.08 samples/sec Loss 7.2357 LearningRate 0.0444 Epoch: 13 
Global Step: 75810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:56:03,755-Speed 10728.34 samples/sec Loss 7.3067 LearningRate 0.0444 Epoch: 13 Global Step: 75820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:56:04,774-Speed 10062.83 samples/sec Loss 7.2347 LearningRate 0.0444 Epoch: 13 Global Step: 75830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:56:05,721-Speed 10817.25 samples/sec Loss 7.2476 LearningRate 0.0444 Epoch: 13 Global Step: 75840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:56:06,691-Speed 10563.03 samples/sec Loss 7.3079 LearningRate 0.0444 Epoch: 13 Global Step: 75850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:56:07,691-Speed 10249.40 samples/sec Loss 7.2355 LearningRate 0.0444 Epoch: 13 Global Step: 75860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:56:08,607-Speed 11196.73 samples/sec Loss 7.1053 LearningRate 0.0444 Epoch: 13 Global Step: 75870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:56:09,555-Speed 10804.34 samples/sec Loss 7.1528 LearningRate 0.0444 Epoch: 13 Global Step: 75880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:56:10,457-Speed 11363.77 samples/sec Loss 7.0733 LearningRate 0.0444 Epoch: 13 Global Step: 75890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:56:11,362-Speed 11324.98 samples/sec Loss 7.2271 LearningRate 0.0444 Epoch: 13 Global Step: 75900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:56:12,245-Speed 11601.75 samples/sec Loss 7.2203 LearningRate 0.0444 Epoch: 13 Global Step: 75910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:56:13,165-Speed 11146.27 samples/sec Loss 7.2338 LearningRate 0.0444 Epoch: 13 Global Step: 75920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:56:14,125-Speed 10675.04 samples/sec Loss 7.3534 LearningRate 0.0444 Epoch: 13 Global Step: 75930 Fp16 Grad 
Scale: 131072 Required: 5 hours Training: 2022-04-11 01:56:15,035-Speed 11266.75 samples/sec Loss 7.2225 LearningRate 0.0444 Epoch: 13 Global Step: 75940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:56:16,022-Speed 10377.03 samples/sec Loss 7.0518 LearningRate 0.0444 Epoch: 13 Global Step: 75950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:56:16,976-Speed 10742.01 samples/sec Loss 7.2271 LearningRate 0.0444 Epoch: 13 Global Step: 75960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:56:17,938-Speed 10657.45 samples/sec Loss 7.0905 LearningRate 0.0444 Epoch: 13 Global Step: 75970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:56:18,884-Speed 10831.15 samples/sec Loss 7.0613 LearningRate 0.0443 Epoch: 13 Global Step: 75980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:56:19,836-Speed 10769.56 samples/sec Loss 7.0335 LearningRate 0.0443 Epoch: 13 Global Step: 75990 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 01:56:20,731-Speed 11449.57 samples/sec Loss 7.1778 LearningRate 0.0443 Epoch: 13 Global Step: 76000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:56:42,301-[lfw][76000]XNorm: 12.144690 Training: 2022-04-11 01:56:42,302-[lfw][76000]Accuracy-Flip: 0.99517+-0.00293 Training: 2022-04-11 01:56:42,302-[lfw][76000]Accuracy-Highest: 0.99583 Training: 2022-04-11 01:57:07,021-[cfp_fp][76000]XNorm: 10.142651 Training: 2022-04-11 01:57:07,023-[cfp_fp][76000]Accuracy-Flip: 0.93886+-0.01018 Training: 2022-04-11 01:57:07,024-[cfp_fp][76000]Accuracy-Highest: 0.94271 Training: 2022-04-11 01:57:28,323-[agedb_30][76000]XNorm: 11.879020 Training: 2022-04-11 01:57:28,324-[agedb_30][76000]Accuracy-Flip: 0.96050+-0.01075 Training: 2022-04-11 01:57:28,324-[agedb_30][76000]Accuracy-Highest: 0.96083 Training: 2022-04-11 01:57:29,287-Speed 149.37 samples/sec Loss 7.1810 LearningRate 0.0443 Epoch: 13 Global Step: 76010 Fp16 Grad Scale: 131072 Required: 5 
hours Training: 2022-04-11 01:57:30,218-Speed 11013.16 samples/sec Loss 7.1945 LearningRate 0.0443 Epoch: 13 Global Step: 76020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:57:31,179-Speed 10673.07 samples/sec Loss 7.1841 LearningRate 0.0443 Epoch: 13 Global Step: 76030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:57:32,128-Speed 10805.29 samples/sec Loss 7.2177 LearningRate 0.0443 Epoch: 13 Global Step: 76040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:57:33,015-Speed 11558.83 samples/sec Loss 7.2094 LearningRate 0.0443 Epoch: 13 Global Step: 76050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:57:33,927-Speed 11229.65 samples/sec Loss 6.9924 LearningRate 0.0443 Epoch: 13 Global Step: 76060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:57:34,853-Speed 11064.95 samples/sec Loss 7.0895 LearningRate 0.0443 Epoch: 13 Global Step: 76070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:57:35,768-Speed 11206.19 samples/sec Loss 7.0479 LearningRate 0.0443 Epoch: 13 Global Step: 76080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:57:36,654-Speed 11565.99 samples/sec Loss 7.2665 LearningRate 0.0443 Epoch: 13 Global Step: 76090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:57:37,568-Speed 11215.55 samples/sec Loss 7.1829 LearningRate 0.0443 Epoch: 13 Global Step: 76100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:57:38,565-Speed 10278.29 samples/sec Loss 7.2923 LearningRate 0.0443 Epoch: 13 Global Step: 76110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:57:39,521-Speed 10725.70 samples/sec Loss 7.3598 LearningRate 0.0443 Epoch: 13 Global Step: 76120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:57:40,470-Speed 10795.35 samples/sec Loss 7.1927 LearningRate 0.0443 Epoch: 13 Global Step: 76130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 
01:57:41,351-Speed 11632.72 samples/sec Loss 7.3260 LearningRate 0.0443 Epoch: 13 Global Step: 76140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:57:42,237-Speed 11574.60 samples/sec Loss 7.2079 LearningRate 0.0442 Epoch: 13 Global Step: 76150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:57:43,209-Speed 10541.03 samples/sec Loss 7.1745 LearningRate 0.0442 Epoch: 13 Global Step: 76160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:57:44,187-Speed 10476.55 samples/sec Loss 7.1764 LearningRate 0.0442 Epoch: 13 Global Step: 76170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:57:45,085-Speed 11425.85 samples/sec Loss 7.2779 LearningRate 0.0442 Epoch: 13 Global Step: 76180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:57:45,983-Speed 11404.61 samples/sec Loss 7.1880 LearningRate 0.0442 Epoch: 13 Global Step: 76190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:57:46,892-Speed 11280.91 samples/sec Loss 7.1645 LearningRate 0.0442 Epoch: 13 Global Step: 76200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:57:47,825-Speed 10973.84 samples/sec Loss 7.2246 LearningRate 0.0442 Epoch: 13 Global Step: 76210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:57:48,803-Speed 10495.47 samples/sec Loss 7.3177 LearningRate 0.0442 Epoch: 13 Global Step: 76220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:57:49,734-Speed 11001.50 samples/sec Loss 7.2107 LearningRate 0.0442 Epoch: 13 Global Step: 76230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:57:50,653-Speed 11157.44 samples/sec Loss 7.1382 LearningRate 0.0442 Epoch: 13 Global Step: 76240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:57:51,632-Speed 10462.17 samples/sec Loss 7.2069 LearningRate 0.0442 Epoch: 13 Global Step: 76250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:57:52,577-Speed 10845.56 
samples/sec Loss 7.0940 LearningRate 0.0442 Epoch: 13 Global Step: 76260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:57:53,534-Speed 10711.80 samples/sec Loss 7.1925 LearningRate 0.0442 Epoch: 13 Global Step: 76270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:57:54,465-Speed 11003.17 samples/sec Loss 7.2367 LearningRate 0.0442 Epoch: 13 Global Step: 76280 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 01:57:55,364-Speed 11402.08 samples/sec Loss 7.3564 LearningRate 0.0442 Epoch: 13 Global Step: 76290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:57:56,310-Speed 10838.21 samples/sec Loss 7.2611 LearningRate 0.0442 Epoch: 13 Global Step: 76300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:57:57,190-Speed 11636.83 samples/sec Loss 7.2764 LearningRate 0.0442 Epoch: 13 Global Step: 76310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:57:58,084-Speed 11459.65 samples/sec Loss 7.2791 LearningRate 0.0441 Epoch: 13 Global Step: 76320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:57:59,016-Speed 11002.92 samples/sec Loss 7.2923 LearningRate 0.0441 Epoch: 13 Global Step: 76330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:57:59,938-Speed 11115.57 samples/sec Loss 7.1526 LearningRate 0.0441 Epoch: 13 Global Step: 76340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:58:00,884-Speed 10824.10 samples/sec Loss 7.1437 LearningRate 0.0441 Epoch: 13 Global Step: 76350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:58:01,802-Speed 11170.51 samples/sec Loss 7.2904 LearningRate 0.0441 Epoch: 13 Global Step: 76360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:58:02,731-Speed 11025.21 samples/sec Loss 7.3091 LearningRate 0.0441 Epoch: 13 Global Step: 76370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:58:03,707-Speed 10506.98 samples/sec Loss 7.1528 
LearningRate 0.0441 Epoch: 13 Global Step: 76380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:58:04,637-Speed 11021.18 samples/sec Loss 7.0873 LearningRate 0.0441 Epoch: 13 Global Step: 76390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:58:05,556-Speed 11144.55 samples/sec Loss 7.2336 LearningRate 0.0441 Epoch: 13 Global Step: 76400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:58:06,489-Speed 10987.00 samples/sec Loss 7.1699 LearningRate 0.0441 Epoch: 13 Global Step: 76410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:58:07,401-Speed 11229.99 samples/sec Loss 7.2543 LearningRate 0.0441 Epoch: 13 Global Step: 76420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:58:08,326-Speed 11084.98 samples/sec Loss 7.3653 LearningRate 0.0441 Epoch: 13 Global Step: 76430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:58:09,241-Speed 11197.90 samples/sec Loss 7.2791 LearningRate 0.0441 Epoch: 13 Global Step: 76440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:58:10,126-Speed 11587.82 samples/sec Loss 7.3596 LearningRate 0.0441 Epoch: 13 Global Step: 76450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:58:11,019-Speed 11468.80 samples/sec Loss 7.0344 LearningRate 0.0441 Epoch: 13 Global Step: 76460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:58:11,985-Speed 10604.32 samples/sec Loss 7.2232 LearningRate 0.0441 Epoch: 13 Global Step: 76470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:58:12,960-Speed 10512.97 samples/sec Loss 7.0951 LearningRate 0.0441 Epoch: 13 Global Step: 76480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:58:13,878-Speed 11163.77 samples/sec Loss 7.2702 LearningRate 0.0440 Epoch: 13 Global Step: 76490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:58:14,856-Speed 10484.52 samples/sec Loss 7.2104 LearningRate 0.0440 Epoch: 13 Global 
Step: 76500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:58:15,760-Speed 11330.87 samples/sec Loss 7.2598 LearningRate 0.0440 Epoch: 13 Global Step: 76510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:58:16,871-Speed 9227.37 samples/sec Loss 7.1496 LearningRate 0.0440 Epoch: 13 Global Step: 76520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:58:17,804-Speed 10980.29 samples/sec Loss 7.2568 LearningRate 0.0440 Epoch: 13 Global Step: 76530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:58:18,739-Speed 10968.90 samples/sec Loss 7.3470 LearningRate 0.0440 Epoch: 13 Global Step: 76540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:58:19,697-Speed 10698.96 samples/sec Loss 7.1444 LearningRate 0.0440 Epoch: 13 Global Step: 76550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:58:20,615-Speed 11153.40 samples/sec Loss 7.2486 LearningRate 0.0440 Epoch: 13 Global Step: 76560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:58:21,552-Speed 10944.02 samples/sec Loss 7.1615 LearningRate 0.0440 Epoch: 13 Global Step: 76570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:58:22,436-Speed 11594.85 samples/sec Loss 7.1447 LearningRate 0.0440 Epoch: 13 Global Step: 76580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:58:23,428-Speed 10327.18 samples/sec Loss 7.2829 LearningRate 0.0440 Epoch: 13 Global Step: 76590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:58:24,329-Speed 11381.83 samples/sec Loss 7.2034 LearningRate 0.0440 Epoch: 13 Global Step: 76600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:58:25,250-Speed 11117.77 samples/sec Loss 7.1816 LearningRate 0.0440 Epoch: 13 Global Step: 76610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:58:26,191-Speed 10892.26 samples/sec Loss 7.2612 LearningRate 0.0440 Epoch: 13 Global Step: 76620 Fp16 Grad Scale: 65536 
Required: 5 hours Training: 2022-04-11 01:58:27,080-Speed 11533.88 samples/sec Loss 7.1799 LearningRate 0.0440 Epoch: 13 Global Step: 76630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:58:28,034-Speed 10742.82 samples/sec Loss 7.3956 LearningRate 0.0440 Epoch: 13 Global Step: 76640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:58:28,920-Speed 11560.56 samples/sec Loss 7.2255 LearningRate 0.0440 Epoch: 13 Global Step: 76650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:58:29,800-Speed 11651.82 samples/sec Loss 7.2978 LearningRate 0.0439 Epoch: 13 Global Step: 76660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:58:30,725-Speed 11082.97 samples/sec Loss 7.2857 LearningRate 0.0439 Epoch: 13 Global Step: 76670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:58:31,670-Speed 10843.09 samples/sec Loss 7.1523 LearningRate 0.0439 Epoch: 13 Global Step: 76680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:58:32,577-Speed 11297.46 samples/sec Loss 7.3456 LearningRate 0.0439 Epoch: 13 Global Step: 76690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:58:33,456-Speed 11659.16 samples/sec Loss 7.2284 LearningRate 0.0439 Epoch: 13 Global Step: 76700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:58:34,345-Speed 11528.75 samples/sec Loss 7.1419 LearningRate 0.0439 Epoch: 13 Global Step: 76710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:58:35,243-Speed 11404.62 samples/sec Loss 7.2474 LearningRate 0.0439 Epoch: 13 Global Step: 76720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:58:36,218-Speed 10515.79 samples/sec Loss 7.4515 LearningRate 0.0439 Epoch: 13 Global Step: 76730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:58:37,133-Speed 11194.60 samples/sec Loss 7.2185 LearningRate 0.0439 Epoch: 13 Global Step: 76740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 
01:58:38,101-Speed 10583.82 samples/sec Loss 7.2615 LearningRate 0.0439 Epoch: 13 Global Step: 76750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:58:39,035-Speed 10972.68 samples/sec Loss 7.1994 LearningRate 0.0439 Epoch: 13 Global Step: 76760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:58:40,003-Speed 10588.27 samples/sec Loss 7.2164 LearningRate 0.0439 Epoch: 13 Global Step: 76770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:58:40,962-Speed 10683.87 samples/sec Loss 7.3669 LearningRate 0.0439 Epoch: 13 Global Step: 76780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:58:41,901-Speed 10923.42 samples/sec Loss 7.0918 LearningRate 0.0439 Epoch: 13 Global Step: 76790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:58:42,791-Speed 11515.36 samples/sec Loss 7.2140 LearningRate 0.0439 Epoch: 13 Global Step: 76800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:58:43,711-Speed 11130.54 samples/sec Loss 7.1415 LearningRate 0.0439 Epoch: 13 Global Step: 76810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:58:44,656-Speed 10847.90 samples/sec Loss 7.1743 LearningRate 0.0439 Epoch: 13 Global Step: 76820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:58:45,530-Speed 11731.44 samples/sec Loss 7.2940 LearningRate 0.0439 Epoch: 13 Global Step: 76830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:58:46,434-Speed 11341.33 samples/sec Loss 7.2656 LearningRate 0.0438 Epoch: 13 Global Step: 76840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:58:47,403-Speed 10578.04 samples/sec Loss 7.2168 LearningRate 0.0438 Epoch: 13 Global Step: 76850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:58:48,306-Speed 11342.20 samples/sec Loss 7.1917 LearningRate 0.0438 Epoch: 13 Global Step: 76860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:58:49,273-Speed 10599.67 samples/sec 
Loss 7.0861 LearningRate 0.0438 Epoch: 13 Global Step: 76870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:58:50,168-Speed 11457.72 samples/sec Loss 7.1425 LearningRate 0.0438 Epoch: 13 Global Step: 76880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:58:51,095-Speed 11061.87 samples/sec Loss 7.3213 LearningRate 0.0438 Epoch: 13 Global Step: 76890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:58:52,031-Speed 10937.40 samples/sec Loss 7.1443 LearningRate 0.0438 Epoch: 13 Global Step: 76900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:58:52,980-Speed 10821.07 samples/sec Loss 7.1450 LearningRate 0.0438 Epoch: 13 Global Step: 76910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:58:53,865-Speed 11584.36 samples/sec Loss 7.2023 LearningRate 0.0438 Epoch: 13 Global Step: 76920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:58:54,804-Speed 10912.14 samples/sec Loss 7.2363 LearningRate 0.0438 Epoch: 13 Global Step: 76930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:58:55,746-Speed 10876.97 samples/sec Loss 7.2946 LearningRate 0.0438 Epoch: 13 Global Step: 76940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:58:56,642-Speed 11437.12 samples/sec Loss 7.1487 LearningRate 0.0438 Epoch: 13 Global Step: 76950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:58:57,578-Speed 10943.04 samples/sec Loss 7.2158 LearningRate 0.0438 Epoch: 13 Global Step: 76960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:58:58,501-Speed 11105.02 samples/sec Loss 7.1542 LearningRate 0.0438 Epoch: 13 Global Step: 76970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:58:59,392-Speed 11506.15 samples/sec Loss 7.3058 LearningRate 0.0438 Epoch: 13 Global Step: 76980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:59:00,318-Speed 11064.85 samples/sec Loss 7.1552 LearningRate 0.0438 Epoch: 
13 Global Step: 76990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:59:01,257-Speed 10916.56 samples/sec Loss 7.0978 LearningRate 0.0438 Epoch: 13 Global Step: 77000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:59:02,226-Speed 10572.63 samples/sec Loss 7.2256 LearningRate 0.0437 Epoch: 13 Global Step: 77010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:59:03,137-Speed 11255.98 samples/sec Loss 7.3908 LearningRate 0.0437 Epoch: 13 Global Step: 77020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:59:04,093-Speed 10713.46 samples/sec Loss 7.3020 LearningRate 0.0437 Epoch: 13 Global Step: 77030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:59:04,971-Speed 11673.15 samples/sec Loss 7.3574 LearningRate 0.0437 Epoch: 13 Global Step: 77040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:59:05,900-Speed 11032.82 samples/sec Loss 7.2870 LearningRate 0.0437 Epoch: 13 Global Step: 77050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:59:06,810-Speed 11261.02 samples/sec Loss 7.2307 LearningRate 0.0437 Epoch: 13 Global Step: 77060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:59:07,723-Speed 11226.42 samples/sec Loss 7.1619 LearningRate 0.0437 Epoch: 13 Global Step: 77070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:59:08,636-Speed 11215.53 samples/sec Loss 7.3618 LearningRate 0.0437 Epoch: 13 Global Step: 77080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:59:09,534-Speed 11423.71 samples/sec Loss 7.1421 LearningRate 0.0437 Epoch: 13 Global Step: 77090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:59:10,441-Speed 11297.10 samples/sec Loss 7.4061 LearningRate 0.0437 Epoch: 13 Global Step: 77100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:59:11,386-Speed 10842.93 samples/sec Loss 7.2553 LearningRate 0.0437 Epoch: 13 Global Step: 77110 Fp16 Grad Scale: 
65536 Required: 5 hours Training: 2022-04-11 01:59:12,367-Speed 10440.13 samples/sec Loss 7.2523 LearningRate 0.0437 Epoch: 13 Global Step: 77120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:59:13,303-Speed 10959.36 samples/sec Loss 7.3760 LearningRate 0.0437 Epoch: 13 Global Step: 77130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:59:14,250-Speed 10822.79 samples/sec Loss 7.2525 LearningRate 0.0437 Epoch: 13 Global Step: 77140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:59:15,159-Speed 11264.81 samples/sec Loss 7.2131 LearningRate 0.0437 Epoch: 13 Global Step: 77150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:59:16,103-Speed 10863.57 samples/sec Loss 7.1430 LearningRate 0.0437 Epoch: 13 Global Step: 77160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:59:17,055-Speed 10754.70 samples/sec Loss 7.3148 LearningRate 0.0437 Epoch: 13 Global Step: 77170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:59:17,977-Speed 11117.43 samples/sec Loss 7.3453 LearningRate 0.0436 Epoch: 13 Global Step: 77180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:59:18,892-Speed 11211.48 samples/sec Loss 7.1997 LearningRate 0.0436 Epoch: 13 Global Step: 77190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:59:19,795-Speed 11339.22 samples/sec Loss 7.3917 LearningRate 0.0436 Epoch: 13 Global Step: 77200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:59:20,740-Speed 10842.90 samples/sec Loss 7.2410 LearningRate 0.0436 Epoch: 13 Global Step: 77210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:59:21,654-Speed 11215.96 samples/sec Loss 7.1964 LearningRate 0.0436 Epoch: 13 Global Step: 77220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:59:22,570-Speed 11186.90 samples/sec Loss 7.2006 LearningRate 0.0436 Epoch: 13 Global Step: 77230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 
2022-04-11 01:59:23,473-Speed 11361.27 samples/sec Loss 7.2224 LearningRate 0.0436 Epoch: 13 Global Step: 77240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:59:24,425-Speed 10758.38 samples/sec Loss 7.3230 LearningRate 0.0436 Epoch: 13 Global Step: 77250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:59:25,457-Speed 9935.49 samples/sec Loss 7.1847 LearningRate 0.0436 Epoch: 13 Global Step: 77260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:59:26,371-Speed 11208.54 samples/sec Loss 7.1551 LearningRate 0.0436 Epoch: 13 Global Step: 77270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:59:27,242-Speed 11763.21 samples/sec Loss 7.2952 LearningRate 0.0436 Epoch: 13 Global Step: 77280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:59:28,194-Speed 10770.20 samples/sec Loss 7.1920 LearningRate 0.0436 Epoch: 13 Global Step: 77290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:59:29,148-Speed 10740.88 samples/sec Loss 7.1876 LearningRate 0.0436 Epoch: 13 Global Step: 77300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:59:30,087-Speed 10922.49 samples/sec Loss 7.3617 LearningRate 0.0436 Epoch: 13 Global Step: 77310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:59:31,039-Speed 10756.57 samples/sec Loss 7.3407 LearningRate 0.0436 Epoch: 13 Global Step: 77320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:59:31,978-Speed 10917.41 samples/sec Loss 7.3955 LearningRate 0.0436 Epoch: 13 Global Step: 77330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:59:32,892-Speed 11209.43 samples/sec Loss 7.1093 LearningRate 0.0436 Epoch: 13 Global Step: 77340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:59:33,777-Speed 11591.73 samples/sec Loss 7.2352 LearningRate 0.0435 Epoch: 13 Global Step: 77350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:59:34,675-Speed 11405.43 
samples/sec Loss 7.3828 LearningRate 0.0435 Epoch: 13 Global Step: 77360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:59:35,561-Speed 11574.74 samples/sec Loss 7.2488 LearningRate 0.0435 Epoch: 13 Global Step: 77370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:59:36,544-Speed 10424.00 samples/sec Loss 7.1948 LearningRate 0.0435 Epoch: 13 Global Step: 77380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:59:37,486-Speed 10879.43 samples/sec Loss 7.2433 LearningRate 0.0435 Epoch: 13 Global Step: 77390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:59:38,440-Speed 10741.64 samples/sec Loss 7.1973 LearningRate 0.0435 Epoch: 13 Global Step: 77400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:59:39,342-Speed 11361.73 samples/sec Loss 7.2642 LearningRate 0.0435 Epoch: 13 Global Step: 77410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:59:40,283-Speed 10880.62 samples/sec Loss 7.2075 LearningRate 0.0435 Epoch: 13 Global Step: 77420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:59:41,250-Speed 10603.79 samples/sec Loss 7.2561 LearningRate 0.0435 Epoch: 13 Global Step: 77430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:59:42,190-Speed 10903.62 samples/sec Loss 7.2231 LearningRate 0.0435 Epoch: 13 Global Step: 77440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:59:43,095-Speed 11323.03 samples/sec Loss 7.1951 LearningRate 0.0435 Epoch: 13 Global Step: 77450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:59:44,027-Speed 10996.51 samples/sec Loss 7.2736 LearningRate 0.0435 Epoch: 13 Global Step: 77460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:59:44,931-Speed 11342.16 samples/sec Loss 7.2518 LearningRate 0.0435 Epoch: 13 Global Step: 77470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:59:45,832-Speed 11377.44 samples/sec Loss 7.3845 
LearningRate 0.0435 Epoch: 13 Global Step: 77480 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 01:59:46,760-Speed 11033.76 samples/sec Loss 7.2769 LearningRate 0.0435 Epoch: 13 Global Step: 77490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:59:47,681-Speed 11131.71 samples/sec Loss 7.3448 LearningRate 0.0435 Epoch: 13 Global Step: 77500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:59:48,611-Speed 11019.76 samples/sec Loss 7.1451 LearningRate 0.0435 Epoch: 13 Global Step: 77510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:59:49,519-Speed 11292.01 samples/sec Loss 7.2184 LearningRate 0.0434 Epoch: 13 Global Step: 77520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:59:50,475-Speed 10714.23 samples/sec Loss 7.2575 LearningRate 0.0434 Epoch: 13 Global Step: 77530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:59:51,439-Speed 10635.55 samples/sec Loss 7.1446 LearningRate 0.0434 Epoch: 13 Global Step: 77540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:59:52,441-Speed 10226.29 samples/sec Loss 7.2152 LearningRate 0.0434 Epoch: 13 Global Step: 77550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:59:53,372-Speed 11006.69 samples/sec Loss 7.2528 LearningRate 0.0434 Epoch: 13 Global Step: 77560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:59:54,350-Speed 10481.97 samples/sec Loss 7.0897 LearningRate 0.0434 Epoch: 13 Global Step: 77570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:59:55,337-Speed 10380.04 samples/sec Loss 7.2286 LearningRate 0.0434 Epoch: 13 Global Step: 77580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:59:56,258-Speed 11127.54 samples/sec Loss 7.3258 LearningRate 0.0434 Epoch: 13 Global Step: 77590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:59:57,148-Speed 11515.69 samples/sec Loss 7.3884 LearningRate 0.0434 Epoch: 13 Global 
Step: 77600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:59:58,053-Speed 11327.32 samples/sec Loss 7.3850 LearningRate 0.0434 Epoch: 13 Global Step: 77610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 01:59:58,953-Speed 11382.69 samples/sec Loss 7.3253 LearningRate 0.0434 Epoch: 13 Global Step: 77620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 01:59:59,925-Speed 10537.55 samples/sec Loss 7.3441 LearningRate 0.0434 Epoch: 13 Global Step: 77630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:00:00,855-Speed 11035.49 samples/sec Loss 7.1580 LearningRate 0.0434 Epoch: 13 Global Step: 77640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:00:01,788-Speed 10985.26 samples/sec Loss 7.2688 LearningRate 0.0434 Epoch: 13 Global Step: 77650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:00:02,686-Speed 11413.93 samples/sec Loss 7.3917 LearningRate 0.0434 Epoch: 13 Global Step: 77660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:00:03,601-Speed 11194.43 samples/sec Loss 7.4173 LearningRate 0.0434 Epoch: 13 Global Step: 77670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:00:04,565-Speed 10626.04 samples/sec Loss 7.2732 LearningRate 0.0434 Epoch: 13 Global Step: 77680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:00:05,464-Speed 11413.15 samples/sec Loss 7.1977 LearningRate 0.0434 Epoch: 13 Global Step: 77690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:00:06,374-Speed 11257.24 samples/sec Loss 7.2073 LearningRate 0.0433 Epoch: 13 Global Step: 77700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:00:07,296-Speed 11109.01 samples/sec Loss 7.2668 LearningRate 0.0433 Epoch: 13 Global Step: 77710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:00:08,249-Speed 10752.02 samples/sec Loss 7.1958 LearningRate 0.0433 Epoch: 13 Global Step: 77720 Fp16 Grad Scale: 
131072 Required: 5 hours Training: 2022-04-11 02:00:09,172-Speed 11104.69 samples/sec Loss 7.2344 LearningRate 0.0433 Epoch: 13 Global Step: 77730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:00:10,048-Speed 11707.56 samples/sec Loss 7.3764 LearningRate 0.0433 Epoch: 13 Global Step: 77740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:00:10,935-Speed 11546.58 samples/sec Loss 7.2949 LearningRate 0.0433 Epoch: 13 Global Step: 77750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:00:11,861-Speed 11075.96 samples/sec Loss 7.1611 LearningRate 0.0433 Epoch: 13 Global Step: 77760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:00:12,839-Speed 10475.57 samples/sec Loss 7.3263 LearningRate 0.0433 Epoch: 13 Global Step: 77770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:00:13,788-Speed 10794.66 samples/sec Loss 7.2077 LearningRate 0.0433 Epoch: 13 Global Step: 77780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:00:14,699-Speed 11256.17 samples/sec Loss 7.2091 LearningRate 0.0433 Epoch: 13 Global Step: 77790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:00:15,639-Speed 10901.21 samples/sec Loss 7.2231 LearningRate 0.0433 Epoch: 13 Global Step: 77800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:00:16,559-Speed 11133.41 samples/sec Loss 7.2169 LearningRate 0.0433 Epoch: 13 Global Step: 77810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:00:17,524-Speed 10626.95 samples/sec Loss 7.1717 LearningRate 0.0433 Epoch: 13 Global Step: 77820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:00:18,460-Speed 10954.02 samples/sec Loss 7.4432 LearningRate 0.0433 Epoch: 13 Global Step: 77830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:00:19,396-Speed 10945.23 samples/sec Loss 7.2871 LearningRate 0.0433 Epoch: 13 Global Step: 77840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 
2022-04-11 02:00:20,340-Speed 10847.23 samples/sec Loss 7.2615 LearningRate 0.0433 Epoch: 13 Global Step: 77850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:00:21,281-Speed 10899.11 samples/sec Loss 7.3127 LearningRate 0.0433 Epoch: 13 Global Step: 77860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:00:22,169-Speed 11543.31 samples/sec Loss 7.0781 LearningRate 0.0432 Epoch: 13 Global Step: 77870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:00:23,102-Speed 10976.63 samples/sec Loss 7.2767 LearningRate 0.0432 Epoch: 13 Global Step: 77880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:00:24,061-Speed 10687.83 samples/sec Loss 7.3107 LearningRate 0.0432 Epoch: 13 Global Step: 77890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:00:25,020-Speed 10685.44 samples/sec Loss 7.2443 LearningRate 0.0432 Epoch: 13 Global Step: 77900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:00:25,965-Speed 10852.48 samples/sec Loss 7.2324 LearningRate 0.0432 Epoch: 13 Global Step: 77910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:00:26,925-Speed 10678.52 samples/sec Loss 7.2338 LearningRate 0.0432 Epoch: 13 Global Step: 77920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:00:27,848-Speed 11094.79 samples/sec Loss 7.1649 LearningRate 0.0432 Epoch: 13 Global Step: 77930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:00:28,795-Speed 10826.90 samples/sec Loss 7.1468 LearningRate 0.0432 Epoch: 13 Global Step: 77940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:00:29,704-Speed 11277.30 samples/sec Loss 7.5531 LearningRate 0.0432 Epoch: 13 Global Step: 77950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:00:30,637-Speed 10980.15 samples/sec Loss 7.5406 LearningRate 0.0432 Epoch: 13 Global Step: 77960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:00:31,587-Speed 
10788.71 samples/sec Loss 7.2400 LearningRate 0.0432 Epoch: 13 Global Step: 77970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:00:32,528-Speed 10893.60 samples/sec Loss 7.2485 LearningRate 0.0432 Epoch: 13 Global Step: 77980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:00:33,490-Speed 10648.52 samples/sec Loss 7.2099 LearningRate 0.0432 Epoch: 13 Global Step: 77990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:00:34,411-Speed 11130.56 samples/sec Loss 7.2960 LearningRate 0.0432 Epoch: 13 Global Step: 78000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:00:56,296-[lfw][78000]XNorm: 12.115212 Training: 2022-04-11 02:00:56,296-[lfw][78000]Accuracy-Flip: 0.99583+-0.00281 Training: 2022-04-11 02:00:56,297-[lfw][78000]Accuracy-Highest: 0.99583 Training: 2022-04-11 02:01:21,057-[cfp_fp][78000]XNorm: 10.100344 Training: 2022-04-11 02:01:21,058-[cfp_fp][78000]Accuracy-Flip: 0.93957+-0.01533 Training: 2022-04-11 02:01:21,058-[cfp_fp][78000]Accuracy-Highest: 0.94271 Training: 2022-04-11 02:01:42,413-[agedb_30][78000]XNorm: 11.955809 Training: 2022-04-11 02:01:42,413-[agedb_30][78000]Accuracy-Flip: 0.95650+-0.01026 Training: 2022-04-11 02:01:42,413-[agedb_30][78000]Accuracy-Highest: 0.96083 Training: 2022-04-11 02:01:43,333-Speed 148.58 samples/sec Loss 7.3320 LearningRate 0.0432 Epoch: 13 Global Step: 78010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:01:44,258-Speed 11077.29 samples/sec Loss 7.1474 LearningRate 0.0432 Epoch: 13 Global Step: 78020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:01:45,162-Speed 11326.35 samples/sec Loss 7.2925 LearningRate 0.0432 Epoch: 13 Global Step: 78030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:01:46,078-Speed 11197.26 samples/sec Loss 7.1407 LearningRate 0.0431 Epoch: 13 Global Step: 78040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:01:46,965-Speed 11548.09 samples/sec Loss 
7.3465 LearningRate 0.0431 Epoch: 13 Global Step: 78050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:01:47,895-Speed 11017.70 samples/sec Loss 7.3412 LearningRate 0.0431 Epoch: 13 Global Step: 78060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:01:48,825-Speed 11024.05 samples/sec Loss 7.2007 LearningRate 0.0431 Epoch: 13 Global Step: 78070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:01:49,738-Speed 11224.59 samples/sec Loss 7.2728 LearningRate 0.0431 Epoch: 13 Global Step: 78080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:01:50,661-Speed 11100.85 samples/sec Loss 7.3645 LearningRate 0.0431 Epoch: 13 Global Step: 78090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:01:51,596-Speed 10954.44 samples/sec Loss 7.1289 LearningRate 0.0431 Epoch: 13 Global Step: 78100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:01:52,520-Speed 11089.12 samples/sec Loss 7.0930 LearningRate 0.0431 Epoch: 13 Global Step: 78110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:01:53,458-Speed 10932.64 samples/sec Loss 7.2053 LearningRate 0.0431 Epoch: 13 Global Step: 78120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:01:54,370-Speed 11234.81 samples/sec Loss 7.2543 LearningRate 0.0431 Epoch: 13 Global Step: 78130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:01:55,305-Speed 10956.59 samples/sec Loss 7.2477 LearningRate 0.0431 Epoch: 13 Global Step: 78140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:01:56,218-Speed 11226.20 samples/sec Loss 7.1968 LearningRate 0.0431 Epoch: 13 Global Step: 78150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:01:57,138-Speed 11136.31 samples/sec Loss 7.1356 LearningRate 0.0431 Epoch: 13 Global Step: 78160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:01:58,069-Speed 11011.67 samples/sec Loss 7.1976 LearningRate 0.0431 Epoch: 13 
Global Step: 78170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:01:58,973-Speed 11347.46 samples/sec Loss 7.3346 LearningRate 0.0431 Epoch: 13 Global Step: 78180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:01:59,850-Speed 11687.10 samples/sec Loss 7.2452 LearningRate 0.0431 Epoch: 13 Global Step: 78190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:02:00,761-Speed 11241.67 samples/sec Loss 7.1796 LearningRate 0.0431 Epoch: 13 Global Step: 78200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:02:01,704-Speed 10874.42 samples/sec Loss 7.2130 LearningRate 0.0431 Epoch: 13 Global Step: 78210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:02:02,644-Speed 10896.80 samples/sec Loss 7.2592 LearningRate 0.0430 Epoch: 13 Global Step: 78220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:02:03,534-Speed 11522.21 samples/sec Loss 7.0513 LearningRate 0.0430 Epoch: 13 Global Step: 78230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:02:04,445-Speed 11243.79 samples/sec Loss 7.1139 LearningRate 0.0430 Epoch: 13 Global Step: 78240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:02:05,366-Speed 11127.46 samples/sec Loss 7.2571 LearningRate 0.0430 Epoch: 13 Global Step: 78250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:02:06,294-Speed 11042.71 samples/sec Loss 7.2296 LearningRate 0.0430 Epoch: 13 Global Step: 78260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:02:07,233-Speed 10918.51 samples/sec Loss 7.1723 LearningRate 0.0430 Epoch: 13 Global Step: 78270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:02:08,177-Speed 10857.13 samples/sec Loss 7.1798 LearningRate 0.0430 Epoch: 13 Global Step: 78280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:02:09,155-Speed 10476.16 samples/sec Loss 7.1910 LearningRate 0.0430 Epoch: 13 Global Step: 78290 Fp16 Grad Scale: 
65536 Required: 5 hours Training: 2022-04-11 02:02:10,066-Speed 11246.36 samples/sec Loss 7.1091 LearningRate 0.0430 Epoch: 13 Global Step: 78300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:02:10,982-Speed 11195.77 samples/sec Loss 7.2671 LearningRate 0.0430 Epoch: 13 Global Step: 78310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:02:11,903-Speed 11122.73 samples/sec Loss 7.2686 LearningRate 0.0430 Epoch: 13 Global Step: 78320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:02:12,946-Speed 9826.79 samples/sec Loss 7.2702 LearningRate 0.0430 Epoch: 13 Global Step: 78330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:02:13,951-Speed 10202.08 samples/sec Loss 7.1642 LearningRate 0.0430 Epoch: 13 Global Step: 78340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:02:14,864-Speed 11224.74 samples/sec Loss 7.2145 LearningRate 0.0430 Epoch: 13 Global Step: 78350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:02:15,792-Speed 11047.39 samples/sec Loss 7.3067 LearningRate 0.0430 Epoch: 13 Global Step: 78360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:02:16,757-Speed 10622.08 samples/sec Loss 7.3325 LearningRate 0.0430 Epoch: 13 Global Step: 78370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:02:17,746-Speed 10350.60 samples/sec Loss 7.3192 LearningRate 0.0430 Epoch: 13 Global Step: 78380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:02:18,756-Speed 10148.56 samples/sec Loss 7.1954 LearningRate 0.0429 Epoch: 13 Global Step: 78390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:02:19,713-Speed 10714.79 samples/sec Loss 7.2937 LearningRate 0.0429 Epoch: 13 Global Step: 78400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:02:20,640-Speed 11061.99 samples/sec Loss 7.2372 LearningRate 0.0429 Epoch: 13 Global Step: 78410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 
2022-04-11 02:02:21,599-Speed 10686.49 samples/sec Loss 7.2093 LearningRate 0.0429 Epoch: 13 Global Step: 78420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:02:22,545-Speed 10834.79 samples/sec Loss 7.1130 LearningRate 0.0429 Epoch: 13 Global Step: 78430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:02:23,482-Speed 10930.90 samples/sec Loss 7.3126 LearningRate 0.0429 Epoch: 13 Global Step: 78440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:02:24,374-Speed 11498.31 samples/sec Loss 7.2060 LearningRate 0.0429 Epoch: 13 Global Step: 78450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:02:25,257-Speed 11600.71 samples/sec Loss 7.2788 LearningRate 0.0429 Epoch: 13 Global Step: 78460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:02:26,194-Speed 10932.13 samples/sec Loss 7.3655 LearningRate 0.0429 Epoch: 13 Global Step: 78470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:02:27,154-Speed 10673.87 samples/sec Loss 7.3027 LearningRate 0.0429 Epoch: 13 Global Step: 78480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:02:28,107-Speed 10752.14 samples/sec Loss 7.2693 LearningRate 0.0429 Epoch: 13 Global Step: 78490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:02:29,046-Speed 10921.08 samples/sec Loss 7.0610 LearningRate 0.0429 Epoch: 13 Global Step: 78500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:02:30,020-Speed 10515.51 samples/sec Loss 7.2037 LearningRate 0.0429 Epoch: 13 Global Step: 78510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:02:30,927-Speed 11301.71 samples/sec Loss 7.2989 LearningRate 0.0429 Epoch: 13 Global Step: 78520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:02:31,875-Speed 10812.88 samples/sec Loss 7.2316 LearningRate 0.0429 Epoch: 13 Global Step: 78530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:02:32,784-Speed 
11267.72 samples/sec Loss 7.2859 LearningRate 0.0429 Epoch: 13 Global Step: 78540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:02:33,724-Speed 10903.99 samples/sec Loss 7.1354 LearningRate 0.0429 Epoch: 13 Global Step: 78550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:02:34,686-Speed 10653.45 samples/sec Loss 7.3404 LearningRate 0.0428 Epoch: 13 Global Step: 78560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:02:35,595-Speed 11276.06 samples/sec Loss 7.0636 LearningRate 0.0428 Epoch: 13 Global Step: 78570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:02:36,624-Speed 9954.55 samples/sec Loss 7.2021 LearningRate 0.0428 Epoch: 13 Global Step: 78580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:02:37,572-Speed 10813.45 samples/sec Loss 7.1765 LearningRate 0.0428 Epoch: 13 Global Step: 78590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:02:38,533-Speed 10662.09 samples/sec Loss 7.1932 LearningRate 0.0428 Epoch: 13 Global Step: 78600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:02:39,426-Speed 11482.44 samples/sec Loss 7.2382 LearningRate 0.0428 Epoch: 13 Global Step: 78610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:02:40,435-Speed 10159.01 samples/sec Loss 7.2792 LearningRate 0.0428 Epoch: 13 Global Step: 78620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:02:41,403-Speed 10588.77 samples/sec Loss 7.1575 LearningRate 0.0428 Epoch: 13 Global Step: 78630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:02:42,391-Speed 10374.91 samples/sec Loss 7.3680 LearningRate 0.0428 Epoch: 13 Global Step: 78640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:02:43,350-Speed 10677.68 samples/sec Loss 7.2350 LearningRate 0.0428 Epoch: 13 Global Step: 78650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:02:44,305-Speed 10730.65 samples/sec Loss 7.2381 
LearningRate 0.0428 Epoch: 13 Global Step: 78660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:02:45,224-Speed 11155.27 samples/sec Loss 7.4377 LearningRate 0.0428 Epoch: 13 Global Step: 78670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:02:46,126-Speed 11361.08 samples/sec Loss 7.1613 LearningRate 0.0428 Epoch: 13 Global Step: 78680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:02:47,037-Speed 11254.91 samples/sec Loss 7.2424 LearningRate 0.0428 Epoch: 13 Global Step: 78690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:02:47,954-Speed 11169.38 samples/sec Loss 7.1830 LearningRate 0.0428 Epoch: 13 Global Step: 78700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:02:48,928-Speed 10522.71 samples/sec Loss 7.1098 LearningRate 0.0428 Epoch: 13 Global Step: 78710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:02:49,861-Speed 10980.49 samples/sec Loss 7.1867 LearningRate 0.0428 Epoch: 13 Global Step: 78720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:02:50,755-Speed 11470.86 samples/sec Loss 7.2874 LearningRate 0.0428 Epoch: 13 Global Step: 78730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:02:51,673-Speed 11167.84 samples/sec Loss 7.3054 LearningRate 0.0427 Epoch: 13 Global Step: 78740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:02:52,601-Speed 11038.92 samples/sec Loss 7.1290 LearningRate 0.0427 Epoch: 13 Global Step: 78750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:02:53,532-Speed 11010.54 samples/sec Loss 7.2788 LearningRate 0.0427 Epoch: 13 Global Step: 78760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:02:54,493-Speed 10652.26 samples/sec Loss 7.0804 LearningRate 0.0427 Epoch: 13 Global Step: 78770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:02:55,384-Speed 11509.84 samples/sec Loss 7.3190 LearningRate 0.0427 Epoch: 13 
Global Step: 78780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:02:56,326-Speed 10873.06 samples/sec Loss 7.2114 LearningRate 0.0427 Epoch: 13 Global Step: 78790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:02:57,250-Speed 11101.29 samples/sec Loss 7.3988 LearningRate 0.0427 Epoch: 13 Global Step: 78800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:02:58,195-Speed 10834.66 samples/sec Loss 7.1155 LearningRate 0.0427 Epoch: 13 Global Step: 78810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:02:59,155-Speed 10676.98 samples/sec Loss 7.2250 LearningRate 0.0427 Epoch: 13 Global Step: 78820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:03:00,127-Speed 10544.27 samples/sec Loss 7.1588 LearningRate 0.0427 Epoch: 13 Global Step: 78830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:03:01,035-Speed 11288.33 samples/sec Loss 7.1904 LearningRate 0.0427 Epoch: 13 Global Step: 78840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:03:01,952-Speed 11178.21 samples/sec Loss 7.2102 LearningRate 0.0427 Epoch: 13 Global Step: 78850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:03:02,925-Speed 10538.12 samples/sec Loss 7.1445 LearningRate 0.0427 Epoch: 13 Global Step: 78860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:03:03,842-Speed 11181.02 samples/sec Loss 7.1211 LearningRate 0.0427 Epoch: 13 Global Step: 78870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:03:04,754-Speed 11230.36 samples/sec Loss 7.3107 LearningRate 0.0427 Epoch: 13 Global Step: 78880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:03:05,767-Speed 10115.32 samples/sec Loss 7.2886 LearningRate 0.0427 Epoch: 13 Global Step: 78890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:03:06,695-Speed 11051.20 samples/sec Loss 7.3035 LearningRate 0.0427 Epoch: 13 Global Step: 78900 Fp16 Grad 
Scale: 131072 Required: 5 hours Training: 2022-04-11 02:03:07,611-Speed 11184.24 samples/sec Loss 7.2209 LearningRate 0.0426 Epoch: 13 Global Step: 78910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:03:08,537-Speed 11065.13 samples/sec Loss 7.1058 LearningRate 0.0426 Epoch: 13 Global Step: 78920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:03:09,489-Speed 10762.16 samples/sec Loss 7.1651 LearningRate 0.0426 Epoch: 13 Global Step: 78930 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 02:03:10,415-Speed 11069.68 samples/sec Loss 7.1895 LearningRate 0.0426 Epoch: 13 Global Step: 78940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:03:11,368-Speed 10754.53 samples/sec Loss 7.0891 LearningRate 0.0426 Epoch: 13 Global Step: 78950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:03:12,335-Speed 10597.12 samples/sec Loss 7.1977 LearningRate 0.0426 Epoch: 13 Global Step: 78960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:03:13,276-Speed 10889.18 samples/sec Loss 7.2426 LearningRate 0.0426 Epoch: 13 Global Step: 78970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:03:14,245-Speed 10572.63 samples/sec Loss 7.2265 LearningRate 0.0426 Epoch: 13 Global Step: 78980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:03:15,143-Speed 11418.25 samples/sec Loss 7.2283 LearningRate 0.0426 Epoch: 13 Global Step: 78990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:03:16,070-Speed 11045.17 samples/sec Loss 7.1239 LearningRate 0.0426 Epoch: 13 Global Step: 79000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:03:17,001-Speed 11016.47 samples/sec Loss 7.2790 LearningRate 0.0426 Epoch: 13 Global Step: 79010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:03:17,968-Speed 10595.55 samples/sec Loss 7.3332 LearningRate 0.0426 Epoch: 13 Global Step: 79020 Fp16 Grad Scale: 65536 Required: 5 hours 
Training: 2022-04-11 02:03:18,901-Speed 10984.46 samples/sec Loss 7.0851 LearningRate 0.0426 Epoch: 13 Global Step: 79030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:03:19,847-Speed 10830.72 samples/sec Loss 7.1688 LearningRate 0.0426 Epoch: 13 Global Step: 79040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:03:20,782-Speed 10960.52 samples/sec Loss 7.3223 LearningRate 0.0426 Epoch: 13 Global Step: 79050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:03:21,682-Speed 11388.79 samples/sec Loss 7.1293 LearningRate 0.0426 Epoch: 13 Global Step: 79060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:03:22,590-Speed 11282.46 samples/sec Loss 7.1915 LearningRate 0.0426 Epoch: 13 Global Step: 79070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:03:23,521-Speed 11013.09 samples/sec Loss 7.2028 LearningRate 0.0426 Epoch: 13 Global Step: 79080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:03:24,428-Speed 11296.98 samples/sec Loss 7.2947 LearningRate 0.0425 Epoch: 13 Global Step: 79090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:03:25,369-Speed 10894.07 samples/sec Loss 7.2519 LearningRate 0.0425 Epoch: 13 Global Step: 79100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:03:26,287-Speed 11166.97 samples/sec Loss 7.3568 LearningRate 0.0425 Epoch: 13 Global Step: 79110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:03:27,184-Speed 11417.04 samples/sec Loss 7.2043 LearningRate 0.0425 Epoch: 13 Global Step: 79120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:03:28,110-Speed 11073.14 samples/sec Loss 7.2490 LearningRate 0.0425 Epoch: 13 Global Step: 79130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:03:29,084-Speed 10521.89 samples/sec Loss 7.0783 LearningRate 0.0425 Epoch: 13 Global Step: 79140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:03:30,010-Speed 
11071.62 samples/sec Loss 7.2366 LearningRate 0.0425 Epoch: 13 Global Step: 79150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:03:30,933-Speed 11105.46 samples/sec Loss 7.3802 LearningRate 0.0425 Epoch: 13 Global Step: 79160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:03:31,880-Speed 10818.21 samples/sec Loss 7.0886 LearningRate 0.0425 Epoch: 13 Global Step: 79170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:03:32,781-Speed 11371.44 samples/sec Loss 7.3231 LearningRate 0.0425 Epoch: 13 Global Step: 79180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:03:33,663-Speed 11623.60 samples/sec Loss 7.3221 LearningRate 0.0425 Epoch: 13 Global Step: 79190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:03:34,567-Speed 11340.07 samples/sec Loss 7.1127 LearningRate 0.0425 Epoch: 13 Global Step: 79200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:03:35,510-Speed 10858.61 samples/sec Loss 7.1956 LearningRate 0.0425 Epoch: 13 Global Step: 79210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:03:36,468-Speed 10704.07 samples/sec Loss 7.0862 LearningRate 0.0425 Epoch: 13 Global Step: 79220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:03:37,400-Speed 10999.55 samples/sec Loss 7.2119 LearningRate 0.0425 Epoch: 13 Global Step: 79230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:03:38,337-Speed 10939.04 samples/sec Loss 7.2321 LearningRate 0.0425 Epoch: 13 Global Step: 79240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:03:39,263-Speed 11064.30 samples/sec Loss 7.2438 LearningRate 0.0425 Epoch: 13 Global Step: 79250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:03:40,207-Speed 10853.33 samples/sec Loss 7.2118 LearningRate 0.0424 Epoch: 13 Global Step: 79260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:03:41,095-Speed 11545.78 samples/sec Loss 7.2509 
LearningRate 0.0424 Epoch: 13 Global Step: 79270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:03:41,992-Speed 11418.43 samples/sec Loss 7.3993 LearningRate 0.0424 Epoch: 13 Global Step: 79280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:03:42,928-Speed 10945.96 samples/sec Loss 7.2501 LearningRate 0.0424 Epoch: 13 Global Step: 79290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:03:43,814-Speed 11563.20 samples/sec Loss 7.2161 LearningRate 0.0424 Epoch: 13 Global Step: 79300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:03:44,752-Speed 10931.76 samples/sec Loss 7.2010 LearningRate 0.0424 Epoch: 13 Global Step: 79310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:03:45,730-Speed 10476.28 samples/sec Loss 7.0755 LearningRate 0.0424 Epoch: 13 Global Step: 79320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:03:46,662-Speed 11002.89 samples/sec Loss 7.2294 LearningRate 0.0424 Epoch: 13 Global Step: 79330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:03:47,540-Speed 11666.87 samples/sec Loss 7.2299 LearningRate 0.0424 Epoch: 13 Global Step: 79340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:03:48,518-Speed 10484.73 samples/sec Loss 7.1915 LearningRate 0.0424 Epoch: 13 Global Step: 79350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:03:49,467-Speed 10795.11 samples/sec Loss 7.2487 LearningRate 0.0424 Epoch: 13 Global Step: 79360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:03:50,420-Speed 10750.95 samples/sec Loss 7.1424 LearningRate 0.0424 Epoch: 13 Global Step: 79370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:03:51,380-Speed 10678.22 samples/sec Loss 7.1458 LearningRate 0.0424 Epoch: 13 Global Step: 79380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:03:52,338-Speed 10696.11 samples/sec Loss 7.2184 LearningRate 0.0424 Epoch: 13 Global 
Step: 79390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:03:53,262-Speed 11094.37 samples/sec Loss 7.3441 LearningRate 0.0424 Epoch: 13 Global Step: 79400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:03:54,153-Speed 11507.85 samples/sec Loss 7.1355 LearningRate 0.0424 Epoch: 13 Global Step: 79410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:03:55,082-Speed 11024.61 samples/sec Loss 7.2634 LearningRate 0.0424 Epoch: 13 Global Step: 79420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:03:56,002-Speed 11142.41 samples/sec Loss 7.3338 LearningRate 0.0423 Epoch: 13 Global Step: 79430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:03:56,920-Speed 11158.30 samples/sec Loss 7.2140 LearningRate 0.0423 Epoch: 13 Global Step: 79440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:03:57,810-Speed 11521.17 samples/sec Loss 7.1462 LearningRate 0.0423 Epoch: 13 Global Step: 79450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:03:58,758-Speed 10809.29 samples/sec Loss 7.1187 LearningRate 0.0423 Epoch: 13 Global Step: 79460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:03:59,673-Speed 11205.29 samples/sec Loss 7.2817 LearningRate 0.0423 Epoch: 13 Global Step: 79470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:04:00,628-Speed 10729.44 samples/sec Loss 7.0917 LearningRate 0.0423 Epoch: 13 Global Step: 79480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:04:01,576-Speed 10805.60 samples/sec Loss 7.1315 LearningRate 0.0423 Epoch: 13 Global Step: 79490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:04:02,601-Speed 10000.89 samples/sec Loss 7.1550 LearningRate 0.0423 Epoch: 13 Global Step: 79500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:04:03,532-Speed 11013.08 samples/sec Loss 7.3256 LearningRate 0.0423 Epoch: 13 Global Step: 79510 Fp16 Grad Scale: 131072 
Required: 5 hours Training: 2022-04-11 02:04:04,469-Speed 10940.67 samples/sec Loss 7.2853 LearningRate 0.0423 Epoch: 13 Global Step: 79520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:04:05,417-Speed 10810.93 samples/sec Loss 7.1640 LearningRate 0.0423 Epoch: 13 Global Step: 79530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:04:06,371-Speed 10738.56 samples/sec Loss 7.3174 LearningRate 0.0423 Epoch: 13 Global Step: 79540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:04:07,330-Speed 10709.70 samples/sec Loss 7.0187 LearningRate 0.0423 Epoch: 13 Global Step: 79550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:04:08,237-Speed 11297.10 samples/sec Loss 7.1295 LearningRate 0.0423 Epoch: 13 Global Step: 79560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:04:09,188-Speed 10780.22 samples/sec Loss 7.0783 LearningRate 0.0423 Epoch: 13 Global Step: 79570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:04:10,109-Speed 11119.20 samples/sec Loss 7.2316 LearningRate 0.0423 Epoch: 13 Global Step: 79580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:04:11,050-Speed 10897.61 samples/sec Loss 7.2941 LearningRate 0.0423 Epoch: 13 Global Step: 79590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:04:12,013-Speed 10637.99 samples/sec Loss 7.1990 LearningRate 0.0423 Epoch: 13 Global Step: 79600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:04:23,793-Speed 869.34 samples/sec Loss 6.7865 LearningRate 0.0422 Epoch: 14 Global Step: 79610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:04:24,745-Speed 10772.21 samples/sec Loss 6.5997 LearningRate 0.0422 Epoch: 14 Global Step: 79620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:04:25,676-Speed 11018.08 samples/sec Loss 6.6161 LearningRate 0.0422 Epoch: 14 Global Step: 79630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 
02:04:26,826-Speed 8910.39 samples/sec Loss 6.4736 LearningRate 0.0422 Epoch: 14 Global Step: 79640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:04:27,774-Speed 10812.02 samples/sec Loss 6.4400 LearningRate 0.0422 Epoch: 14 Global Step: 79650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:04:28,934-Speed 8830.61 samples/sec Loss 6.4650 LearningRate 0.0422 Epoch: 14 Global Step: 79660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:04:29,928-Speed 10308.92 samples/sec Loss 6.5167 LearningRate 0.0422 Epoch: 14 Global Step: 79670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:04:30,913-Speed 10403.36 samples/sec Loss 6.4805 LearningRate 0.0422 Epoch: 14 Global Step: 79680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:04:31,811-Speed 11413.65 samples/sec Loss 6.5071 LearningRate 0.0422 Epoch: 14 Global Step: 79690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:04:32,728-Speed 11169.89 samples/sec Loss 6.5379 LearningRate 0.0422 Epoch: 14 Global Step: 79700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:04:33,664-Speed 10957.80 samples/sec Loss 6.4855 LearningRate 0.0422 Epoch: 14 Global Step: 79710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:04:34,578-Speed 11201.34 samples/sec Loss 6.6000 LearningRate 0.0422 Epoch: 14 Global Step: 79720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:04:35,478-Speed 11396.97 samples/sec Loss 6.4834 LearningRate 0.0422 Epoch: 14 Global Step: 79730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:04:36,372-Speed 11456.72 samples/sec Loss 6.4288 LearningRate 0.0422 Epoch: 14 Global Step: 79740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:04:37,300-Speed 11046.33 samples/sec Loss 6.7210 LearningRate 0.0422 Epoch: 14 Global Step: 79750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:04:38,220-Speed 11134.81 samples/sec 
Loss 6.4100 LearningRate 0.0422 Epoch: 14 Global Step: 79760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:04:39,138-Speed 11165.69 samples/sec Loss 6.5522 LearningRate 0.0422 Epoch: 14 Global Step: 79770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:04:40,076-Speed 10916.67 samples/sec Loss 6.6009 LearningRate 0.0421 Epoch: 14 Global Step: 79780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:04:41,048-Speed 10545.79 samples/sec Loss 6.6820 LearningRate 0.0421 Epoch: 14 Global Step: 79790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:04:41,958-Speed 11271.64 samples/sec Loss 6.6694 LearningRate 0.0421 Epoch: 14 Global Step: 79800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:04:42,923-Speed 10618.42 samples/sec Loss 6.7330 LearningRate 0.0421 Epoch: 14 Global Step: 79810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:04:43,854-Speed 11006.88 samples/sec Loss 6.6577 LearningRate 0.0421 Epoch: 14 Global Step: 79820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:04:44,803-Speed 10809.80 samples/sec Loss 6.5634 LearningRate 0.0421 Epoch: 14 Global Step: 79830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:04:45,721-Speed 11164.67 samples/sec Loss 6.6759 LearningRate 0.0421 Epoch: 14 Global Step: 79840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:04:46,651-Speed 11012.53 samples/sec Loss 6.6764 LearningRate 0.0421 Epoch: 14 Global Step: 79850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:04:47,566-Speed 11204.46 samples/sec Loss 6.5199 LearningRate 0.0421 Epoch: 14 Global Step: 79860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:04:48,523-Speed 10711.80 samples/sec Loss 6.6469 LearningRate 0.0421 Epoch: 14 Global Step: 79870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:04:49,455-Speed 10989.03 samples/sec Loss 6.6837 LearningRate 0.0421 
Epoch: 14 Global Step: 79880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:04:50,375-Speed 11134.40 samples/sec Loss 6.6778 LearningRate 0.0421 Epoch: 14 Global Step: 79890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:04:51,307-Speed 10999.13 samples/sec Loss 6.8010 LearningRate 0.0421 Epoch: 14 Global Step: 79900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:04:52,267-Speed 10677.67 samples/sec Loss 6.6548 LearningRate 0.0421 Epoch: 14 Global Step: 79910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:04:53,215-Speed 10807.43 samples/sec Loss 6.7082 LearningRate 0.0421 Epoch: 14 Global Step: 79920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:04:54,188-Speed 10526.60 samples/sec Loss 6.7323 LearningRate 0.0421 Epoch: 14 Global Step: 79930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:04:55,128-Speed 10907.11 samples/sec Loss 6.4621 LearningRate 0.0421 Epoch: 14 Global Step: 79940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:04:56,053-Speed 11068.29 samples/sec Loss 6.7896 LearningRate 0.0421 Epoch: 14 Global Step: 79950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:04:56,963-Speed 11265.96 samples/sec Loss 6.8098 LearningRate 0.0420 Epoch: 14 Global Step: 79960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:04:57,861-Speed 11420.31 samples/sec Loss 6.7901 LearningRate 0.0420 Epoch: 14 Global Step: 79970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:04:58,797-Speed 10940.30 samples/sec Loss 6.8261 LearningRate 0.0420 Epoch: 14 Global Step: 79980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:04:59,723-Speed 11073.99 samples/sec Loss 6.7411 LearningRate 0.0420 Epoch: 14 Global Step: 79990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:05:00,625-Speed 11359.74 samples/sec Loss 6.5702 LearningRate 0.0420 Epoch: 14 Global Step: 80000 
Fp16 Grad Scale: 262144 Required: 5 hours
Training: 2022-04-11 02:05:21,908-[lfw][80000]XNorm: 11.945239
Training: 2022-04-11 02:05:21,909-[lfw][80000]Accuracy-Flip: 0.99500+-0.00365
Training: 2022-04-11 02:05:21,909-[lfw][80000]Accuracy-Highest: 0.99583
Training: 2022-04-11 02:05:46,620-[cfp_fp][80000]XNorm: 10.020629
Training: 2022-04-11 02:05:46,621-[cfp_fp][80000]Accuracy-Flip: 0.93786+-0.01007
Training: 2022-04-11 02:05:46,621-[cfp_fp][80000]Accuracy-Highest: 0.94271
Training: 2022-04-11 02:06:07,857-[agedb_30][80000]XNorm: 11.746601
Training: 2022-04-11 02:06:07,858-[agedb_30][80000]Accuracy-Flip: 0.95900+-0.00989
Training: 2022-04-11 02:06:07,858-[agedb_30][80000]Accuracy-Highest: 0.96083
Training: 2022-04-11 02:06:08,765-Speed 150.28 samples/sec Loss 6.8157 LearningRate 0.0420 Epoch: 14 Global Step: 80010 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 02:06:09,689-Speed 11087.23 samples/sec Loss 6.7995 LearningRate 0.0420 Epoch: 14 Global Step: 80020 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 02:06:10,619-Speed 11016.13 samples/sec Loss 6.8060 LearningRate 0.0420 Epoch: 14 Global Step: 80030 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 02:06:11,529-Speed 11277.18 samples/sec Loss 6.7378 LearningRate 0.0420 Epoch: 14 Global Step: 80040 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 02:06:12,487-Speed 10700.69 samples/sec Loss 6.8740 LearningRate 0.0420 Epoch: 14 Global Step: 80050 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 02:06:13,434-Speed 10828.15 samples/sec Loss 6.8784 LearningRate 0.0420 Epoch: 14 Global Step: 80060 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 02:06:14,368-Speed 10962.52 samples/sec Loss 6.7555 LearningRate 0.0420 Epoch: 14 Global Step: 80070 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 02:06:15,327-Speed 10690.65 samples/sec Loss 6.7877 LearningRate 0.0420 Epoch: 14 Global Step: 80080 Fp16 Grad Scale: 131072
Required: 5 hours Training: 2022-04-11 02:06:16,291-Speed 10634.10 samples/sec Loss 6.6574 LearningRate 0.0420 Epoch: 14 Global Step: 80090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:06:17,255-Speed 10628.68 samples/sec Loss 6.7820 LearningRate 0.0420 Epoch: 14 Global Step: 80100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:06:18,195-Speed 10894.14 samples/sec Loss 6.7047 LearningRate 0.0420 Epoch: 14 Global Step: 80110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:06:19,159-Speed 10631.50 samples/sec Loss 6.7086 LearningRate 0.0420 Epoch: 14 Global Step: 80120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:06:20,117-Speed 10695.90 samples/sec Loss 6.7609 LearningRate 0.0419 Epoch: 14 Global Step: 80130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:06:21,029-Speed 11246.05 samples/sec Loss 6.8280 LearningRate 0.0419 Epoch: 14 Global Step: 80140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:06:21,911-Speed 11608.48 samples/sec Loss 6.7994 LearningRate 0.0419 Epoch: 14 Global Step: 80150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:06:22,841-Speed 11018.99 samples/sec Loss 6.6791 LearningRate 0.0419 Epoch: 14 Global Step: 80160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:06:23,784-Speed 10865.98 samples/sec Loss 6.8487 LearningRate 0.0419 Epoch: 14 Global Step: 80170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:06:24,700-Speed 11194.41 samples/sec Loss 6.7387 LearningRate 0.0419 Epoch: 14 Global Step: 80180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:06:25,618-Speed 11152.19 samples/sec Loss 7.0003 LearningRate 0.0419 Epoch: 14 Global Step: 80190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:06:26,584-Speed 10606.27 samples/sec Loss 6.8741 LearningRate 0.0419 Epoch: 14 Global Step: 80200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 
02:06:27,500-Speed 11192.84 samples/sec Loss 6.8409 LearningRate 0.0419 Epoch: 14 Global Step: 80210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:06:28,438-Speed 10928.01 samples/sec Loss 6.7133 LearningRate 0.0419 Epoch: 14 Global Step: 80220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:06:29,330-Speed 11485.03 samples/sec Loss 6.7100 LearningRate 0.0419 Epoch: 14 Global Step: 80230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:06:30,258-Speed 11046.32 samples/sec Loss 6.8298 LearningRate 0.0419 Epoch: 14 Global Step: 80240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:06:31,147-Speed 11522.87 samples/sec Loss 6.9452 LearningRate 0.0419 Epoch: 14 Global Step: 80250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:06:32,054-Speed 11296.67 samples/sec Loss 6.7964 LearningRate 0.0419 Epoch: 14 Global Step: 80260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:06:32,989-Speed 10958.30 samples/sec Loss 6.7751 LearningRate 0.0419 Epoch: 14 Global Step: 80270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:06:33,928-Speed 10919.78 samples/sec Loss 6.8690 LearningRate 0.0419 Epoch: 14 Global Step: 80280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:06:34,860-Speed 10988.17 samples/sec Loss 6.7746 LearningRate 0.0419 Epoch: 14 Global Step: 80290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:06:35,858-Speed 10264.86 samples/sec Loss 6.6770 LearningRate 0.0419 Epoch: 14 Global Step: 80300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:06:36,779-Speed 11134.88 samples/sec Loss 6.7503 LearningRate 0.0418 Epoch: 14 Global Step: 80310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:06:37,757-Speed 10485.60 samples/sec Loss 6.8405 LearningRate 0.0418 Epoch: 14 Global Step: 80320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:06:38,630-Speed 11739.00 
samples/sec Loss 6.8441 LearningRate 0.0418 Epoch: 14 Global Step: 80330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:06:39,711-Speed 9485.44 samples/sec Loss 6.8517 LearningRate 0.0418 Epoch: 14 Global Step: 80340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:06:40,653-Speed 10876.84 samples/sec Loss 6.8000 LearningRate 0.0418 Epoch: 14 Global Step: 80350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:06:41,712-Speed 9673.06 samples/sec Loss 7.0070 LearningRate 0.0418 Epoch: 14 Global Step: 80360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:06:42,663-Speed 10768.73 samples/sec Loss 6.8617 LearningRate 0.0418 Epoch: 14 Global Step: 80370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:06:43,579-Speed 11188.27 samples/sec Loss 6.7459 LearningRate 0.0418 Epoch: 14 Global Step: 80380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:06:44,473-Speed 11463.53 samples/sec Loss 6.8137 LearningRate 0.0418 Epoch: 14 Global Step: 80390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:06:45,355-Speed 11616.91 samples/sec Loss 6.9066 LearningRate 0.0418 Epoch: 14 Global Step: 80400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:06:46,280-Speed 11076.43 samples/sec Loss 7.0883 LearningRate 0.0418 Epoch: 14 Global Step: 80410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:06:47,196-Speed 11185.03 samples/sec Loss 6.8068 LearningRate 0.0418 Epoch: 14 Global Step: 80420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:06:48,118-Speed 11124.30 samples/sec Loss 6.6239 LearningRate 0.0418 Epoch: 14 Global Step: 80430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:06:49,017-Speed 11394.76 samples/sec Loss 6.9087 LearningRate 0.0418 Epoch: 14 Global Step: 80440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:06:49,947-Speed 11014.05 samples/sec Loss 6.9211 
LearningRate 0.0418 Epoch: 14 Global Step: 80450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:06:50,867-Speed 11139.92 samples/sec Loss 6.8360 LearningRate 0.0418 Epoch: 14 Global Step: 80460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:06:51,776-Speed 11267.65 samples/sec Loss 6.7853 LearningRate 0.0418 Epoch: 14 Global Step: 80470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:06:52,668-Speed 11490.91 samples/sec Loss 6.8469 LearningRate 0.0418 Epoch: 14 Global Step: 80480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:06:53,563-Speed 11448.58 samples/sec Loss 6.9170 LearningRate 0.0417 Epoch: 14 Global Step: 80490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:06:54,477-Speed 11208.78 samples/sec Loss 6.8679 LearningRate 0.0417 Epoch: 14 Global Step: 80500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:06:55,430-Speed 10761.51 samples/sec Loss 6.8791 LearningRate 0.0417 Epoch: 14 Global Step: 80510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:06:56,364-Speed 10969.68 samples/sec Loss 6.9460 LearningRate 0.0417 Epoch: 14 Global Step: 80520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:06:57,277-Speed 11218.01 samples/sec Loss 6.8626 LearningRate 0.0417 Epoch: 14 Global Step: 80530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:06:58,210-Speed 10986.26 samples/sec Loss 6.8233 LearningRate 0.0417 Epoch: 14 Global Step: 80540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:06:59,115-Speed 11319.72 samples/sec Loss 6.8936 LearningRate 0.0417 Epoch: 14 Global Step: 80550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:07:00,048-Speed 10986.78 samples/sec Loss 6.9829 LearningRate 0.0417 Epoch: 14 Global Step: 80560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:07:00,966-Speed 11160.54 samples/sec Loss 6.9965 LearningRate 0.0417 Epoch: 14 Global 
Step: 80570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:07:01,850-Speed 11597.01 samples/sec Loss 6.8778 LearningRate 0.0417 Epoch: 14 Global Step: 80580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:07:02,777-Speed 11051.60 samples/sec Loss 6.7581 LearningRate 0.0417 Epoch: 14 Global Step: 80590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:07:03,709-Speed 10989.68 samples/sec Loss 6.7331 LearningRate 0.0417 Epoch: 14 Global Step: 80600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:07:04,645-Speed 10955.40 samples/sec Loss 7.0537 LearningRate 0.0417 Epoch: 14 Global Step: 80610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:07:05,573-Speed 11039.18 samples/sec Loss 6.9344 LearningRate 0.0417 Epoch: 14 Global Step: 80620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:07:06,522-Speed 10800.44 samples/sec Loss 6.9924 LearningRate 0.0417 Epoch: 14 Global Step: 80630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:07:07,444-Speed 11118.50 samples/sec Loss 6.8726 LearningRate 0.0417 Epoch: 14 Global Step: 80640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:07:08,363-Speed 11146.83 samples/sec Loss 6.9616 LearningRate 0.0417 Epoch: 14 Global Step: 80650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:07:09,291-Speed 11041.56 samples/sec Loss 6.8355 LearningRate 0.0416 Epoch: 14 Global Step: 80660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:07:10,192-Speed 11373.84 samples/sec Loss 6.8858 LearningRate 0.0416 Epoch: 14 Global Step: 80670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:07:11,098-Speed 11314.26 samples/sec Loss 6.9767 LearningRate 0.0416 Epoch: 14 Global Step: 80680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:07:12,039-Speed 10883.70 samples/sec Loss 6.9703 LearningRate 0.0416 Epoch: 14 Global Step: 80690 Fp16 Grad Scale: 
131072 Required: 5 hours Training: 2022-04-11 02:07:12,931-Speed 11486.30 samples/sec Loss 6.9934 LearningRate 0.0416 Epoch: 14 Global Step: 80700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:07:13,844-Speed 11225.31 samples/sec Loss 6.9904 LearningRate 0.0416 Epoch: 14 Global Step: 80710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:07:14,804-Speed 10670.33 samples/sec Loss 6.8887 LearningRate 0.0416 Epoch: 14 Global Step: 80720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:07:15,755-Speed 10776.73 samples/sec Loss 6.9166 LearningRate 0.0416 Epoch: 14 Global Step: 80730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:07:16,660-Speed 11322.75 samples/sec Loss 6.9936 LearningRate 0.0416 Epoch: 14 Global Step: 80740 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 02:07:17,560-Speed 11390.52 samples/sec Loss 6.9910 LearningRate 0.0416 Epoch: 14 Global Step: 80750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:07:18,476-Speed 11179.31 samples/sec Loss 6.8524 LearningRate 0.0416 Epoch: 14 Global Step: 80760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:07:19,387-Speed 11244.21 samples/sec Loss 6.8803 LearningRate 0.0416 Epoch: 14 Global Step: 80770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:07:20,292-Speed 11326.21 samples/sec Loss 6.9240 LearningRate 0.0416 Epoch: 14 Global Step: 80780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:07:21,221-Speed 11033.58 samples/sec Loss 6.9745 LearningRate 0.0416 Epoch: 14 Global Step: 80790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:07:22,178-Speed 10713.31 samples/sec Loss 6.9198 LearningRate 0.0416 Epoch: 14 Global Step: 80800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:07:23,084-Speed 11308.37 samples/sec Loss 6.8962 LearningRate 0.0416 Epoch: 14 Global Step: 80810 Fp16 Grad Scale: 131072 Required: 5 hours 
Training: 2022-04-11 02:07:24,011-Speed 11058.25 samples/sec Loss 6.8371 LearningRate 0.0416 Epoch: 14 Global Step: 80820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:07:24,996-Speed 10397.26 samples/sec Loss 7.0347 LearningRate 0.0416 Epoch: 14 Global Step: 80830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:07:25,946-Speed 10792.44 samples/sec Loss 7.0315 LearningRate 0.0415 Epoch: 14 Global Step: 80840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:07:26,908-Speed 10645.59 samples/sec Loss 6.9680 LearningRate 0.0415 Epoch: 14 Global Step: 80850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:07:27,848-Speed 10909.98 samples/sec Loss 6.9321 LearningRate 0.0415 Epoch: 14 Global Step: 80860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:07:28,808-Speed 10668.97 samples/sec Loss 6.8779 LearningRate 0.0415 Epoch: 14 Global Step: 80870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:07:29,741-Speed 10978.12 samples/sec Loss 6.9193 LearningRate 0.0415 Epoch: 14 Global Step: 80880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:07:30,698-Speed 10715.48 samples/sec Loss 7.0942 LearningRate 0.0415 Epoch: 14 Global Step: 80890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:07:31,613-Speed 11195.06 samples/sec Loss 7.0336 LearningRate 0.0415 Epoch: 14 Global Step: 80900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:07:32,530-Speed 11178.05 samples/sec Loss 6.9855 LearningRate 0.0415 Epoch: 14 Global Step: 80910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:07:33,442-Speed 11241.87 samples/sec Loss 7.1245 LearningRate 0.0415 Epoch: 14 Global Step: 80920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:07:34,350-Speed 11280.75 samples/sec Loss 6.8273 LearningRate 0.0415 Epoch: 14 Global Step: 80930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 
02:07:35,247-Speed 11426.89 samples/sec Loss 6.9939 LearningRate 0.0415 Epoch: 14 Global Step: 80940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:07:36,146-Speed 11392.62 samples/sec Loss 6.9883 LearningRate 0.0415 Epoch: 14 Global Step: 80950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:07:37,109-Speed 10634.47 samples/sec Loss 6.9705 LearningRate 0.0415 Epoch: 14 Global Step: 80960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:07:38,032-Speed 11110.64 samples/sec Loss 7.0792 LearningRate 0.0415 Epoch: 14 Global Step: 80970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:07:38,939-Speed 11297.35 samples/sec Loss 6.9324 LearningRate 0.0415 Epoch: 14 Global Step: 80980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:07:39,868-Speed 11035.64 samples/sec Loss 6.8885 LearningRate 0.0415 Epoch: 14 Global Step: 80990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:07:40,776-Speed 11284.37 samples/sec Loss 7.0446 LearningRate 0.0415 Epoch: 14 Global Step: 81000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:07:41,689-Speed 11220.47 samples/sec Loss 7.0900 LearningRate 0.0415 Epoch: 14 Global Step: 81010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:07:42,601-Speed 11233.40 samples/sec Loss 7.0694 LearningRate 0.0414 Epoch: 14 Global Step: 81020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:07:43,519-Speed 11168.64 samples/sec Loss 6.9623 LearningRate 0.0414 Epoch: 14 Global Step: 81030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:07:44,437-Speed 11155.58 samples/sec Loss 6.9683 LearningRate 0.0414 Epoch: 14 Global Step: 81040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:07:45,369-Speed 10999.33 samples/sec Loss 7.0798 LearningRate 0.0414 Epoch: 14 Global Step: 81050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:07:46,289-Speed 11143.63 samples/sec 
Loss 6.8747 LearningRate 0.0414 Epoch: 14 Global Step: 81060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:07:47,227-Speed 10924.67 samples/sec Loss 7.0519 LearningRate 0.0414 Epoch: 14 Global Step: 81070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:07:48,143-Speed 11178.16 samples/sec Loss 6.8617 LearningRate 0.0414 Epoch: 14 Global Step: 81080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:07:49,067-Speed 11095.01 samples/sec Loss 6.9539 LearningRate 0.0414 Epoch: 14 Global Step: 81090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:07:49,980-Speed 11215.39 samples/sec Loss 7.0096 LearningRate 0.0414 Epoch: 14 Global Step: 81100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:07:50,937-Speed 10712.06 samples/sec Loss 6.9888 LearningRate 0.0414 Epoch: 14 Global Step: 81110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:07:51,905-Speed 10591.15 samples/sec Loss 7.1053 LearningRate 0.0414 Epoch: 14 Global Step: 81120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:07:52,857-Speed 10764.07 samples/sec Loss 6.9700 LearningRate 0.0414 Epoch: 14 Global Step: 81130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:07:53,798-Speed 10883.12 samples/sec Loss 7.0748 LearningRate 0.0414 Epoch: 14 Global Step: 81140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:07:54,757-Speed 10688.36 samples/sec Loss 6.9686 LearningRate 0.0414 Epoch: 14 Global Step: 81150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:07:55,672-Speed 11198.34 samples/sec Loss 6.9714 LearningRate 0.0414 Epoch: 14 Global Step: 81160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:07:56,606-Speed 10974.84 samples/sec Loss 6.8799 LearningRate 0.0414 Epoch: 14 Global Step: 81170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:07:57,513-Speed 11288.88 samples/sec Loss 7.0061 LearningRate 0.0414 Epoch: 
14 Global Step: 81180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:07:58,455-Speed 10879.09 samples/sec Loss 7.0002 LearningRate 0.0413 Epoch: 14 Global Step: 81190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:07:59,716-Speed 8125.73 samples/sec Loss 7.0622 LearningRate 0.0413 Epoch: 14 Global Step: 81200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:08:00,628-Speed 11228.02 samples/sec Loss 6.9883 LearningRate 0.0413 Epoch: 14 Global Step: 81210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:08:01,564-Speed 10954.31 samples/sec Loss 6.9537 LearningRate 0.0413 Epoch: 14 Global Step: 81220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:08:02,489-Speed 11081.85 samples/sec Loss 7.0134 LearningRate 0.0413 Epoch: 14 Global Step: 81230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:08:03,388-Speed 11397.31 samples/sec Loss 6.9890 LearningRate 0.0413 Epoch: 14 Global Step: 81240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:08:04,368-Speed 10457.64 samples/sec Loss 6.9695 LearningRate 0.0413 Epoch: 14 Global Step: 81250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:08:05,299-Speed 11000.44 samples/sec Loss 7.0144 LearningRate 0.0413 Epoch: 14 Global Step: 81260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:08:06,235-Speed 10945.32 samples/sec Loss 6.9597 LearningRate 0.0413 Epoch: 14 Global Step: 81270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:08:07,168-Speed 10991.01 samples/sec Loss 7.0919 LearningRate 0.0413 Epoch: 14 Global Step: 81280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:08:08,073-Speed 11318.15 samples/sec Loss 6.9856 LearningRate 0.0413 Epoch: 14 Global Step: 81290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:08:09,007-Speed 10974.08 samples/sec Loss 7.0533 LearningRate 0.0413 Epoch: 14 Global Step: 81300 Fp16 Grad Scale: 
65536 Required: 5 hours Training: 2022-04-11 02:08:09,907-Speed 11385.96 samples/sec Loss 7.1442 LearningRate 0.0413 Epoch: 14 Global Step: 81310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:08:10,844-Speed 10944.11 samples/sec Loss 6.9419 LearningRate 0.0413 Epoch: 14 Global Step: 81320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:08:11,765-Speed 11125.18 samples/sec Loss 7.0667 LearningRate 0.0413 Epoch: 14 Global Step: 81330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:08:12,703-Speed 10913.93 samples/sec Loss 6.9141 LearningRate 0.0413 Epoch: 14 Global Step: 81340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:08:13,645-Speed 10877.86 samples/sec Loss 6.9521 LearningRate 0.0413 Epoch: 14 Global Step: 81350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:08:14,601-Speed 10726.88 samples/sec Loss 6.9765 LearningRate 0.0413 Epoch: 14 Global Step: 81360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:08:15,545-Speed 10850.34 samples/sec Loss 6.8152 LearningRate 0.0412 Epoch: 14 Global Step: 81370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:08:16,510-Speed 10620.50 samples/sec Loss 6.8473 LearningRate 0.0412 Epoch: 14 Global Step: 81380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:08:17,469-Speed 10681.75 samples/sec Loss 7.0234 LearningRate 0.0412 Epoch: 14 Global Step: 81390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:08:18,396-Speed 11051.86 samples/sec Loss 7.0021 LearningRate 0.0412 Epoch: 14 Global Step: 81400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:08:19,340-Speed 10865.32 samples/sec Loss 6.8935 LearningRate 0.0412 Epoch: 14 Global Step: 81410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:08:20,255-Speed 11199.40 samples/sec Loss 7.0319 LearningRate 0.0412 Epoch: 14 Global Step: 81420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 
2022-04-11 02:08:21,164-Speed 11270.55 samples/sec Loss 7.1088 LearningRate 0.0412 Epoch: 14 Global Step: 81430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:08:22,065-Speed 11370.64 samples/sec Loss 7.2051 LearningRate 0.0412 Epoch: 14 Global Step: 81440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:08:22,977-Speed 11235.73 samples/sec Loss 6.9817 LearningRate 0.0412 Epoch: 14 Global Step: 81450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:08:23,885-Speed 11286.37 samples/sec Loss 7.0904 LearningRate 0.0412 Epoch: 14 Global Step: 81460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:08:24,814-Speed 11025.59 samples/sec Loss 6.9546 LearningRate 0.0412 Epoch: 14 Global Step: 81470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:08:25,717-Speed 11343.48 samples/sec Loss 6.9304 LearningRate 0.0412 Epoch: 14 Global Step: 81480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:08:26,660-Speed 10868.27 samples/sec Loss 7.0623 LearningRate 0.0412 Epoch: 14 Global Step: 81490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:08:27,608-Speed 10810.30 samples/sec Loss 7.0899 LearningRate 0.0412 Epoch: 14 Global Step: 81500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:08:28,546-Speed 10922.56 samples/sec Loss 7.0035 LearningRate 0.0412 Epoch: 14 Global Step: 81510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:08:29,490-Speed 10854.14 samples/sec Loss 7.0771 LearningRate 0.0412 Epoch: 14 Global Step: 81520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:08:30,390-Speed 11387.31 samples/sec Loss 7.1445 LearningRate 0.0412 Epoch: 14 Global Step: 81530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:08:31,318-Speed 11046.75 samples/sec Loss 6.9569 LearningRate 0.0412 Epoch: 14 Global Step: 81540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:08:32,278-Speed 10678.58 
samples/sec Loss 6.9015 LearningRate 0.0411 Epoch: 14 Global Step: 81550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:08:33,193-Speed 11192.61 samples/sec Loss 7.0345 LearningRate 0.0411 Epoch: 14 Global Step: 81560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:08:34,096-Speed 11347.50 samples/sec Loss 7.0464 LearningRate 0.0411 Epoch: 14 Global Step: 81570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:08:34,980-Speed 11597.00 samples/sec Loss 7.0454 LearningRate 0.0411 Epoch: 14 Global Step: 81580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:08:35,920-Speed 10906.55 samples/sec Loss 7.0525 LearningRate 0.0411 Epoch: 14 Global Step: 81590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:08:36,827-Speed 11299.48 samples/sec Loss 7.0379 LearningRate 0.0411 Epoch: 14 Global Step: 81600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:08:37,760-Speed 10978.25 samples/sec Loss 6.9758 LearningRate 0.0411 Epoch: 14 Global Step: 81610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:08:38,702-Speed 10872.22 samples/sec Loss 7.2091 LearningRate 0.0411 Epoch: 14 Global Step: 81620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:08:39,634-Speed 10997.60 samples/sec Loss 7.1082 LearningRate 0.0411 Epoch: 14 Global Step: 81630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:08:40,563-Speed 11035.44 samples/sec Loss 7.0151 LearningRate 0.0411 Epoch: 14 Global Step: 81640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:08:41,477-Speed 11201.97 samples/sec Loss 6.9685 LearningRate 0.0411 Epoch: 14 Global Step: 81650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:08:42,394-Speed 11181.21 samples/sec Loss 7.1524 LearningRate 0.0411 Epoch: 14 Global Step: 81660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:08:43,316-Speed 11114.43 samples/sec Loss 7.0631 
LearningRate 0.0411 Epoch: 14 Global Step: 81670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:08:44,248-Speed 10988.92 samples/sec Loss 7.0241 LearningRate 0.0411 Epoch: 14 Global Step: 81680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:08:45,160-Speed 11237.32 samples/sec Loss 7.0148 LearningRate 0.0411 Epoch: 14 Global Step: 81690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:08:46,090-Speed 11016.15 samples/sec Loss 7.0191 LearningRate 0.0411 Epoch: 14 Global Step: 81700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:08:47,038-Speed 10816.06 samples/sec Loss 6.8880 LearningRate 0.0411 Epoch: 14 Global Step: 81710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:08:47,968-Speed 11024.40 samples/sec Loss 7.2313 LearningRate 0.0410 Epoch: 14 Global Step: 81720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:08:48,898-Speed 11015.77 samples/sec Loss 7.0989 LearningRate 0.0410 Epoch: 14 Global Step: 81730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:08:49,825-Speed 11057.63 samples/sec Loss 7.0321 LearningRate 0.0410 Epoch: 14 Global Step: 81740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:08:50,786-Speed 10656.27 samples/sec Loss 6.9527 LearningRate 0.0410 Epoch: 14 Global Step: 81750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:08:51,753-Speed 10600.15 samples/sec Loss 6.9796 LearningRate 0.0410 Epoch: 14 Global Step: 81760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:08:52,677-Speed 11095.08 samples/sec Loss 6.9859 LearningRate 0.0410 Epoch: 14 Global Step: 81770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:08:53,602-Speed 11069.10 samples/sec Loss 7.0806 LearningRate 0.0410 Epoch: 14 Global Step: 81780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:08:54,551-Speed 10803.17 samples/sec Loss 7.0745 LearningRate 0.0410 Epoch: 14 Global 
Step: 81790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:08:55,475-Speed 11089.70 samples/sec Loss 7.1518 LearningRate 0.0410 Epoch: 14 Global Step: 81800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:08:56,413-Speed 10931.65 samples/sec Loss 7.1028 LearningRate 0.0410 Epoch: 14 Global Step: 81810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:08:57,310-Speed 11421.97 samples/sec Loss 7.1999 LearningRate 0.0410 Epoch: 14 Global Step: 81820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:08:58,210-Speed 11384.18 samples/sec Loss 7.0580 LearningRate 0.0410 Epoch: 14 Global Step: 81830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:08:59,125-Speed 11199.71 samples/sec Loss 7.0017 LearningRate 0.0410 Epoch: 14 Global Step: 81840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:09:00,038-Speed 11213.49 samples/sec Loss 7.0529 LearningRate 0.0410 Epoch: 14 Global Step: 81850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:09:01,016-Speed 10486.28 samples/sec Loss 7.0864 LearningRate 0.0410 Epoch: 14 Global Step: 81860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:09:01,972-Speed 10713.84 samples/sec Loss 6.9292 LearningRate 0.0410 Epoch: 14 Global Step: 81870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:09:02,886-Speed 11210.72 samples/sec Loss 7.1662 LearningRate 0.0410 Epoch: 14 Global Step: 81880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:09:03,820-Speed 10971.23 samples/sec Loss 7.1294 LearningRate 0.0410 Epoch: 14 Global Step: 81890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:09:04,754-Speed 10972.48 samples/sec Loss 7.1539 LearningRate 0.0409 Epoch: 14 Global Step: 81900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:09:05,725-Speed 10546.82 samples/sec Loss 7.0664 LearningRate 0.0409 Epoch: 14 Global Step: 81910 Fp16 Grad Scale: 
131072 Required: 5 hours Training: 2022-04-11 02:09:06,729-Speed 10207.03 samples/sec Loss 7.0971 LearningRate 0.0409 Epoch: 14 Global Step: 81920 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 02:09:07,671-Speed 10880.04 samples/sec Loss 7.0476 LearningRate 0.0409 Epoch: 14 Global Step: 81930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:09:08,589-Speed 11166.17 samples/sec Loss 7.1763 LearningRate 0.0409 Epoch: 14 Global Step: 81940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:09:09,496-Speed 11296.04 samples/sec Loss 7.0364 LearningRate 0.0409 Epoch: 14 Global Step: 81950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:09:10,397-Speed 11368.59 samples/sec Loss 7.1172 LearningRate 0.0409 Epoch: 14 Global Step: 81960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:09:11,286-Speed 11528.55 samples/sec Loss 7.0977 LearningRate 0.0409 Epoch: 14 Global Step: 81970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:09:12,191-Speed 11330.16 samples/sec Loss 6.9559 LearningRate 0.0409 Epoch: 14 Global Step: 81980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:09:13,152-Speed 10667.95 samples/sec Loss 7.1658 LearningRate 0.0409 Epoch: 14 Global Step: 81990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:09:14,066-Speed 11201.36 samples/sec Loss 7.0042 LearningRate 0.0409 Epoch: 14 Global Step: 82000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:09:35,478-[lfw][82000]XNorm: 12.061905 Training: 2022-04-11 02:09:35,478-[lfw][82000]Accuracy-Flip: 0.99600+-0.00327 Training: 2022-04-11 02:09:35,479-[lfw][82000]Accuracy-Highest: 0.99600 Training: 2022-04-11 02:10:00,049-[cfp_fp][82000]XNorm: 10.108756 Training: 2022-04-11 02:10:00,049-[cfp_fp][82000]Accuracy-Flip: 0.94371+-0.01100 Training: 2022-04-11 02:10:00,050-[cfp_fp][82000]Accuracy-Highest: 0.94371 Training: 2022-04-11 02:10:22,337-[agedb_30][82000]XNorm: 11.772777 
Training: 2022-04-11 02:10:22,338-[agedb_30][82000]Accuracy-Flip: 0.95983+-0.00828 Training: 2022-04-11 02:10:22,338-[agedb_30][82000]Accuracy-Highest: 0.96083 Training: 2022-04-11 02:10:23,275-Speed 147.96 samples/sec Loss 7.1066 LearningRate 0.0409 Epoch: 14 Global Step: 82010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:10:24,169-Speed 11472.80 samples/sec Loss 7.2425 LearningRate 0.0409 Epoch: 14 Global Step: 82020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:10:25,100-Speed 11004.05 samples/sec Loss 7.2278 LearningRate 0.0409 Epoch: 14 Global Step: 82030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:10:26,047-Speed 10814.92 samples/sec Loss 7.1668 LearningRate 0.0409 Epoch: 14 Global Step: 82040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:10:26,959-Speed 11232.87 samples/sec Loss 7.0974 LearningRate 0.0409 Epoch: 14 Global Step: 82050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:10:27,882-Speed 11109.12 samples/sec Loss 7.1144 LearningRate 0.0409 Epoch: 14 Global Step: 82060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:10:28,805-Speed 11096.54 samples/sec Loss 7.0594 LearningRate 0.0409 Epoch: 14 Global Step: 82070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:10:29,743-Speed 10924.65 samples/sec Loss 6.9947 LearningRate 0.0408 Epoch: 14 Global Step: 82080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:10:30,697-Speed 10742.27 samples/sec Loss 6.9204 LearningRate 0.0408 Epoch: 14 Global Step: 82090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:10:31,594-Speed 11431.29 samples/sec Loss 6.9527 LearningRate 0.0408 Epoch: 14 Global Step: 82100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:10:32,488-Speed 11455.85 samples/sec Loss 7.0827 LearningRate 0.0408 Epoch: 14 Global Step: 82110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:10:33,376-Speed 
11534.80 samples/sec Loss 6.9897 LearningRate 0.0408 Epoch: 14 Global Step: 82120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:10:34,257-Speed 11639.60 samples/sec Loss 7.1021 LearningRate 0.0408 Epoch: 14 Global Step: 82130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:10:35,163-Speed 11307.69 samples/sec Loss 7.1657 LearningRate 0.0408 Epoch: 14 Global Step: 82140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:10:36,114-Speed 10777.17 samples/sec Loss 7.0287 LearningRate 0.0408 Epoch: 14 Global Step: 82150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:10:37,075-Speed 10662.82 samples/sec Loss 7.2108 LearningRate 0.0408 Epoch: 14 Global Step: 82160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:10:37,987-Speed 11231.57 samples/sec Loss 7.0496 LearningRate 0.0408 Epoch: 14 Global Step: 82170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:10:38,960-Speed 10529.83 samples/sec Loss 7.1063 LearningRate 0.0408 Epoch: 14 Global Step: 82180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:10:39,872-Speed 11238.15 samples/sec Loss 7.1302 LearningRate 0.0408 Epoch: 14 Global Step: 82190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:10:40,785-Speed 11227.92 samples/sec Loss 6.8989 LearningRate 0.0408 Epoch: 14 Global Step: 82200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:10:41,717-Speed 10995.53 samples/sec Loss 6.9636 LearningRate 0.0408 Epoch: 14 Global Step: 82210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:10:42,658-Speed 10883.55 samples/sec Loss 6.9871 LearningRate 0.0408 Epoch: 14 Global Step: 82220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:10:43,574-Speed 11192.09 samples/sec Loss 7.0447 LearningRate 0.0408 Epoch: 14 Global Step: 82230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:10:44,521-Speed 10822.18 samples/sec Loss 
7.0181 LearningRate 0.0408 Epoch: 14 Global Step: 82240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:10:45,407-Speed 11556.14 samples/sec Loss 7.1293 LearningRate 0.0408 Epoch: 14 Global Step: 82250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:10:46,301-Speed 11463.89 samples/sec Loss 7.1183 LearningRate 0.0407 Epoch: 14 Global Step: 82260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:10:47,206-Speed 11333.03 samples/sec Loss 7.0340 LearningRate 0.0407 Epoch: 14 Global Step: 82270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:10:48,133-Speed 11048.49 samples/sec Loss 7.1021 LearningRate 0.0407 Epoch: 14 Global Step: 82280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:10:49,039-Speed 11307.96 samples/sec Loss 7.2669 LearningRate 0.0407 Epoch: 14 Global Step: 82290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:10:49,941-Speed 11361.78 samples/sec Loss 7.1504 LearningRate 0.0407 Epoch: 14 Global Step: 82300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:10:50,855-Speed 11216.68 samples/sec Loss 7.1860 LearningRate 0.0407 Epoch: 14 Global Step: 82310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:10:51,770-Speed 11196.23 samples/sec Loss 6.9887 LearningRate 0.0407 Epoch: 14 Global Step: 82320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:10:52,683-Speed 11232.56 samples/sec Loss 7.1904 LearningRate 0.0407 Epoch: 14 Global Step: 82330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:10:53,620-Speed 10935.80 samples/sec Loss 7.1546 LearningRate 0.0407 Epoch: 14 Global Step: 82340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:10:54,555-Speed 10969.05 samples/sec Loss 7.1164 LearningRate 0.0407 Epoch: 14 Global Step: 82350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:10:55,517-Speed 10646.78 samples/sec Loss 6.9976 LearningRate 0.0407 
Epoch: 14 Global Step: 82360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:10:56,418-Speed 11380.08 samples/sec Loss 7.1048 LearningRate 0.0407 Epoch: 14 Global Step: 82370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:10:57,352-Speed 10964.76 samples/sec Loss 7.1312 LearningRate 0.0407 Epoch: 14 Global Step: 82380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:10:58,301-Speed 10800.36 samples/sec Loss 6.9687 LearningRate 0.0407 Epoch: 14 Global Step: 82390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:10:59,221-Speed 11142.78 samples/sec Loss 7.0359 LearningRate 0.0407 Epoch: 14 Global Step: 82400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:11:00,149-Speed 11037.13 samples/sec Loss 7.0324 LearningRate 0.0407 Epoch: 14 Global Step: 82410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:11:01,091-Speed 10882.87 samples/sec Loss 7.1419 LearningRate 0.0407 Epoch: 14 Global Step: 82420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:11:02,001-Speed 11251.74 samples/sec Loss 7.0561 LearningRate 0.0407 Epoch: 14 Global Step: 82430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:11:02,993-Speed 10333.95 samples/sec Loss 7.1085 LearningRate 0.0406 Epoch: 14 Global Step: 82440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:11:03,894-Speed 11368.46 samples/sec Loss 6.9849 LearningRate 0.0406 Epoch: 14 Global Step: 82450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:11:04,800-Speed 11313.34 samples/sec Loss 7.0194 LearningRate 0.0406 Epoch: 14 Global Step: 82460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:11:05,754-Speed 10749.73 samples/sec Loss 6.9878 LearningRate 0.0406 Epoch: 14 Global Step: 82470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:11:06,661-Speed 11299.46 samples/sec Loss 7.0285 LearningRate 0.0406 Epoch: 14 Global Step: 82480 Fp16 Grad 
Scale: 131072 Required: 5 hours Training: 2022-04-11 02:11:07,572-Speed 11254.53 samples/sec Loss 7.1572 LearningRate 0.0406 Epoch: 14 Global Step: 82490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:11:08,494-Speed 11113.34 samples/sec Loss 7.0221 LearningRate 0.0406 Epoch: 14 Global Step: 82500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:11:09,396-Speed 11359.95 samples/sec Loss 7.1673 LearningRate 0.0406 Epoch: 14 Global Step: 82510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:11:10,303-Speed 11295.32 samples/sec Loss 7.1834 LearningRate 0.0406 Epoch: 14 Global Step: 82520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:11:11,220-Speed 11174.73 samples/sec Loss 7.0825 LearningRate 0.0406 Epoch: 14 Global Step: 82530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:11:12,132-Speed 11240.09 samples/sec Loss 7.1508 LearningRate 0.0406 Epoch: 14 Global Step: 82540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:11:13,019-Speed 11552.63 samples/sec Loss 7.0902 LearningRate 0.0406 Epoch: 14 Global Step: 82550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:11:13,911-Speed 11491.91 samples/sec Loss 7.0327 LearningRate 0.0406 Epoch: 14 Global Step: 82560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:11:14,799-Speed 11540.68 samples/sec Loss 7.1731 LearningRate 0.0406 Epoch: 14 Global Step: 82570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:11:15,720-Speed 11118.27 samples/sec Loss 6.9468 LearningRate 0.0406 Epoch: 14 Global Step: 82580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:11:16,594-Speed 11726.12 samples/sec Loss 7.1705 LearningRate 0.0406 Epoch: 14 Global Step: 82590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:11:17,490-Speed 11444.33 samples/sec Loss 7.1206 LearningRate 0.0406 Epoch: 14 Global Step: 82600 Fp16 Grad Scale: 65536 Required: 5 hours 
Training: 2022-04-11 02:11:18,390-Speed 11376.53 samples/sec Loss 7.1041 LearningRate 0.0405 Epoch: 14 Global Step: 82610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:11:19,348-Speed 10704.66 samples/sec Loss 7.0992 LearningRate 0.0405 Epoch: 14 Global Step: 82620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:11:20,279-Speed 10996.77 samples/sec Loss 7.0959 LearningRate 0.0405 Epoch: 14 Global Step: 82630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:11:21,162-Speed 11609.67 samples/sec Loss 7.0779 LearningRate 0.0405 Epoch: 14 Global Step: 82640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:11:22,060-Speed 11408.81 samples/sec Loss 7.1590 LearningRate 0.0405 Epoch: 14 Global Step: 82650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:11:22,961-Speed 11375.74 samples/sec Loss 6.9885 LearningRate 0.0405 Epoch: 14 Global Step: 82660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:11:23,862-Speed 11372.27 samples/sec Loss 6.9660 LearningRate 0.0405 Epoch: 14 Global Step: 82670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:11:24,787-Speed 11076.50 samples/sec Loss 7.0777 LearningRate 0.0405 Epoch: 14 Global Step: 82680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:11:25,720-Speed 10981.25 samples/sec Loss 7.0634 LearningRate 0.0405 Epoch: 14 Global Step: 82690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:11:26,653-Speed 10985.71 samples/sec Loss 6.9715 LearningRate 0.0405 Epoch: 14 Global Step: 82700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:11:27,600-Speed 10827.01 samples/sec Loss 6.9766 LearningRate 0.0405 Epoch: 14 Global Step: 82710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:11:28,526-Speed 11060.98 samples/sec Loss 7.0508 LearningRate 0.0405 Epoch: 14 Global Step: 82720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 
02:11:29,494-Speed 10588.78 samples/sec Loss 7.1344 LearningRate 0.0405 Epoch: 14 Global Step: 82730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:11:30,392-Speed 11401.44 samples/sec Loss 7.2156 LearningRate 0.0405 Epoch: 14 Global Step: 82740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:11:31,301-Speed 11281.91 samples/sec Loss 7.1149 LearningRate 0.0405 Epoch: 14 Global Step: 82750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:11:32,202-Speed 11372.51 samples/sec Loss 7.0655 LearningRate 0.0405 Epoch: 14 Global Step: 82760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:11:33,101-Speed 11401.28 samples/sec Loss 7.1525 LearningRate 0.0405 Epoch: 14 Global Step: 82770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:11:33,992-Speed 11503.46 samples/sec Loss 7.0769 LearningRate 0.0405 Epoch: 14 Global Step: 82780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:11:34,888-Speed 11433.62 samples/sec Loss 7.1003 LearningRate 0.0404 Epoch: 14 Global Step: 82790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:11:35,787-Speed 11390.51 samples/sec Loss 7.0442 LearningRate 0.0404 Epoch: 14 Global Step: 82800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:11:36,709-Speed 11115.75 samples/sec Loss 7.1182 LearningRate 0.0404 Epoch: 14 Global Step: 82810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:11:37,683-Speed 10522.32 samples/sec Loss 7.1354 LearningRate 0.0404 Epoch: 14 Global Step: 82820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:11:38,616-Speed 10981.73 samples/sec Loss 7.3599 LearningRate 0.0404 Epoch: 14 Global Step: 82830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:11:39,539-Speed 11109.22 samples/sec Loss 7.1680 LearningRate 0.0404 Epoch: 14 Global Step: 82840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:11:40,473-Speed 10967.19 samples/sec 
Loss 7.0546 LearningRate 0.0404 Epoch: 14 Global Step: 82850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:11:41,401-Speed 11045.23 samples/sec Loss 7.0123 LearningRate 0.0404 Epoch: 14 Global Step: 82860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:11:42,296-Speed 11442.93 samples/sec Loss 7.0833 LearningRate 0.0404 Epoch: 14 Global Step: 82870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:11:43,262-Speed 10606.60 samples/sec Loss 7.0467 LearningRate 0.0404 Epoch: 14 Global Step: 82880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:11:44,176-Speed 11216.85 samples/sec Loss 7.0122 LearningRate 0.0404 Epoch: 14 Global Step: 82890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:11:45,064-Speed 11544.66 samples/sec Loss 7.0836 LearningRate 0.0404 Epoch: 14 Global Step: 82900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:11:45,950-Speed 11554.65 samples/sec Loss 7.1705 LearningRate 0.0404 Epoch: 14 Global Step: 82910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:11:46,845-Speed 11455.06 samples/sec Loss 7.1398 LearningRate 0.0404 Epoch: 14 Global Step: 82920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:11:47,733-Speed 11543.11 samples/sec Loss 7.0987 LearningRate 0.0404 Epoch: 14 Global Step: 82930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:11:48,657-Speed 11081.88 samples/sec Loss 7.0164 LearningRate 0.0404 Epoch: 14 Global Step: 82940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:11:49,586-Speed 11033.49 samples/sec Loss 7.1053 LearningRate 0.0404 Epoch: 14 Global Step: 82950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:11:50,512-Speed 11064.31 samples/sec Loss 7.1814 LearningRate 0.0404 Epoch: 14 Global Step: 82960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:11:51,456-Speed 10851.93 samples/sec Loss 6.9574 LearningRate 0.0403 Epoch: 
14 Global Step: 82970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:11:52,371-Speed 11202.95 samples/sec Loss 6.9899 LearningRate 0.0403 Epoch: 14 Global Step: 82980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:11:53,266-Speed 11455.02 samples/sec Loss 7.0262 LearningRate 0.0403 Epoch: 14 Global Step: 82990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:11:54,190-Speed 11081.20 samples/sec Loss 6.9805 LearningRate 0.0403 Epoch: 14 Global Step: 83000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:11:55,098-Speed 11291.10 samples/sec Loss 7.0009 LearningRate 0.0403 Epoch: 14 Global Step: 83010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:11:56,001-Speed 11347.12 samples/sec Loss 7.0421 LearningRate 0.0403 Epoch: 14 Global Step: 83020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:11:56,893-Speed 11489.87 samples/sec Loss 7.0556 LearningRate 0.0403 Epoch: 14 Global Step: 83030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:11:57,773-Speed 11649.13 samples/sec Loss 7.2185 LearningRate 0.0403 Epoch: 14 Global Step: 83040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:11:58,666-Speed 11464.59 samples/sec Loss 7.1491 LearningRate 0.0403 Epoch: 14 Global Step: 83050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:11:59,621-Speed 10731.42 samples/sec Loss 7.0685 LearningRate 0.0403 Epoch: 14 Global Step: 83060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:12:00,534-Speed 11222.76 samples/sec Loss 6.9838 LearningRate 0.0403 Epoch: 14 Global Step: 83070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:12:01,515-Speed 10450.16 samples/sec Loss 7.1473 LearningRate 0.0403 Epoch: 14 Global Step: 83080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:12:02,434-Speed 11145.63 samples/sec Loss 7.0015 LearningRate 0.0403 Epoch: 14 Global Step: 83090 Fp16 Grad Scale: 
65536 Required: 5 hours Training: 2022-04-11 02:12:03,338-Speed 11334.31 samples/sec Loss 7.1168 LearningRate 0.0403 Epoch: 14 Global Step: 83100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:12:04,245-Speed 11297.77 samples/sec Loss 7.0711 LearningRate 0.0403 Epoch: 14 Global Step: 83110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:12:05,191-Speed 10834.13 samples/sec Loss 7.0956 LearningRate 0.0403 Epoch: 14 Global Step: 83120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:06,112-Speed 11121.72 samples/sec Loss 7.1246 LearningRate 0.0403 Epoch: 14 Global Step: 83130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:07,003-Speed 11500.20 samples/sec Loss 7.0173 LearningRate 0.0403 Epoch: 14 Global Step: 83140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:07,915-Speed 11237.88 samples/sec Loss 7.1672 LearningRate 0.0402 Epoch: 14 Global Step: 83150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:08,795-Speed 11651.28 samples/sec Loss 7.1180 LearningRate 0.0402 Epoch: 14 Global Step: 83160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:09,683-Speed 11534.93 samples/sec Loss 7.1084 LearningRate 0.0402 Epoch: 14 Global Step: 83170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:10,576-Speed 11477.54 samples/sec Loss 6.9826 LearningRate 0.0402 Epoch: 14 Global Step: 83180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:11,496-Speed 11136.79 samples/sec Loss 7.0844 LearningRate 0.0402 Epoch: 14 Global Step: 83190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:12,447-Speed 10770.55 samples/sec Loss 7.1362 LearningRate 0.0402 Epoch: 14 Global Step: 83200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:13,355-Speed 11287.73 samples/sec Loss 7.0241 LearningRate 0.0402 Epoch: 14 Global Step: 83210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 
2022-04-11 02:12:14,258-Speed 11350.64 samples/sec Loss 7.1527 LearningRate 0.0402 Epoch: 14 Global Step: 83220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:15,175-Speed 11174.34 samples/sec Loss 7.1066 LearningRate 0.0402 Epoch: 14 Global Step: 83230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:16,120-Speed 10843.06 samples/sec Loss 7.0702 LearningRate 0.0402 Epoch: 14 Global Step: 83240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:17,061-Speed 10893.40 samples/sec Loss 7.1372 LearningRate 0.0402 Epoch: 14 Global Step: 83250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:18,016-Speed 10731.02 samples/sec Loss 7.0036 LearningRate 0.0402 Epoch: 14 Global Step: 83260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:18,931-Speed 11196.53 samples/sec Loss 7.1138 LearningRate 0.0402 Epoch: 14 Global Step: 83270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:19,819-Speed 11532.56 samples/sec Loss 6.9736 LearningRate 0.0402 Epoch: 14 Global Step: 83280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:20,709-Speed 11520.71 samples/sec Loss 7.0742 LearningRate 0.0402 Epoch: 14 Global Step: 83290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:21,611-Speed 11359.31 samples/sec Loss 7.0346 LearningRate 0.0402 Epoch: 14 Global Step: 83300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:12:22,501-Speed 11520.07 samples/sec Loss 7.1617 LearningRate 0.0402 Epoch: 14 Global Step: 83310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:12:23,396-Speed 11452.02 samples/sec Loss 7.1102 LearningRate 0.0402 Epoch: 14 Global Step: 83320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:12:24,333-Speed 10937.71 samples/sec Loss 7.1204 LearningRate 0.0401 Epoch: 14 Global Step: 83330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:12:25,266-Speed 
10981.14 samples/sec Loss 7.0516 LearningRate 0.0401 Epoch: 14 Global Step: 83340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:12:26,187-Speed 11119.02 samples/sec Loss 7.1042 LearningRate 0.0401 Epoch: 14 Global Step: 83350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:12:27,116-Speed 11033.68 samples/sec Loss 6.9669 LearningRate 0.0401 Epoch: 14 Global Step: 83360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:12:28,032-Speed 11205.67 samples/sec Loss 7.0480 LearningRate 0.0401 Epoch: 14 Global Step: 83370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:12:28,966-Speed 10962.54 samples/sec Loss 7.0858 LearningRate 0.0401 Epoch: 14 Global Step: 83380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:12:29,926-Speed 10683.24 samples/sec Loss 7.1318 LearningRate 0.0401 Epoch: 14 Global Step: 83390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:12:30,878-Speed 10759.20 samples/sec Loss 7.0500 LearningRate 0.0401 Epoch: 14 Global Step: 83400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:31,828-Speed 10780.13 samples/sec Loss 7.1151 LearningRate 0.0401 Epoch: 14 Global Step: 83410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:32,749-Speed 11125.73 samples/sec Loss 6.9367 LearningRate 0.0401 Epoch: 14 Global Step: 83420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:33,646-Speed 11420.49 samples/sec Loss 7.0469 LearningRate 0.0401 Epoch: 14 Global Step: 83430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:34,534-Speed 11547.56 samples/sec Loss 7.0635 LearningRate 0.0401 Epoch: 14 Global Step: 83440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:35,443-Speed 11269.06 samples/sec Loss 6.9596 LearningRate 0.0401 Epoch: 14 Global Step: 83450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:36,363-Speed 11144.81 samples/sec Loss 7.1877 
LearningRate 0.0401 Epoch: 14 Global Step: 83460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:37,269-Speed 11306.43 samples/sec Loss 7.1411 LearningRate 0.0401 Epoch: 14 Global Step: 83470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:38,212-Speed 10873.65 samples/sec Loss 6.8006 LearningRate 0.0401 Epoch: 14 Global Step: 83480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:39,124-Speed 11233.33 samples/sec Loss 7.0046 LearningRate 0.0401 Epoch: 14 Global Step: 83490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:40,092-Speed 10583.23 samples/sec Loss 6.9634 LearningRate 0.0401 Epoch: 14 Global Step: 83500 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 02:12:41,033-Speed 10892.04 samples/sec Loss 7.0279 LearningRate 0.0400 Epoch: 14 Global Step: 83510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:41,953-Speed 11132.30 samples/sec Loss 7.1310 LearningRate 0.0400 Epoch: 14 Global Step: 83520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:42,902-Speed 10802.95 samples/sec Loss 7.1128 LearningRate 0.0400 Epoch: 14 Global Step: 83530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:43,875-Speed 10525.44 samples/sec Loss 7.2465 LearningRate 0.0400 Epoch: 14 Global Step: 83540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:44,791-Speed 11194.18 samples/sec Loss 7.2094 LearningRate 0.0400 Epoch: 14 Global Step: 83550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:45,692-Speed 11368.31 samples/sec Loss 7.0769 LearningRate 0.0400 Epoch: 14 Global Step: 83560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:46,595-Speed 11343.51 samples/sec Loss 7.1144 LearningRate 0.0400 Epoch: 14 Global Step: 83570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:47,518-Speed 11108.74 samples/sec Loss 7.0269 LearningRate 0.0400 Epoch: 14 
Global Step: 83580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:48,443-Speed 11074.31 samples/sec Loss 6.9968 LearningRate 0.0400 Epoch: 14 Global Step: 83590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:49,372-Speed 11030.94 samples/sec Loss 7.0880 LearningRate 0.0400 Epoch: 14 Global Step: 83600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:50,320-Speed 10802.60 samples/sec Loss 7.1324 LearningRate 0.0400 Epoch: 14 Global Step: 83610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:51,263-Speed 10875.09 samples/sec Loss 7.0981 LearningRate 0.0400 Epoch: 14 Global Step: 83620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:52,161-Speed 11408.75 samples/sec Loss 7.0112 LearningRate 0.0400 Epoch: 14 Global Step: 83630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:53,088-Speed 11050.89 samples/sec Loss 7.1141 LearningRate 0.0400 Epoch: 14 Global Step: 83640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:54,040-Speed 10763.58 samples/sec Loss 6.9475 LearningRate 0.0400 Epoch: 14 Global Step: 83650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:54,973-Speed 10988.65 samples/sec Loss 7.0263 LearningRate 0.0400 Epoch: 14 Global Step: 83660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:55,894-Speed 11116.20 samples/sec Loss 6.9914 LearningRate 0.0400 Epoch: 14 Global Step: 83670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:56,792-Speed 11421.28 samples/sec Loss 7.2976 LearningRate 0.0400 Epoch: 14 Global Step: 83680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:57,689-Speed 11422.48 samples/sec Loss 7.1608 LearningRate 0.0399 Epoch: 14 Global Step: 83690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:58,599-Speed 11255.03 samples/sec Loss 6.8957 LearningRate 0.0399 Epoch: 14 Global Step: 83700 Fp16 Grad 
Scale: 131072 Required: 5 hours Training: 2022-04-11 02:12:59,574-Speed 10517.47 samples/sec Loss 7.1312 LearningRate 0.0399 Epoch: 14 Global Step: 83710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:13:00,470-Speed 11433.67 samples/sec Loss 7.0930 LearningRate 0.0399 Epoch: 14 Global Step: 83720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:13:01,396-Speed 11064.90 samples/sec Loss 7.0526 LearningRate 0.0399 Epoch: 14 Global Step: 83730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:13:02,331-Speed 10961.68 samples/sec Loss 6.9974 LearningRate 0.0399 Epoch: 14 Global Step: 83740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:13:03,267-Speed 10951.86 samples/sec Loss 7.0425 LearningRate 0.0399 Epoch: 14 Global Step: 83750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:13:04,206-Speed 10910.10 samples/sec Loss 7.0810 LearningRate 0.0399 Epoch: 14 Global Step: 83760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:13:05,185-Speed 10466.14 samples/sec Loss 7.0407 LearningRate 0.0399 Epoch: 14 Global Step: 83770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:13:06,122-Speed 10935.96 samples/sec Loss 6.8428 LearningRate 0.0399 Epoch: 14 Global Step: 83780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:13:07,058-Speed 10951.49 samples/sec Loss 6.9860 LearningRate 0.0399 Epoch: 14 Global Step: 83790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:13:08,003-Speed 10838.48 samples/sec Loss 6.9783 LearningRate 0.0399 Epoch: 14 Global Step: 83800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:13:08,912-Speed 11274.61 samples/sec Loss 7.0879 LearningRate 0.0399 Epoch: 14 Global Step: 83810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:13:09,839-Speed 11055.74 samples/sec Loss 7.0615 LearningRate 0.0399 Epoch: 14 Global Step: 83820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 
2022-04-11 02:13:10,742-Speed 11350.41 samples/sec Loss 7.1360 LearningRate 0.0399 Epoch: 14 Global Step: 83830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:13:11,638-Speed 11434.39 samples/sec Loss 6.9872 LearningRate 0.0399 Epoch: 14 Global Step: 83840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:13:12,601-Speed 10640.07 samples/sec Loss 7.0146 LearningRate 0.0399 Epoch: 14 Global Step: 83850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:13:13,513-Speed 11233.51 samples/sec Loss 7.0136 LearningRate 0.0399 Epoch: 14 Global Step: 83860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:13:14,487-Speed 10519.42 samples/sec Loss 7.0411 LearningRate 0.0398 Epoch: 14 Global Step: 83870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:13:15,414-Speed 11057.42 samples/sec Loss 7.1281 LearningRate 0.0398 Epoch: 14 Global Step: 83880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:13:16,336-Speed 11108.12 samples/sec Loss 6.9725 LearningRate 0.0398 Epoch: 14 Global Step: 83890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:13:17,272-Speed 10948.58 samples/sec Loss 6.9624 LearningRate 0.0398 Epoch: 14 Global Step: 83900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:13:18,223-Speed 10779.94 samples/sec Loss 7.0788 LearningRate 0.0398 Epoch: 14 Global Step: 83910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:13:19,162-Speed 10911.76 samples/sec Loss 7.3346 LearningRate 0.0398 Epoch: 14 Global Step: 83920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:13:20,077-Speed 11191.37 samples/sec Loss 7.0973 LearningRate 0.0398 Epoch: 14 Global Step: 83930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:13:20,971-Speed 11473.33 samples/sec Loss 7.1369 LearningRate 0.0398 Epoch: 14 Global Step: 83940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:13:21,861-Speed 11507.34 
samples/sec Loss 7.0562 LearningRate 0.0398 Epoch: 14 Global Step: 83950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:13:22,775-Speed 11216.42 samples/sec Loss 7.1759 LearningRate 0.0398 Epoch: 14 Global Step: 83960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:13:23,690-Speed 11198.10 samples/sec Loss 6.9420 LearningRate 0.0398 Epoch: 14 Global Step: 83970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:13:24,619-Speed 11031.63 samples/sec Loss 6.8922 LearningRate 0.0398 Epoch: 14 Global Step: 83980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:13:25,573-Speed 10740.64 samples/sec Loss 7.2177 LearningRate 0.0398 Epoch: 14 Global Step: 83990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:13:26,493-Speed 11136.61 samples/sec Loss 7.0064 LearningRate 0.0398 Epoch: 14 Global Step: 84000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:13:47,884-[lfw][84000]XNorm: 11.820882 Training: 2022-04-11 02:13:47,884-[lfw][84000]Accuracy-Flip: 0.99617+-0.00350 Training: 2022-04-11 02:13:47,885-[lfw][84000]Accuracy-Highest: 0.99617 Training: 2022-04-11 02:14:12,651-[cfp_fp][84000]XNorm: 9.967778 Training: 2022-04-11 02:14:12,652-[cfp_fp][84000]Accuracy-Flip: 0.94057+-0.01331 Training: 2022-04-11 02:14:12,652-[cfp_fp][84000]Accuracy-Highest: 0.94371 Training: 2022-04-11 02:14:33,953-[agedb_30][84000]XNorm: 11.568149 Training: 2022-04-11 02:14:33,953-[agedb_30][84000]Accuracy-Flip: 0.95900+-0.00901 Training: 2022-04-11 02:14:33,954-[agedb_30][84000]Accuracy-Highest: 0.96083 Training: 2022-04-11 02:14:34,864-Speed 149.77 samples/sec Loss 6.8661 LearningRate 0.0398 Epoch: 14 Global Step: 84010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:14:35,753-Speed 11528.62 samples/sec Loss 6.9989 LearningRate 0.0398 Epoch: 14 Global Step: 84020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:14:36,694-Speed 10893.11 samples/sec Loss 7.0789 
LearningRate 0.0398 Epoch: 14 Global Step: 84030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:14:37,614-Speed 11135.83 samples/sec Loss 7.1917 LearningRate 0.0398 Epoch: 14 Global Step: 84040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:14:38,598-Speed 10411.27 samples/sec Loss 7.1146 LearningRate 0.0397 Epoch: 14 Global Step: 84050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:14:39,556-Speed 10696.68 samples/sec Loss 7.0968 LearningRate 0.0397 Epoch: 14 Global Step: 84060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:14:40,477-Speed 11120.51 samples/sec Loss 6.9830 LearningRate 0.0397 Epoch: 14 Global Step: 84070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:14:41,387-Speed 11269.04 samples/sec Loss 7.2193 LearningRate 0.0397 Epoch: 14 Global Step: 84080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:14:42,346-Speed 10683.29 samples/sec Loss 7.0667 LearningRate 0.0397 Epoch: 14 Global Step: 84090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:14:43,261-Speed 11193.20 samples/sec Loss 7.0828 LearningRate 0.0397 Epoch: 14 Global Step: 84100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:14:44,181-Speed 11149.03 samples/sec Loss 7.1048 LearningRate 0.0397 Epoch: 14 Global Step: 84110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:14:45,077-Speed 11427.22 samples/sec Loss 6.9909 LearningRate 0.0397 Epoch: 14 Global Step: 84120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:14:45,957-Speed 11646.80 samples/sec Loss 7.2161 LearningRate 0.0397 Epoch: 14 Global Step: 84130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:14:46,869-Speed 11233.95 samples/sec Loss 7.1431 LearningRate 0.0397 Epoch: 14 Global Step: 84140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:14:47,772-Speed 11342.82 samples/sec Loss 7.0385 LearningRate 0.0397 Epoch: 14 Global 
Step: 84150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:14:48,695-Speed 11107.87 samples/sec Loss 7.2769 LearningRate 0.0397 Epoch: 14 Global Step: 84160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:14:49,647-Speed 10757.14 samples/sec Loss 7.0725 LearningRate 0.0397 Epoch: 14 Global Step: 84170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:14:50,527-Speed 11652.86 samples/sec Loss 7.1349 LearningRate 0.0397 Epoch: 14 Global Step: 84180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:14:51,426-Speed 11398.78 samples/sec Loss 7.0100 LearningRate 0.0397 Epoch: 14 Global Step: 84190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:14:52,339-Speed 11227.82 samples/sec Loss 6.9994 LearningRate 0.0397 Epoch: 14 Global Step: 84200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:14:53,269-Speed 11014.25 samples/sec Loss 6.9939 LearningRate 0.0397 Epoch: 14 Global Step: 84210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:14:54,212-Speed 10857.99 samples/sec Loss 7.0238 LearningRate 0.0397 Epoch: 14 Global Step: 84220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:14:55,152-Speed 10909.68 samples/sec Loss 7.1297 LearningRate 0.0396 Epoch: 14 Global Step: 84230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:14:56,128-Speed 10496.33 samples/sec Loss 7.1383 LearningRate 0.0396 Epoch: 14 Global Step: 84240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:14:57,017-Speed 11526.49 samples/sec Loss 7.1176 LearningRate 0.0396 Epoch: 14 Global Step: 84250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:14:57,916-Speed 11410.92 samples/sec Loss 6.9973 LearningRate 0.0396 Epoch: 14 Global Step: 84260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:14:58,867-Speed 10775.81 samples/sec Loss 6.9503 LearningRate 0.0396 Epoch: 14 Global Step: 84270 Fp16 Grad Scale: 
131072 Required: 5 hours Training: 2022-04-11 02:14:59,797-Speed 11018.35 samples/sec Loss 7.0438 LearningRate 0.0396 Epoch: 14 Global Step: 84280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:15:00,714-Speed 11167.20 samples/sec Loss 7.0861 LearningRate 0.0396 Epoch: 14 Global Step: 84290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:15:01,683-Speed 10577.04 samples/sec Loss 7.0070 LearningRate 0.0396 Epoch: 14 Global Step: 84300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:15:02,623-Speed 10903.94 samples/sec Loss 7.1163 LearningRate 0.0396 Epoch: 14 Global Step: 84310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:15:03,580-Speed 10708.32 samples/sec Loss 7.0346 LearningRate 0.0396 Epoch: 14 Global Step: 84320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:15:04,508-Speed 11038.91 samples/sec Loss 7.0277 LearningRate 0.0396 Epoch: 14 Global Step: 84330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:15:05,415-Speed 11306.82 samples/sec Loss 7.1065 LearningRate 0.0396 Epoch: 14 Global Step: 84340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:15:06,334-Speed 11151.16 samples/sec Loss 7.0657 LearningRate 0.0396 Epoch: 14 Global Step: 84350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:15:07,249-Speed 11195.91 samples/sec Loss 6.9446 LearningRate 0.0396 Epoch: 14 Global Step: 84360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:15:08,180-Speed 10999.95 samples/sec Loss 6.9424 LearningRate 0.0396 Epoch: 14 Global Step: 84370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:15:09,077-Speed 11426.46 samples/sec Loss 7.1036 LearningRate 0.0396 Epoch: 14 Global Step: 84380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:15:09,961-Speed 11590.50 samples/sec Loss 7.1555 LearningRate 0.0396 Epoch: 14 Global Step: 84390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 
2022-04-11 02:15:10,852-Speed 11509.51 samples/sec Loss 7.0232 LearningRate 0.0396 Epoch: 14 Global Step: 84400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:15:11,750-Speed 11413.78 samples/sec Loss 7.1463 LearningRate 0.0395 Epoch: 14 Global Step: 84410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:15:12,674-Speed 11084.21 samples/sec Loss 7.0709 LearningRate 0.0395 Epoch: 14 Global Step: 84420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:15:13,617-Speed 10869.58 samples/sec Loss 7.0237 LearningRate 0.0395 Epoch: 14 Global Step: 84430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:15:14,565-Speed 10805.69 samples/sec Loss 7.1586 LearningRate 0.0395 Epoch: 14 Global Step: 84440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:15:15,524-Speed 10685.03 samples/sec Loss 7.1377 LearningRate 0.0395 Epoch: 14 Global Step: 84450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:15:16,469-Speed 10839.76 samples/sec Loss 7.1564 LearningRate 0.0395 Epoch: 14 Global Step: 84460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:15:17,414-Speed 10844.74 samples/sec Loss 7.1110 LearningRate 0.0395 Epoch: 14 Global Step: 84470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:15:18,375-Speed 10661.61 samples/sec Loss 7.0885 LearningRate 0.0395 Epoch: 14 Global Step: 84480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:15:19,331-Speed 10723.95 samples/sec Loss 7.0514 LearningRate 0.0395 Epoch: 14 Global Step: 84490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:15:20,261-Speed 11020.49 samples/sec Loss 7.0151 LearningRate 0.0395 Epoch: 14 Global Step: 84500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:15:21,154-Speed 11466.57 samples/sec Loss 7.1599 LearningRate 0.0395 Epoch: 14 Global Step: 84510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:15:22,032-Speed 
11668.01 samples/sec Loss 7.1623 LearningRate 0.0395 Epoch: 14 Global Step: 84520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:15:22,944-Speed 11238.20 samples/sec Loss 6.9955 LearningRate 0.0395 Epoch: 14 Global Step: 84530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:15:23,910-Speed 10606.61 samples/sec Loss 7.0011 LearningRate 0.0395 Epoch: 14 Global Step: 84540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:15:24,831-Speed 11133.97 samples/sec Loss 6.9090 LearningRate 0.0395 Epoch: 14 Global Step: 84550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:15:25,764-Speed 10973.79 samples/sec Loss 7.1075 LearningRate 0.0395 Epoch: 14 Global Step: 84560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:15:26,659-Speed 11454.52 samples/sec Loss 7.0498 LearningRate 0.0395 Epoch: 14 Global Step: 84570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:15:27,585-Speed 11072.33 samples/sec Loss 7.0281 LearningRate 0.0395 Epoch: 14 Global Step: 84580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:15:28,518-Speed 10979.66 samples/sec Loss 7.0884 LearningRate 0.0394 Epoch: 14 Global Step: 84590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:15:29,467-Speed 10796.70 samples/sec Loss 7.0568 LearningRate 0.0394 Epoch: 14 Global Step: 84600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:15:30,357-Speed 11516.43 samples/sec Loss 7.0113 LearningRate 0.0394 Epoch: 14 Global Step: 84610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:15:31,236-Speed 11662.71 samples/sec Loss 7.0109 LearningRate 0.0394 Epoch: 14 Global Step: 84620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:15:32,161-Speed 11069.99 samples/sec Loss 7.1502 LearningRate 0.0394 Epoch: 14 Global Step: 84630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:15:33,055-Speed 11466.49 samples/sec Loss 7.0740 
LearningRate 0.0394 Epoch: 14 Global Step: 84640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:15:33,940-Speed 11574.77 samples/sec Loss 7.2160 LearningRate 0.0394 Epoch: 14 Global Step: 84650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:15:34,851-Speed 11255.10 samples/sec Loss 7.0688 LearningRate 0.0394 Epoch: 14 Global Step: 84660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:15:35,747-Speed 11435.64 samples/sec Loss 7.1298 LearningRate 0.0394 Epoch: 14 Global Step: 84670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:15:36,682-Speed 10958.00 samples/sec Loss 7.0570 LearningRate 0.0394 Epoch: 14 Global Step: 84680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:15:37,605-Speed 11099.26 samples/sec Loss 6.9480 LearningRate 0.0394 Epoch: 14 Global Step: 84690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:15:38,511-Speed 11313.80 samples/sec Loss 7.0606 LearningRate 0.0394 Epoch: 14 Global Step: 84700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:15:39,409-Speed 11402.82 samples/sec Loss 6.9856 LearningRate 0.0394 Epoch: 14 Global Step: 84710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:15:40,320-Speed 11254.39 samples/sec Loss 6.8517 LearningRate 0.0394 Epoch: 14 Global Step: 84720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:15:41,220-Speed 11383.03 samples/sec Loss 6.9308 LearningRate 0.0394 Epoch: 14 Global Step: 84730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:15:42,144-Speed 11091.07 samples/sec Loss 7.0283 LearningRate 0.0394 Epoch: 14 Global Step: 84740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:15:43,103-Speed 10686.36 samples/sec Loss 7.1546 LearningRate 0.0394 Epoch: 14 Global Step: 84750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:15:44,035-Speed 10993.28 samples/sec Loss 6.9884 LearningRate 0.0394 Epoch: 14 Global 
Step: 84760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:15:44,930-Speed 11449.70 samples/sec Loss 6.9829 LearningRate 0.0393 Epoch: 14 Global Step: 84770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:15:45,832-Speed 11357.04 samples/sec Loss 7.1718 LearningRate 0.0393 Epoch: 14 Global Step: 84780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:15:46,739-Speed 11301.94 samples/sec Loss 7.0747 LearningRate 0.0393 Epoch: 14 Global Step: 84790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:15:47,622-Speed 11606.09 samples/sec Loss 7.0361 LearningRate 0.0393 Epoch: 14 Global Step: 84800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:15:48,566-Speed 10852.37 samples/sec Loss 7.0404 LearningRate 0.0393 Epoch: 14 Global Step: 84810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:15:49,452-Speed 11565.33 samples/sec Loss 7.0564 LearningRate 0.0393 Epoch: 14 Global Step: 84820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:15:50,417-Speed 10618.90 samples/sec Loss 7.1320 LearningRate 0.0393 Epoch: 14 Global Step: 84830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:15:51,316-Speed 11401.52 samples/sec Loss 7.0820 LearningRate 0.0393 Epoch: 14 Global Step: 84840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:15:52,275-Speed 10687.29 samples/sec Loss 7.0862 LearningRate 0.0393 Epoch: 14 Global Step: 84850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:15:53,227-Speed 10759.37 samples/sec Loss 7.0664 LearningRate 0.0393 Epoch: 14 Global Step: 84860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:15:54,137-Speed 11260.26 samples/sec Loss 7.2044 LearningRate 0.0393 Epoch: 14 Global Step: 84870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:15:55,062-Speed 11083.10 samples/sec Loss 7.1089 LearningRate 0.0393 Epoch: 14 Global Step: 84880 Fp16 Grad Scale: 65536 
Required: 5 hours Training: 2022-04-11 02:15:55,998-Speed 10949.21 samples/sec Loss 7.1560 LearningRate 0.0393 Epoch: 14 Global Step: 84890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:15:56,892-Speed 11455.09 samples/sec Loss 7.0724 LearningRate 0.0393 Epoch: 14 Global Step: 84900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:15:57,787-Speed 11459.35 samples/sec Loss 7.0624 LearningRate 0.0393 Epoch: 14 Global Step: 84910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:15:58,691-Speed 11328.54 samples/sec Loss 7.1475 LearningRate 0.0393 Epoch: 14 Global Step: 84920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:15:59,599-Speed 11292.93 samples/sec Loss 7.1507 LearningRate 0.0393 Epoch: 14 Global Step: 84930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:16:00,526-Speed 11051.13 samples/sec Loss 7.1685 LearningRate 0.0393 Epoch: 14 Global Step: 84940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:16:01,483-Speed 10700.02 samples/sec Loss 7.0596 LearningRate 0.0392 Epoch: 14 Global Step: 84950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:16:02,407-Speed 11091.30 samples/sec Loss 7.0825 LearningRate 0.0392 Epoch: 14 Global Step: 84960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:16:03,343-Speed 10952.77 samples/sec Loss 7.0800 LearningRate 0.0392 Epoch: 14 Global Step: 84970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:16:04,257-Speed 11212.43 samples/sec Loss 7.1525 LearningRate 0.0392 Epoch: 14 Global Step: 84980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:16:05,207-Speed 10778.86 samples/sec Loss 7.0683 LearningRate 0.0392 Epoch: 14 Global Step: 84990 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 02:16:06,109-Speed 11361.27 samples/sec Loss 6.9323 LearningRate 0.0392 Epoch: 14 Global Step: 85000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 
2022-04-11 02:16:07,018-Speed 11267.89 samples/sec Loss 7.0418 LearningRate 0.0392 Epoch: 14 Global Step: 85010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:16:07,919-Speed 11378.65 samples/sec Loss 7.1050 LearningRate 0.0392 Epoch: 14 Global Step: 85020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:16:08,817-Speed 11412.31 samples/sec Loss 7.0881 LearningRate 0.0392 Epoch: 14 Global Step: 85030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:16:09,723-Speed 11313.59 samples/sec Loss 6.9578 LearningRate 0.0392 Epoch: 14 Global Step: 85040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:16:10,616-Speed 11470.76 samples/sec Loss 7.0138 LearningRate 0.0392 Epoch: 14 Global Step: 85050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:16:11,528-Speed 11236.33 samples/sec Loss 7.1084 LearningRate 0.0392 Epoch: 14 Global Step: 85060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:16:12,451-Speed 11103.91 samples/sec Loss 7.0355 LearningRate 0.0392 Epoch: 14 Global Step: 85070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:16:13,345-Speed 11464.95 samples/sec Loss 7.0822 LearningRate 0.0392 Epoch: 14 Global Step: 85080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:16:14,263-Speed 11165.68 samples/sec Loss 6.9756 LearningRate 0.0392 Epoch: 14 Global Step: 85090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:16:15,162-Speed 11397.41 samples/sec Loss 7.0672 LearningRate 0.0392 Epoch: 14 Global Step: 85100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:16:16,072-Speed 11263.16 samples/sec Loss 6.9412 LearningRate 0.0392 Epoch: 14 Global Step: 85110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:16:17,021-Speed 10794.13 samples/sec Loss 6.8649 LearningRate 0.0392 Epoch: 14 Global Step: 85120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:16:17,976-Speed 10726.37 
samples/sec Loss 6.9359 LearningRate 0.0392 Epoch: 14 Global Step: 85130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:16:18,920-Speed 10858.03 samples/sec Loss 7.0912 LearningRate 0.0391 Epoch: 14 Global Step: 85140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:16:19,825-Speed 11329.52 samples/sec Loss 6.9874 LearningRate 0.0391 Epoch: 14 Global Step: 85150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:16:20,733-Speed 11276.58 samples/sec Loss 7.1409 LearningRate 0.0391 Epoch: 14 Global Step: 85160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:16:21,625-Speed 11486.97 samples/sec Loss 7.0540 LearningRate 0.0391 Epoch: 14 Global Step: 85170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:16:22,529-Speed 11341.50 samples/sec Loss 7.0341 LearningRate 0.0391 Epoch: 14 Global Step: 85180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:16:23,443-Speed 11204.36 samples/sec Loss 6.9405 LearningRate 0.0391 Epoch: 14 Global Step: 85190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:16:24,357-Speed 11217.33 samples/sec Loss 7.0754 LearningRate 0.0391 Epoch: 14 Global Step: 85200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:16:25,240-Speed 11608.43 samples/sec Loss 7.0100 LearningRate 0.0391 Epoch: 14 Global Step: 85210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:16:26,169-Speed 11031.52 samples/sec Loss 7.0376 LearningRate 0.0391 Epoch: 14 Global Step: 85220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:16:27,097-Speed 11032.33 samples/sec Loss 7.2949 LearningRate 0.0391 Epoch: 14 Global Step: 85230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:16:28,021-Speed 11094.45 samples/sec Loss 7.1031 LearningRate 0.0391 Epoch: 14 Global Step: 85240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:16:28,937-Speed 11191.24 samples/sec Loss 6.8484 LearningRate 
0.0391 Epoch: 14 Global Step: 85250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:16:29,864-Speed 11054.80 samples/sec Loss 7.1363 LearningRate 0.0391 Epoch: 14 Global Step: 85260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:16:30,760-Speed 11429.96 samples/sec Loss 7.0394 LearningRate 0.0391 Epoch: 14 Global Step: 85270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:16:31,752-Speed 10335.48 samples/sec Loss 7.0247 LearningRate 0.0391 Epoch: 14 Global Step: 85280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:16:32,592-Speed 12190.31 samples/sec Loss 7.0896 LearningRate 0.0391 Epoch: 14 Global Step: 85290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:16:43,513-Speed 937.75 samples/sec Loss 6.3161 LearningRate 0.0391 Epoch: 15 Global Step: 85300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:16:44,424-Speed 11257.99 samples/sec Loss 6.3610 LearningRate 0.0391 Epoch: 15 Global Step: 85310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:16:45,498-Speed 9535.57 samples/sec Loss 6.4241 LearningRate 0.0390 Epoch: 15 Global Step: 85320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:16:46,402-Speed 11332.35 samples/sec Loss 6.3466 LearningRate 0.0390 Epoch: 15 Global Step: 85330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:16:47,645-Speed 8249.40 samples/sec Loss 6.1637 LearningRate 0.0390 Epoch: 15 Global Step: 85340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:16:48,577-Speed 10993.67 samples/sec Loss 6.3652 LearningRate 0.0390 Epoch: 15 Global Step: 85350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:16:49,489-Speed 11228.33 samples/sec Loss 6.3891 LearningRate 0.0390 Epoch: 15 Global Step: 85360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:16:50,455-Speed 10610.07 samples/sec Loss 6.2139 LearningRate 0.0390 Epoch: 15 Global Step: 85370 Fp16 
Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:16:51,382-Speed 11061.05 samples/sec Loss 6.3741 LearningRate 0.0390 Epoch: 15 Global Step: 85380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:16:52,342-Speed 10668.30 samples/sec Loss 6.3926 LearningRate 0.0390 Epoch: 15 Global Step: 85390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:16:53,249-Speed 11308.10 samples/sec Loss 6.2307 LearningRate 0.0390 Epoch: 15 Global Step: 85400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:16:54,195-Speed 10824.79 samples/sec Loss 6.5219 LearningRate 0.0390 Epoch: 15 Global Step: 85410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:16:55,092-Speed 11423.12 samples/sec Loss 6.4701 LearningRate 0.0390 Epoch: 15 Global Step: 85420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:16:55,994-Speed 11368.34 samples/sec Loss 6.4803 LearningRate 0.0390 Epoch: 15 Global Step: 85430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:16:56,897-Speed 11339.21 samples/sec Loss 6.3724 LearningRate 0.0390 Epoch: 15 Global Step: 85440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:16:57,815-Speed 11168.97 samples/sec Loss 6.3178 LearningRate 0.0390 Epoch: 15 Global Step: 85450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:16:58,708-Speed 11492.95 samples/sec Loss 6.5093 LearningRate 0.0390 Epoch: 15 Global Step: 85460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:16:59,686-Speed 10471.29 samples/sec Loss 6.4108 LearningRate 0.0390 Epoch: 15 Global Step: 85470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:17:00,617-Speed 11013.98 samples/sec Loss 6.5471 LearningRate 0.0390 Epoch: 15 Global Step: 85480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:17:01,597-Speed 10449.78 samples/sec Loss 6.5287 LearningRate 0.0390 Epoch: 15 Global Step: 85490 Fp16 Grad Scale: 65536 Required: 5 hours 
Training: 2022-04-11 02:17:02,524-Speed 11065.44 samples/sec Loss 6.4394 LearningRate 0.0389 Epoch: 15 Global Step: 85500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:17:03,435-Speed 11244.69 samples/sec Loss 6.4534 LearningRate 0.0389 Epoch: 15 Global Step: 85510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:17:04,357-Speed 11112.44 samples/sec Loss 6.4389 LearningRate 0.0389 Epoch: 15 Global Step: 85520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:17:05,289-Speed 10994.11 samples/sec Loss 6.5850 LearningRate 0.0389 Epoch: 15 Global Step: 85530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:17:06,221-Speed 10995.49 samples/sec Loss 6.3571 LearningRate 0.0389 Epoch: 15 Global Step: 85540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:17:07,132-Speed 11252.83 samples/sec Loss 6.5502 LearningRate 0.0389 Epoch: 15 Global Step: 85550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:17:08,026-Speed 11461.47 samples/sec Loss 6.3966 LearningRate 0.0389 Epoch: 15 Global Step: 85560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:17:08,928-Speed 11349.09 samples/sec Loss 6.4111 LearningRate 0.0389 Epoch: 15 Global Step: 85570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:17:09,809-Speed 11635.19 samples/sec Loss 6.4503 LearningRate 0.0389 Epoch: 15 Global Step: 85580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:17:10,702-Speed 11481.76 samples/sec Loss 6.4402 LearningRate 0.0389 Epoch: 15 Global Step: 85590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:17:11,611-Speed 11269.59 samples/sec Loss 6.4394 LearningRate 0.0389 Epoch: 15 Global Step: 85600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:17:12,572-Speed 10657.39 samples/sec Loss 6.4196 LearningRate 0.0389 Epoch: 15 Global Step: 85610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 
02:17:13,488-Speed 11189.59 samples/sec Loss 6.6685 LearningRate 0.0389 Epoch: 15 Global Step: 85620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:17:14,429-Speed 10888.80 samples/sec Loss 6.3520 LearningRate 0.0389 Epoch: 15 Global Step: 85630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:17:15,387-Speed 10698.15 samples/sec Loss 6.5012 LearningRate 0.0389 Epoch: 15 Global Step: 85640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:17:16,319-Speed 10992.08 samples/sec Loss 6.4947 LearningRate 0.0389 Epoch: 15 Global Step: 85650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:17:17,256-Speed 10942.27 samples/sec Loss 6.5568 LearningRate 0.0389 Epoch: 15 Global Step: 85660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:17:18,219-Speed 10645.51 samples/sec Loss 6.7370 LearningRate 0.0389 Epoch: 15 Global Step: 85670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:17:19,182-Speed 10639.79 samples/sec Loss 6.4994 LearningRate 0.0388 Epoch: 15 Global Step: 85680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:17:20,113-Speed 11008.79 samples/sec Loss 6.5867 LearningRate 0.0388 Epoch: 15 Global Step: 85690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:17:21,150-Speed 9883.42 samples/sec Loss 6.5396 LearningRate 0.0388 Epoch: 15 Global Step: 85700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:17:22,036-Speed 11564.58 samples/sec Loss 6.5726 LearningRate 0.0388 Epoch: 15 Global Step: 85710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:17:22,944-Speed 11285.98 samples/sec Loss 6.5931 LearningRate 0.0388 Epoch: 15 Global Step: 85720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:17:23,961-Speed 10074.26 samples/sec Loss 6.6614 LearningRate 0.0388 Epoch: 15 Global Step: 85730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:17:24,866-Speed 11320.67 
samples/sec Loss 6.6365 LearningRate 0.0388 Epoch: 15 Global Step: 85740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:17:25,826-Speed 10669.24 samples/sec Loss 6.4252 LearningRate 0.0388 Epoch: 15 Global Step: 85750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:17:26,737-Speed 11251.55 samples/sec Loss 6.4796 LearningRate 0.0388 Epoch: 15 Global Step: 85760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:17:27,716-Speed 10471.64 samples/sec Loss 6.5393 LearningRate 0.0388 Epoch: 15 Global Step: 85770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:17:28,624-Speed 11284.96 samples/sec Loss 6.5945 LearningRate 0.0388 Epoch: 15 Global Step: 85780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:17:29,565-Speed 10894.84 samples/sec Loss 6.5938 LearningRate 0.0388 Epoch: 15 Global Step: 85790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:17:30,497-Speed 10991.48 samples/sec Loss 6.4596 LearningRate 0.0388 Epoch: 15 Global Step: 85800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:17:31,404-Speed 11297.89 samples/sec Loss 6.6620 LearningRate 0.0388 Epoch: 15 Global Step: 85810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:17:32,335-Speed 11017.32 samples/sec Loss 6.5465 LearningRate 0.0388 Epoch: 15 Global Step: 85820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:17:33,235-Speed 11385.81 samples/sec Loss 6.6302 LearningRate 0.0388 Epoch: 15 Global Step: 85830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:17:34,120-Speed 11578.72 samples/sec Loss 6.5281 LearningRate 0.0388 Epoch: 15 Global Step: 85840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:17:35,042-Speed 11105.08 samples/sec Loss 6.6550 LearningRate 0.0388 Epoch: 15 Global Step: 85850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:17:35,966-Speed 11099.47 samples/sec Loss 6.5665 LearningRate 
0.0387 Epoch: 15 Global Step: 85860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:17:36,884-Speed 11163.69 samples/sec Loss 6.6325 LearningRate 0.0387 Epoch: 15 Global Step: 85870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:17:37,841-Speed 10703.81 samples/sec Loss 6.6720 LearningRate 0.0387 Epoch: 15 Global Step: 85880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:17:38,767-Speed 11073.43 samples/sec Loss 6.5666 LearningRate 0.0387 Epoch: 15 Global Step: 85890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:17:39,719-Speed 10760.42 samples/sec Loss 6.5188 LearningRate 0.0387 Epoch: 15 Global Step: 85900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:17:40,685-Speed 10603.01 samples/sec Loss 6.5656 LearningRate 0.0387 Epoch: 15 Global Step: 85910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:17:41,604-Speed 11148.13 samples/sec Loss 6.6585 LearningRate 0.0387 Epoch: 15 Global Step: 85920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:17:42,536-Speed 10998.76 samples/sec Loss 6.6774 LearningRate 0.0387 Epoch: 15 Global Step: 85930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:17:43,454-Speed 11167.83 samples/sec Loss 6.6491 LearningRate 0.0387 Epoch: 15 Global Step: 85940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:17:44,360-Speed 11310.90 samples/sec Loss 6.7349 LearningRate 0.0387 Epoch: 15 Global Step: 85950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:17:45,259-Speed 11389.40 samples/sec Loss 6.8254 LearningRate 0.0387 Epoch: 15 Global Step: 85960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:17:46,154-Speed 11449.95 samples/sec Loss 6.6805 LearningRate 0.0387 Epoch: 15 Global Step: 85970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:17:47,076-Speed 11120.31 samples/sec Loss 6.5940 LearningRate 0.0387 Epoch: 15 Global Step: 85980 
Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:17:47,993-Speed 11170.09 samples/sec Loss 6.6635 LearningRate 0.0387 Epoch: 15 Global Step: 85990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:17:48,966-Speed 10537.90 samples/sec Loss 6.7301 LearningRate 0.0387 Epoch: 15 Global Step: 86000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:18:10,337-[lfw][86000]XNorm: 11.808893 Training: 2022-04-11 02:18:10,337-[lfw][86000]Accuracy-Flip: 0.99600+-0.00271 Training: 2022-04-11 02:18:10,338-[lfw][86000]Accuracy-Highest: 0.99617 Training: 2022-04-11 02:18:34,930-[cfp_fp][86000]XNorm: 9.870085 Training: 2022-04-11 02:18:34,930-[cfp_fp][86000]Accuracy-Flip: 0.93800+-0.01207 Training: 2022-04-11 02:18:34,931-[cfp_fp][86000]Accuracy-Highest: 0.94371 Training: 2022-04-11 02:18:56,165-[agedb_30][86000]XNorm: 11.498027 Training: 2022-04-11 02:18:56,166-[agedb_30][86000]Accuracy-Flip: 0.95817+-0.00941 Training: 2022-04-11 02:18:56,166-[agedb_30][86000]Accuracy-Highest: 0.96083 Training: 2022-04-11 02:18:57,074-Speed 150.35 samples/sec Loss 6.6639 LearningRate 0.0387 Epoch: 15 Global Step: 86010 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 02:18:57,972-Speed 11399.57 samples/sec Loss 6.7319 LearningRate 0.0387 Epoch: 15 Global Step: 86020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:18:58,892-Speed 11140.10 samples/sec Loss 6.5692 LearningRate 0.0387 Epoch: 15 Global Step: 86030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:18:59,846-Speed 10737.60 samples/sec Loss 6.6042 LearningRate 0.0387 Epoch: 15 Global Step: 86040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:19:00,773-Speed 11058.63 samples/sec Loss 6.5327 LearningRate 0.0386 Epoch: 15 Global Step: 86050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:19:01,676-Speed 11349.75 samples/sec Loss 6.5894 LearningRate 0.0386 Epoch: 15 Global Step: 86060 Fp16 Grad Scale: 131072 
Required: 5 hours Training: 2022-04-11 02:19:02,655-Speed 10470.78 samples/sec Loss 6.6151 LearningRate 0.0386 Epoch: 15 Global Step: 86070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:19:03,547-Speed 11496.30 samples/sec Loss 6.6650 LearningRate 0.0386 Epoch: 15 Global Step: 86080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:19:04,436-Speed 11523.59 samples/sec Loss 6.6503 LearningRate 0.0386 Epoch: 15 Global Step: 86090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:19:05,425-Speed 10363.13 samples/sec Loss 6.6529 LearningRate 0.0386 Epoch: 15 Global Step: 86100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:19:06,359-Speed 10969.00 samples/sec Loss 6.5857 LearningRate 0.0386 Epoch: 15 Global Step: 86110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:19:07,259-Speed 11383.60 samples/sec Loss 6.6835 LearningRate 0.0386 Epoch: 15 Global Step: 86120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:19:08,182-Speed 11109.13 samples/sec Loss 6.6688 LearningRate 0.0386 Epoch: 15 Global Step: 86130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:19:09,072-Speed 11510.78 samples/sec Loss 6.5447 LearningRate 0.0386 Epoch: 15 Global Step: 86140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:19:09,961-Speed 11523.16 samples/sec Loss 6.6891 LearningRate 0.0386 Epoch: 15 Global Step: 86150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:19:10,882-Speed 11135.17 samples/sec Loss 6.7206 LearningRate 0.0386 Epoch: 15 Global Step: 86160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:19:11,818-Speed 10937.74 samples/sec Loss 6.7462 LearningRate 0.0386 Epoch: 15 Global Step: 86170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:19:12,726-Speed 11288.21 samples/sec Loss 6.6725 LearningRate 0.0386 Epoch: 15 Global Step: 86180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 
2022-04-11 02:19:13,644-Speed 11164.20 samples/sec Loss 6.7220 LearningRate 0.0386 Epoch: 15 Global Step: 86190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:19:14,568-Speed 11086.33 samples/sec Loss 6.7769 LearningRate 0.0386 Epoch: 15 Global Step: 86200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:19:15,480-Speed 11233.11 samples/sec Loss 6.7402 LearningRate 0.0386 Epoch: 15 Global Step: 86210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:19:16,381-Speed 11372.38 samples/sec Loss 6.7250 LearningRate 0.0386 Epoch: 15 Global Step: 86220 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 02:19:17,280-Speed 11408.39 samples/sec Loss 6.7059 LearningRate 0.0385 Epoch: 15 Global Step: 86230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:19:18,213-Speed 10980.32 samples/sec Loss 6.8368 LearningRate 0.0385 Epoch: 15 Global Step: 86240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:19:19,133-Speed 11142.16 samples/sec Loss 6.6847 LearningRate 0.0385 Epoch: 15 Global Step: 86250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:19:20,057-Speed 11091.77 samples/sec Loss 6.8194 LearningRate 0.0385 Epoch: 15 Global Step: 86260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:19:20,972-Speed 11191.11 samples/sec Loss 6.9105 LearningRate 0.0385 Epoch: 15 Global Step: 86270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:19:21,867-Speed 11449.56 samples/sec Loss 6.6953 LearningRate 0.0385 Epoch: 15 Global Step: 86280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:19:22,771-Speed 11343.92 samples/sec Loss 6.7416 LearningRate 0.0385 Epoch: 15 Global Step: 86290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:19:23,691-Speed 11138.31 samples/sec Loss 6.7806 LearningRate 0.0385 Epoch: 15 Global Step: 86300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:19:24,591-Speed 
11382.70 samples/sec Loss 6.7495 LearningRate 0.0385 Epoch: 15 Global Step: 86310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:19:25,479-Speed 11544.11 samples/sec Loss 6.7439 LearningRate 0.0385 Epoch: 15 Global Step: 86320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:19:26,376-Speed 11419.16 samples/sec Loss 6.8131 LearningRate 0.0385 Epoch: 15 Global Step: 86330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:19:27,292-Speed 11178.84 samples/sec Loss 6.9091 LearningRate 0.0385 Epoch: 15 Global Step: 86340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:19:28,208-Speed 11197.59 samples/sec Loss 6.7643 LearningRate 0.0385 Epoch: 15 Global Step: 86350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:19:29,130-Speed 11138.11 samples/sec Loss 6.8029 LearningRate 0.0385 Epoch: 15 Global Step: 86360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:19:30,036-Speed 11300.92 samples/sec Loss 6.8706 LearningRate 0.0385 Epoch: 15 Global Step: 86370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:19:30,960-Speed 11089.28 samples/sec Loss 6.9086 LearningRate 0.0385 Epoch: 15 Global Step: 86380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:19:31,864-Speed 11338.69 samples/sec Loss 6.7964 LearningRate 0.0385 Epoch: 15 Global Step: 86390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:19:32,756-Speed 11485.31 samples/sec Loss 6.7945 LearningRate 0.0385 Epoch: 15 Global Step: 86400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:19:33,675-Speed 11147.17 samples/sec Loss 6.7613 LearningRate 0.0384 Epoch: 15 Global Step: 86410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:19:34,574-Speed 11402.47 samples/sec Loss 6.7883 LearningRate 0.0384 Epoch: 15 Global Step: 86420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:19:35,483-Speed 11279.11 samples/sec Loss 6.7868 
LearningRate 0.0384 Epoch: 15 Global Step: 86430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:19:36,413-Speed 11013.34 samples/sec Loss 6.6420 LearningRate 0.0384 Epoch: 15 Global Step: 86440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:19:37,344-Speed 11004.71 samples/sec Loss 6.8445 LearningRate 0.0384 Epoch: 15 Global Step: 86450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:19:38,287-Speed 10872.56 samples/sec Loss 6.6251 LearningRate 0.0384 Epoch: 15 Global Step: 86460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:19:39,243-Speed 10707.42 samples/sec Loss 6.7704 LearningRate 0.0384 Epoch: 15 Global Step: 86470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:19:40,171-Speed 11051.65 samples/sec Loss 6.9570 LearningRate 0.0384 Epoch: 15 Global Step: 86480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:19:41,082-Speed 11245.91 samples/sec Loss 6.7027 LearningRate 0.0384 Epoch: 15 Global Step: 86490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:19:42,050-Speed 10591.57 samples/sec Loss 6.8974 LearningRate 0.0384 Epoch: 15 Global Step: 86500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:19:42,973-Speed 11100.09 samples/sec Loss 6.8785 LearningRate 0.0384 Epoch: 15 Global Step: 86510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:19:43,876-Speed 11344.07 samples/sec Loss 6.8679 LearningRate 0.0384 Epoch: 15 Global Step: 86520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:19:44,750-Speed 11733.71 samples/sec Loss 6.8785 LearningRate 0.0384 Epoch: 15 Global Step: 86530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:19:45,637-Speed 11541.86 samples/sec Loss 6.7373 LearningRate 0.0384 Epoch: 15 Global Step: 86540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:19:46,545-Speed 11286.60 samples/sec Loss 6.7010 LearningRate 0.0384 Epoch: 15 Global Step: 
86550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:19:47,460-Speed 11198.14 samples/sec Loss 6.8256 LearningRate 0.0384 Epoch: 15 Global Step: 86560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:19:48,348-Speed 11548.36 samples/sec Loss 6.9143 LearningRate 0.0384 Epoch: 15 Global Step: 86570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:19:49,257-Speed 11268.97 samples/sec Loss 6.7164 LearningRate 0.0384 Epoch: 15 Global Step: 86580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:19:50,180-Speed 11096.17 samples/sec Loss 6.9089 LearningRate 0.0384 Epoch: 15 Global Step: 86590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:19:51,149-Speed 10575.78 samples/sec Loss 6.9366 LearningRate 0.0383 Epoch: 15 Global Step: 86600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:19:52,108-Speed 10690.52 samples/sec Loss 6.7847 LearningRate 0.0383 Epoch: 15 Global Step: 86610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:19:53,016-Speed 11281.00 samples/sec Loss 6.8698 LearningRate 0.0383 Epoch: 15 Global Step: 86620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:19:53,961-Speed 10841.63 samples/sec Loss 6.7790 LearningRate 0.0383 Epoch: 15 Global Step: 86630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:19:54,876-Speed 11203.07 samples/sec Loss 6.7491 LearningRate 0.0383 Epoch: 15 Global Step: 86640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:19:55,840-Speed 10635.25 samples/sec Loss 6.8612 LearningRate 0.0383 Epoch: 15 Global Step: 86650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:19:56,751-Speed 11244.89 samples/sec Loss 6.7804 LearningRate 0.0383 Epoch: 15 Global Step: 86660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:19:57,635-Speed 11588.03 samples/sec Loss 6.9240 LearningRate 0.0383 Epoch: 15 Global Step: 86670 Fp16 Grad Scale: 65536 Required: 5 
hours Training: 2022-04-11 02:19:58,534-Speed 11399.78 samples/sec Loss 6.7777 LearningRate 0.0383 Epoch: 15 Global Step: 86680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:19:59,445-Speed 11246.70 samples/sec Loss 6.8361 LearningRate 0.0383 Epoch: 15 Global Step: 86690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:20:00,331-Speed 11565.62 samples/sec Loss 6.7767 LearningRate 0.0383 Epoch: 15 Global Step: 86700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:20:01,275-Speed 10855.79 samples/sec Loss 6.9157 LearningRate 0.0383 Epoch: 15 Global Step: 86710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:20:02,198-Speed 11119.25 samples/sec Loss 6.7751 LearningRate 0.0383 Epoch: 15 Global Step: 86720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:20:03,136-Speed 10925.70 samples/sec Loss 6.8598 LearningRate 0.0383 Epoch: 15 Global Step: 86730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:20:04,098-Speed 10657.59 samples/sec Loss 6.9391 LearningRate 0.0383 Epoch: 15 Global Step: 86740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:20:05,001-Speed 11341.49 samples/sec Loss 6.9311 LearningRate 0.0383 Epoch: 15 Global Step: 86750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:20:05,955-Speed 10748.35 samples/sec Loss 6.8200 LearningRate 0.0383 Epoch: 15 Global Step: 86760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:20:06,918-Speed 10634.63 samples/sec Loss 6.7593 LearningRate 0.0383 Epoch: 15 Global Step: 86770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:20:07,878-Speed 10675.12 samples/sec Loss 6.6379 LearningRate 0.0382 Epoch: 15 Global Step: 86780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:20:08,773-Speed 11441.75 samples/sec Loss 6.8368 LearningRate 0.0382 Epoch: 15 Global Step: 86790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 
02:20:09,675-Speed 11363.58 samples/sec Loss 6.8742 LearningRate 0.0382 Epoch: 15 Global Step: 86800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:20:10,546-Speed 11763.17 samples/sec Loss 6.8783 LearningRate 0.0382 Epoch: 15 Global Step: 86810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:20:11,463-Speed 11180.91 samples/sec Loss 6.7946 LearningRate 0.0382 Epoch: 15 Global Step: 86820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:20:12,404-Speed 10890.96 samples/sec Loss 6.9721 LearningRate 0.0382 Epoch: 15 Global Step: 86830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:20:13,317-Speed 11216.45 samples/sec Loss 6.8463 LearningRate 0.0382 Epoch: 15 Global Step: 86840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:20:14,246-Speed 11036.48 samples/sec Loss 6.5276 LearningRate 0.0382 Epoch: 15 Global Step: 86850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:20:15,196-Speed 10785.58 samples/sec Loss 6.9580 LearningRate 0.0382 Epoch: 15 Global Step: 86860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:20:16,101-Speed 11323.46 samples/sec Loss 7.0204 LearningRate 0.0382 Epoch: 15 Global Step: 86870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:20:17,020-Speed 11142.43 samples/sec Loss 6.8560 LearningRate 0.0382 Epoch: 15 Global Step: 86880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:20:17,942-Speed 11118.40 samples/sec Loss 7.0177 LearningRate 0.0382 Epoch: 15 Global Step: 86890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:20:18,876-Speed 10968.56 samples/sec Loss 6.8319 LearningRate 0.0382 Epoch: 15 Global Step: 86900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:20:19,837-Speed 10679.07 samples/sec Loss 6.8244 LearningRate 0.0382 Epoch: 15 Global Step: 86910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:20:20,713-Speed 11695.47 samples/sec 
Loss 6.9134 LearningRate 0.0382 Epoch: 15 Global Step: 86920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:20:21,617-Speed 11333.43 samples/sec Loss 6.8709 LearningRate 0.0382 Epoch: 15 Global Step: 86930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:20:22,522-Speed 11331.18 samples/sec Loss 6.7629 LearningRate 0.0382 Epoch: 15 Global Step: 86940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:20:23,426-Speed 11324.26 samples/sec Loss 6.9115 LearningRate 0.0382 Epoch: 15 Global Step: 86950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:20:24,345-Speed 11154.86 samples/sec Loss 6.8367 LearningRate 0.0382 Epoch: 15 Global Step: 86960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:20:25,283-Speed 10920.69 samples/sec Loss 6.7263 LearningRate 0.0381 Epoch: 15 Global Step: 86970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:20:26,240-Speed 10716.42 samples/sec Loss 6.7147 LearningRate 0.0381 Epoch: 15 Global Step: 86980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:20:27,127-Speed 11549.77 samples/sec Loss 6.9217 LearningRate 0.0381 Epoch: 15 Global Step: 86990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:20:28,045-Speed 11168.44 samples/sec Loss 6.8388 LearningRate 0.0381 Epoch: 15 Global Step: 87000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:20:28,983-Speed 10917.54 samples/sec Loss 6.8522 LearningRate 0.0381 Epoch: 15 Global Step: 87010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:20:29,907-Speed 11105.53 samples/sec Loss 6.9087 LearningRate 0.0381 Epoch: 15 Global Step: 87020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:20:30,824-Speed 11172.54 samples/sec Loss 6.7280 LearningRate 0.0381 Epoch: 15 Global Step: 87030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:20:31,736-Speed 11236.66 samples/sec Loss 6.8514 LearningRate 0.0381 Epoch: 15 
Global Step: 87040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:20:32,656-Speed 11137.75 samples/sec Loss 6.9252 LearningRate 0.0381 Epoch: 15 Global Step: 87050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:20:33,551-Speed 11456.29 samples/sec Loss 6.8805 LearningRate 0.0381 Epoch: 15 Global Step: 87060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:20:34,450-Speed 11397.05 samples/sec Loss 6.9315 LearningRate 0.0381 Epoch: 15 Global Step: 87070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:20:35,347-Speed 11422.05 samples/sec Loss 6.8717 LearningRate 0.0381 Epoch: 15 Global Step: 87080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:20:36,279-Speed 10992.38 samples/sec Loss 6.8994 LearningRate 0.0381 Epoch: 15 Global Step: 87090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:20:37,227-Speed 10814.36 samples/sec Loss 6.8405 LearningRate 0.0381 Epoch: 15 Global Step: 87100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:20:38,154-Speed 11050.10 samples/sec Loss 6.9504 LearningRate 0.0381 Epoch: 15 Global Step: 87110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:20:39,062-Speed 11277.80 samples/sec Loss 6.7016 LearningRate 0.0381 Epoch: 15 Global Step: 87120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:20:40,025-Speed 10646.55 samples/sec Loss 6.8901 LearningRate 0.0381 Epoch: 15 Global Step: 87130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:20:40,964-Speed 10911.07 samples/sec Loss 6.9206 LearningRate 0.0381 Epoch: 15 Global Step: 87140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:20:41,923-Speed 10685.60 samples/sec Loss 6.9850 LearningRate 0.0380 Epoch: 15 Global Step: 87150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:20:42,867-Speed 10850.32 samples/sec Loss 6.8881 LearningRate 0.0380 Epoch: 15 Global Step: 87160 Fp16 Grad Scale: 
131072 Required: 5 hours Training: 2022-04-11 02:20:43,769-Speed 11367.31 samples/sec Loss 6.8068 LearningRate 0.0380 Epoch: 15 Global Step: 87170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:20:44,662-Speed 11472.87 samples/sec Loss 6.9116 LearningRate 0.0380 Epoch: 15 Global Step: 87180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:20:45,555-Speed 11469.94 samples/sec Loss 7.0503 LearningRate 0.0380 Epoch: 15 Global Step: 87190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:20:46,459-Speed 11346.14 samples/sec Loss 6.9650 LearningRate 0.0380 Epoch: 15 Global Step: 87200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:20:47,370-Speed 11250.95 samples/sec Loss 6.9424 LearningRate 0.0380 Epoch: 15 Global Step: 87210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:20:48,282-Speed 11237.69 samples/sec Loss 6.9521 LearningRate 0.0380 Epoch: 15 Global Step: 87220 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 02:20:49,202-Speed 11132.92 samples/sec Loss 6.8433 LearningRate 0.0380 Epoch: 15 Global Step: 87230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:20:50,187-Speed 10404.82 samples/sec Loss 6.9205 LearningRate 0.0380 Epoch: 15 Global Step: 87240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:20:51,129-Speed 10873.35 samples/sec Loss 7.0287 LearningRate 0.0380 Epoch: 15 Global Step: 87250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:20:52,081-Speed 10761.52 samples/sec Loss 6.7416 LearningRate 0.0380 Epoch: 15 Global Step: 87260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:20:52,980-Speed 11411.09 samples/sec Loss 6.9015 LearningRate 0.0380 Epoch: 15 Global Step: 87270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:20:53,911-Speed 10999.25 samples/sec Loss 6.9731 LearningRate 0.0380 Epoch: 15 Global Step: 87280 Fp16 Grad Scale: 131072 Required: 5 hours 
Training: 2022-04-11 02:20:54,864-Speed 10757.81 samples/sec Loss 6.8925 LearningRate 0.0380 Epoch: 15 Global Step: 87290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:20:55,784-Speed 11132.41 samples/sec Loss 6.8666 LearningRate 0.0380 Epoch: 15 Global Step: 87300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:20:56,707-Speed 11108.15 samples/sec Loss 6.9661 LearningRate 0.0380 Epoch: 15 Global Step: 87310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:20:57,600-Speed 11471.21 samples/sec Loss 6.8849 LearningRate 0.0380 Epoch: 15 Global Step: 87320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:20:58,510-Speed 11262.00 samples/sec Loss 7.0424 LearningRate 0.0379 Epoch: 15 Global Step: 87330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:20:59,426-Speed 11184.22 samples/sec Loss 6.7255 LearningRate 0.0379 Epoch: 15 Global Step: 87340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:21:00,325-Speed 11402.19 samples/sec Loss 6.8876 LearningRate 0.0379 Epoch: 15 Global Step: 87350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:21:01,247-Speed 11111.30 samples/sec Loss 6.7903 LearningRate 0.0379 Epoch: 15 Global Step: 87360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:21:02,148-Speed 11375.11 samples/sec Loss 6.7351 LearningRate 0.0379 Epoch: 15 Global Step: 87370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:21:03,072-Speed 11091.95 samples/sec Loss 7.0147 LearningRate 0.0379 Epoch: 15 Global Step: 87380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:21:03,965-Speed 11470.69 samples/sec Loss 6.8678 LearningRate 0.0379 Epoch: 15 Global Step: 87390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:21:04,874-Speed 11275.24 samples/sec Loss 6.8308 LearningRate 0.0379 Epoch: 15 Global Step: 87400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:21:05,807-Speed 
10984.68 samples/sec Loss 6.8567 LearningRate 0.0379 Epoch: 15 Global Step: 87410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:21:06,734-Speed 11051.35 samples/sec Loss 6.7794 LearningRate 0.0379 Epoch: 15 Global Step: 87420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:21:07,657-Speed 11097.90 samples/sec Loss 7.0614 LearningRate 0.0379 Epoch: 15 Global Step: 87430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:21:08,586-Speed 11037.57 samples/sec Loss 6.9584 LearningRate 0.0379 Epoch: 15 Global Step: 87440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:21:09,486-Speed 11385.52 samples/sec Loss 6.7806 LearningRate 0.0379 Epoch: 15 Global Step: 87450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:21:10,394-Speed 11292.55 samples/sec Loss 6.9120 LearningRate 0.0379 Epoch: 15 Global Step: 87460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:21:11,323-Speed 11019.68 samples/sec Loss 6.9807 LearningRate 0.0379 Epoch: 15 Global Step: 87470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:21:12,225-Speed 11365.60 samples/sec Loss 6.8709 LearningRate 0.0379 Epoch: 15 Global Step: 87480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:21:13,137-Speed 11238.83 samples/sec Loss 6.9015 LearningRate 0.0379 Epoch: 15 Global Step: 87490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:21:14,068-Speed 11007.19 samples/sec Loss 7.0454 LearningRate 0.0379 Epoch: 15 Global Step: 87500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:21:15,016-Speed 10805.62 samples/sec Loss 7.0371 LearningRate 0.0379 Epoch: 15 Global Step: 87510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:21:15,924-Speed 11282.48 samples/sec Loss 6.9793 LearningRate 0.0378 Epoch: 15 Global Step: 87520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:21:16,829-Speed 11327.68 samples/sec Loss 
6.8676 LearningRate 0.0378 Epoch: 15 Global Step: 87530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:21:17,774-Speed 10842.40 samples/sec Loss 6.8126 LearningRate 0.0378 Epoch: 15 Global Step: 87540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:21:18,690-Speed 11191.11 samples/sec Loss 6.8843 LearningRate 0.0378 Epoch: 15 Global Step: 87550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:21:19,571-Speed 11634.09 samples/sec Loss 6.8534 LearningRate 0.0378 Epoch: 15 Global Step: 87560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:21:20,498-Speed 11047.14 samples/sec Loss 6.8989 LearningRate 0.0378 Epoch: 15 Global Step: 87570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:21:21,394-Speed 11435.25 samples/sec Loss 6.9375 LearningRate 0.0378 Epoch: 15 Global Step: 87580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:21:22,301-Speed 11294.75 samples/sec Loss 7.0075 LearningRate 0.0378 Epoch: 15 Global Step: 87590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:21:23,216-Speed 11210.78 samples/sec Loss 6.8815 LearningRate 0.0378 Epoch: 15 Global Step: 87600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:21:24,135-Speed 11149.70 samples/sec Loss 6.9328 LearningRate 0.0378 Epoch: 15 Global Step: 87610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:21:25,062-Speed 11053.57 samples/sec Loss 6.7637 LearningRate 0.0378 Epoch: 15 Global Step: 87620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:21:25,990-Speed 11033.43 samples/sec Loss 6.9545 LearningRate 0.0378 Epoch: 15 Global Step: 87630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:21:26,890-Speed 11384.30 samples/sec Loss 6.8607 LearningRate 0.0378 Epoch: 15 Global Step: 87640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:21:27,827-Speed 10940.38 samples/sec Loss 6.8432 LearningRate 0.0378 Epoch: 15 
Global Step: 87650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:21:28,771-Speed 10850.09 samples/sec Loss 7.0009 LearningRate 0.0378 Epoch: 15 Global Step: 87660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:21:29,702-Speed 11021.76 samples/sec Loss 6.9941 LearningRate 0.0378 Epoch: 15 Global Step: 87670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:21:30,618-Speed 11186.32 samples/sec Loss 6.8343 LearningRate 0.0378 Epoch: 15 Global Step: 87680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:21:31,547-Speed 11021.97 samples/sec Loss 6.8256 LearningRate 0.0378 Epoch: 15 Global Step: 87690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:21:32,467-Speed 11141.95 samples/sec Loss 6.8581 LearningRate 0.0377 Epoch: 15 Global Step: 87700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:21:33,356-Speed 11533.85 samples/sec Loss 6.5827 LearningRate 0.0377 Epoch: 15 Global Step: 87710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:21:34,259-Speed 11346.26 samples/sec Loss 6.9518 LearningRate 0.0377 Epoch: 15 Global Step: 87720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:21:35,218-Speed 10682.29 samples/sec Loss 6.8893 LearningRate 0.0377 Epoch: 15 Global Step: 87730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:21:36,183-Speed 10612.46 samples/sec Loss 7.0229 LearningRate 0.0377 Epoch: 15 Global Step: 87740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:21:37,100-Speed 11181.18 samples/sec Loss 6.9504 LearningRate 0.0377 Epoch: 15 Global Step: 87750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:21:38,059-Speed 10676.05 samples/sec Loss 6.8945 LearningRate 0.0377 Epoch: 15 Global Step: 87760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:21:39,014-Speed 10737.16 samples/sec Loss 7.0120 LearningRate 0.0377 Epoch: 15 Global Step: 87770 Fp16 Grad Scale: 
131072 Required: 5 hours Training: 2022-04-11 02:21:39,930-Speed 11184.04 samples/sec Loss 6.9422 LearningRate 0.0377 Epoch: 15 Global Step: 87780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:21:40,843-Speed 11222.49 samples/sec Loss 6.8822 LearningRate 0.0377 Epoch: 15 Global Step: 87790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:21:41,775-Speed 11013.26 samples/sec Loss 6.8588 LearningRate 0.0377 Epoch: 15 Global Step: 87800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:21:42,688-Speed 11221.06 samples/sec Loss 6.9256 LearningRate 0.0377 Epoch: 15 Global Step: 87810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:21:43,641-Speed 10744.95 samples/sec Loss 6.9625 LearningRate 0.0377 Epoch: 15 Global Step: 87820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:21:44,591-Speed 10786.13 samples/sec Loss 7.0043 LearningRate 0.0377 Epoch: 15 Global Step: 87830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:21:45,481-Speed 11519.57 samples/sec Loss 6.9451 LearningRate 0.0377 Epoch: 15 Global Step: 87840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:21:46,398-Speed 11173.85 samples/sec Loss 7.0229 LearningRate 0.0377 Epoch: 15 Global Step: 87850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:21:47,300-Speed 11363.59 samples/sec Loss 6.8127 LearningRate 0.0377 Epoch: 15 Global Step: 87860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:21:48,228-Speed 11034.57 samples/sec Loss 6.9507 LearningRate 0.0377 Epoch: 15 Global Step: 87870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:21:49,165-Speed 10938.51 samples/sec Loss 6.6931 LearningRate 0.0377 Epoch: 15 Global Step: 87880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:21:50,099-Speed 10966.82 samples/sec Loss 6.8384 LearningRate 0.0376 Epoch: 15 Global Step: 87890 Fp16 Grad Scale: 131072 Required: 5 hours 
Training: 2022-04-11 02:21:51,036-Speed 10942.28 samples/sec Loss 6.9806 LearningRate 0.0376 Epoch: 15 Global Step: 87900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:21:51,976-Speed 10897.55 samples/sec Loss 7.0287 LearningRate 0.0376 Epoch: 15 Global Step: 87910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:21:52,924-Speed 10819.52 samples/sec Loss 6.8819 LearningRate 0.0376 Epoch: 15 Global Step: 87920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:21:53,896-Speed 10537.75 samples/sec Loss 6.9667 LearningRate 0.0376 Epoch: 15 Global Step: 87930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:21:54,808-Speed 11237.03 samples/sec Loss 6.8810 LearningRate 0.0376 Epoch: 15 Global Step: 87940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:21:55,729-Speed 11130.17 samples/sec Loss 6.7577 LearningRate 0.0376 Epoch: 15 Global Step: 87950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:21:56,646-Speed 11169.04 samples/sec Loss 6.8215 LearningRate 0.0376 Epoch: 15 Global Step: 87960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:21:57,532-Speed 11564.55 samples/sec Loss 6.9251 LearningRate 0.0376 Epoch: 15 Global Step: 87970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:21:58,428-Speed 11443.79 samples/sec Loss 6.8450 LearningRate 0.0376 Epoch: 15 Global Step: 87980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:21:59,330-Speed 11358.04 samples/sec Loss 6.9657 LearningRate 0.0376 Epoch: 15 Global Step: 87990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:22:00,237-Speed 11293.66 samples/sec Loss 7.0378 LearningRate 0.0376 Epoch: 15 Global Step: 88000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:22:21,927-[lfw][88000]XNorm: 11.717659 Training: 2022-04-11 02:22:21,927-[lfw][88000]Accuracy-Flip: 0.99583+-0.00335 Training: 2022-04-11 
02:22:21,928-[lfw][88000]Accuracy-Highest: 0.99617 Training: 2022-04-11 02:22:47,039-[cfp_fp][88000]XNorm: 9.794426 Training: 2022-04-11 02:22:47,040-[cfp_fp][88000]Accuracy-Flip: 0.93971+-0.01341 Training: 2022-04-11 02:22:47,040-[cfp_fp][88000]Accuracy-Highest: 0.94371 Training: 2022-04-11 02:23:08,710-[agedb_30][88000]XNorm: 11.505844 Training: 2022-04-11 02:23:08,710-[agedb_30][88000]Accuracy-Flip: 0.96383+-0.00989 Training: 2022-04-11 02:23:08,711-[agedb_30][88000]Accuracy-Highest: 0.96383 Training: 2022-04-11 02:23:09,635-Speed 147.56 samples/sec Loss 6.7697 LearningRate 0.0376 Epoch: 15 Global Step: 88010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:23:10,541-Speed 11308.72 samples/sec Loss 6.9693 LearningRate 0.0376 Epoch: 15 Global Step: 88020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:23:11,440-Speed 11391.79 samples/sec Loss 6.7919 LearningRate 0.0376 Epoch: 15 Global Step: 88030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:23:12,360-Speed 11142.70 samples/sec Loss 6.9050 LearningRate 0.0376 Epoch: 15 Global Step: 88040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:23:13,292-Speed 10987.05 samples/sec Loss 7.0300 LearningRate 0.0376 Epoch: 15 Global Step: 88050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:23:14,243-Speed 10783.26 samples/sec Loss 6.9971 LearningRate 0.0376 Epoch: 15 Global Step: 88060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:23:15,150-Speed 11293.53 samples/sec Loss 6.8862 LearningRate 0.0375 Epoch: 15 Global Step: 88070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:23:16,063-Speed 11222.77 samples/sec Loss 7.0279 LearningRate 0.0375 Epoch: 15 Global Step: 88080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:23:16,987-Speed 11097.24 samples/sec Loss 6.8038 LearningRate 0.0375 Epoch: 15 Global Step: 88090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 
02:23:17,908-Speed 11122.14 samples/sec Loss 7.0401 LearningRate 0.0375 Epoch: 15 Global Step: 88100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:23:18,811-Speed 11346.85 samples/sec Loss 6.8777 LearningRate 0.0375 Epoch: 15 Global Step: 88110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:23:19,719-Speed 11281.78 samples/sec Loss 6.9907 LearningRate 0.0375 Epoch: 15 Global Step: 88120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:23:20,632-Speed 11229.02 samples/sec Loss 7.0370 LearningRate 0.0375 Epoch: 15 Global Step: 88130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:23:21,515-Speed 11600.27 samples/sec Loss 6.8921 LearningRate 0.0375 Epoch: 15 Global Step: 88140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:23:22,394-Speed 11656.93 samples/sec Loss 6.8362 LearningRate 0.0375 Epoch: 15 Global Step: 88150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:23:23,327-Speed 10989.39 samples/sec Loss 6.8833 LearningRate 0.0375 Epoch: 15 Global Step: 88160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:23:24,249-Speed 11111.81 samples/sec Loss 7.0310 LearningRate 0.0375 Epoch: 15 Global Step: 88170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:23:25,166-Speed 11173.59 samples/sec Loss 6.7277 LearningRate 0.0375 Epoch: 15 Global Step: 88180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:23:26,051-Speed 11579.82 samples/sec Loss 6.9260 LearningRate 0.0375 Epoch: 15 Global Step: 88190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:23:26,985-Speed 10972.78 samples/sec Loss 6.7943 LearningRate 0.0375 Epoch: 15 Global Step: 88200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:23:27,932-Speed 10818.93 samples/sec Loss 6.8888 LearningRate 0.0375 Epoch: 15 Global Step: 88210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:23:28,871-Speed 10911.47 
samples/sec Loss 6.9364 LearningRate 0.0375 Epoch: 15 Global Step: 88220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:23:29,803-Speed 10989.01 samples/sec Loss 6.9458 LearningRate 0.0375 Epoch: 15 Global Step: 88230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:23:30,752-Speed 10806.90 samples/sec Loss 7.1568 LearningRate 0.0375 Epoch: 15 Global Step: 88240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:23:31,655-Speed 11336.20 samples/sec Loss 7.0420 LearningRate 0.0375 Epoch: 15 Global Step: 88250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:23:32,616-Speed 10668.15 samples/sec Loss 6.9140 LearningRate 0.0374 Epoch: 15 Global Step: 88260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:23:33,504-Speed 11536.70 samples/sec Loss 7.0380 LearningRate 0.0374 Epoch: 15 Global Step: 88270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:23:34,402-Speed 11413.19 samples/sec Loss 6.7356 LearningRate 0.0374 Epoch: 15 Global Step: 88280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:23:35,313-Speed 11244.49 samples/sec Loss 6.9083 LearningRate 0.0374 Epoch: 15 Global Step: 88290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:23:36,269-Speed 10721.03 samples/sec Loss 7.0092 LearningRate 0.0374 Epoch: 15 Global Step: 88300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:23:37,187-Speed 11165.68 samples/sec Loss 6.9389 LearningRate 0.0374 Epoch: 15 Global Step: 88310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:23:38,091-Speed 11334.28 samples/sec Loss 6.9626 LearningRate 0.0374 Epoch: 15 Global Step: 88320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:23:39,012-Speed 11126.62 samples/sec Loss 7.0657 LearningRate 0.0374 Epoch: 15 Global Step: 88330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:23:39,915-Speed 11340.28 samples/sec Loss 6.7783 LearningRate 
0.0374 Epoch: 15 Global Step: 88340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:23:40,812-Speed 11433.87 samples/sec Loss 6.9810 LearningRate 0.0374 Epoch: 15 Global Step: 88350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:23:41,739-Speed 11050.52 samples/sec Loss 6.7984 LearningRate 0.0374 Epoch: 15 Global Step: 88360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:23:42,656-Speed 11182.25 samples/sec Loss 7.0087 LearningRate 0.0374 Epoch: 15 Global Step: 88370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:23:43,569-Speed 11220.73 samples/sec Loss 6.8604 LearningRate 0.0374 Epoch: 15 Global Step: 88380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:23:44,474-Speed 11325.42 samples/sec Loss 7.0451 LearningRate 0.0374 Epoch: 15 Global Step: 88390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:23:45,353-Speed 11648.46 samples/sec Loss 6.8729 LearningRate 0.0374 Epoch: 15 Global Step: 88400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:23:46,235-Speed 11624.64 samples/sec Loss 6.9848 LearningRate 0.0374 Epoch: 15 Global Step: 88410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:23:47,154-Speed 11150.11 samples/sec Loss 6.9155 LearningRate 0.0374 Epoch: 15 Global Step: 88420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:23:48,106-Speed 10757.82 samples/sec Loss 6.9280 LearningRate 0.0374 Epoch: 15 Global Step: 88430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:23:49,069-Speed 10642.77 samples/sec Loss 6.7458 LearningRate 0.0374 Epoch: 15 Global Step: 88440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:23:50,007-Speed 10924.97 samples/sec Loss 6.9642 LearningRate 0.0373 Epoch: 15 Global Step: 88450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:23:50,914-Speed 11292.96 samples/sec Loss 6.9369 LearningRate 0.0373 Epoch: 15 Global Step: 88460 Fp16 
Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:23:51,879-Speed 10622.61 samples/sec Loss 7.0047 LearningRate 0.0373 Epoch: 15 Global Step: 88470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:23:52,808-Speed 11027.24 samples/sec Loss 6.9716 LearningRate 0.0373 Epoch: 15 Global Step: 88480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:23:53,749-Speed 10892.30 samples/sec Loss 6.9684 LearningRate 0.0373 Epoch: 15 Global Step: 88490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:23:54,682-Speed 10982.64 samples/sec Loss 6.8435 LearningRate 0.0373 Epoch: 15 Global Step: 88500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:23:55,590-Speed 11280.39 samples/sec Loss 6.9699 LearningRate 0.0373 Epoch: 15 Global Step: 88510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:23:56,506-Speed 11194.63 samples/sec Loss 6.9714 LearningRate 0.0373 Epoch: 15 Global Step: 88520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:23:57,398-Speed 11482.31 samples/sec Loss 6.9059 LearningRate 0.0373 Epoch: 15 Global Step: 88530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:23:58,310-Speed 11239.42 samples/sec Loss 6.9233 LearningRate 0.0373 Epoch: 15 Global Step: 88540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:23:59,251-Speed 10893.12 samples/sec Loss 6.8944 LearningRate 0.0373 Epoch: 15 Global Step: 88550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:24:00,150-Speed 11396.19 samples/sec Loss 6.8752 LearningRate 0.0373 Epoch: 15 Global Step: 88560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:24:01,035-Speed 11577.16 samples/sec Loss 6.8797 LearningRate 0.0373 Epoch: 15 Global Step: 88570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:24:01,966-Speed 11011.30 samples/sec Loss 6.9841 LearningRate 0.0373 Epoch: 15 Global Step: 88580 Fp16 Grad Scale: 65536 Required: 5 hours 
Training: 2022-04-11 02:24:02,897-Speed 10997.28 samples/sec Loss 6.9716 LearningRate 0.0373 Epoch: 15 Global Step: 88590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:24:03,830-Speed 10991.30 samples/sec Loss 6.8433 LearningRate 0.0373 Epoch: 15 Global Step: 88600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:24:04,774-Speed 10852.68 samples/sec Loss 7.1494 LearningRate 0.0373 Epoch: 15 Global Step: 88610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:24:05,711-Speed 10931.69 samples/sec Loss 6.8777 LearningRate 0.0373 Epoch: 15 Global Step: 88620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:24:06,645-Speed 10974.92 samples/sec Loss 6.8062 LearningRate 0.0372 Epoch: 15 Global Step: 88630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:24:07,581-Speed 10947.39 samples/sec Loss 6.9214 LearningRate 0.0372 Epoch: 15 Global Step: 88640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:24:08,515-Speed 10967.81 samples/sec Loss 6.9222 LearningRate 0.0372 Epoch: 15 Global Step: 88650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:24:09,413-Speed 11409.67 samples/sec Loss 6.9938 LearningRate 0.0372 Epoch: 15 Global Step: 88660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:24:10,314-Speed 11372.13 samples/sec Loss 6.7900 LearningRate 0.0372 Epoch: 15 Global Step: 88670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:24:11,213-Speed 11397.27 samples/sec Loss 7.0018 LearningRate 0.0372 Epoch: 15 Global Step: 88680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:24:12,128-Speed 11206.41 samples/sec Loss 6.8399 LearningRate 0.0372 Epoch: 15 Global Step: 88690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:24:13,043-Speed 11190.54 samples/sec Loss 6.9616 LearningRate 0.0372 Epoch: 15 Global Step: 88700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:24:14,008-Speed 
10626.25 samples/sec Loss 6.8374 LearningRate 0.0372 Epoch: 15 Global Step: 88710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:24:14,964-Speed 10715.25 samples/sec Loss 6.8397 LearningRate 0.0372 Epoch: 15 Global Step: 88720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:24:15,913-Speed 10805.04 samples/sec Loss 6.9405 LearningRate 0.0372 Epoch: 15 Global Step: 88730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:24:16,862-Speed 10795.42 samples/sec Loss 6.9382 LearningRate 0.0372 Epoch: 15 Global Step: 88740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:24:17,773-Speed 11245.38 samples/sec Loss 6.8887 LearningRate 0.0372 Epoch: 15 Global Step: 88750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:24:18,703-Speed 11011.34 samples/sec Loss 6.8979 LearningRate 0.0372 Epoch: 15 Global Step: 88760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:24:19,655-Speed 10765.41 samples/sec Loss 6.8921 LearningRate 0.0372 Epoch: 15 Global Step: 88770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:24:20,556-Speed 11377.61 samples/sec Loss 6.8894 LearningRate 0.0372 Epoch: 15 Global Step: 88780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:24:21,439-Speed 11598.16 samples/sec Loss 7.0754 LearningRate 0.0372 Epoch: 15 Global Step: 88790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:24:22,358-Speed 11154.97 samples/sec Loss 6.9991 LearningRate 0.0372 Epoch: 15 Global Step: 88800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:24:23,249-Speed 11503.22 samples/sec Loss 6.8065 LearningRate 0.0372 Epoch: 15 Global Step: 88810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:24:24,174-Speed 11073.20 samples/sec Loss 6.9684 LearningRate 0.0371 Epoch: 15 Global Step: 88820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:24:25,062-Speed 11538.27 samples/sec Loss 7.0885 
LearningRate 0.0371 Epoch: 15 Global Step: 88830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:24:25,969-Speed 11305.52 samples/sec Loss 6.8207 LearningRate 0.0371 Epoch: 15 Global Step: 88840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:24:26,886-Speed 11173.57 samples/sec Loss 6.8002 LearningRate 0.0371 Epoch: 15 Global Step: 88850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:24:27,824-Speed 10921.87 samples/sec Loss 7.0034 LearningRate 0.0371 Epoch: 15 Global Step: 88860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:24:28,726-Speed 11363.38 samples/sec Loss 6.9696 LearningRate 0.0371 Epoch: 15 Global Step: 88870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:24:29,684-Speed 10695.23 samples/sec Loss 6.9821 LearningRate 0.0371 Epoch: 15 Global Step: 88880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:24:30,600-Speed 11182.93 samples/sec Loss 6.8083 LearningRate 0.0371 Epoch: 15 Global Step: 88890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:24:31,563-Speed 10654.34 samples/sec Loss 6.9007 LearningRate 0.0371 Epoch: 15 Global Step: 88900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:24:32,481-Speed 11156.79 samples/sec Loss 6.9977 LearningRate 0.0371 Epoch: 15 Global Step: 88910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:24:33,393-Speed 11238.74 samples/sec Loss 6.8432 LearningRate 0.0371 Epoch: 15 Global Step: 88920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:24:34,284-Speed 11500.62 samples/sec Loss 6.8473 LearningRate 0.0371 Epoch: 15 Global Step: 88930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:24:35,193-Speed 11264.54 samples/sec Loss 6.9328 LearningRate 0.0371 Epoch: 15 Global Step: 88940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:24:36,135-Speed 10884.10 samples/sec Loss 6.9093 LearningRate 0.0371 Epoch: 15 Global 
Step: 88950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:24:37,052-Speed 11166.48 samples/sec Loss 7.1092 LearningRate 0.0371 Epoch: 15 Global Step: 88960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:24:37,965-Speed 11224.74 samples/sec Loss 6.9482 LearningRate 0.0371 Epoch: 15 Global Step: 88970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:24:38,898-Speed 10983.66 samples/sec Loss 6.8727 LearningRate 0.0371 Epoch: 15 Global Step: 88980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:24:39,860-Speed 10656.51 samples/sec Loss 6.9000 LearningRate 0.0371 Epoch: 15 Global Step: 88990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:24:40,798-Speed 10914.68 samples/sec Loss 6.8858 LearningRate 0.0371 Epoch: 15 Global Step: 89000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:24:41,761-Speed 10649.62 samples/sec Loss 6.7632 LearningRate 0.0370 Epoch: 15 Global Step: 89010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:24:42,668-Speed 11300.50 samples/sec Loss 7.0060 LearningRate 0.0370 Epoch: 15 Global Step: 89020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:24:43,622-Speed 10736.57 samples/sec Loss 6.9348 LearningRate 0.0370 Epoch: 15 Global Step: 89030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:24:44,546-Speed 11083.21 samples/sec Loss 6.8081 LearningRate 0.0370 Epoch: 15 Global Step: 89040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:24:45,432-Speed 11569.50 samples/sec Loss 6.9590 LearningRate 0.0370 Epoch: 15 Global Step: 89050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:24:46,335-Speed 11351.56 samples/sec Loss 6.9252 LearningRate 0.0370 Epoch: 15 Global Step: 89060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:24:47,243-Speed 11279.01 samples/sec Loss 6.8110 LearningRate 0.0370 Epoch: 15 Global Step: 89070 Fp16 Grad Scale: 65536 
Required: 5 hours Training: 2022-04-11 02:24:48,172-Speed 11034.96 samples/sec Loss 7.0068 LearningRate 0.0370 Epoch: 15 Global Step: 89080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:24:49,083-Speed 11244.56 samples/sec Loss 7.0445 LearningRate 0.0370 Epoch: 15 Global Step: 89090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:24:50,025-Speed 10881.95 samples/sec Loss 6.9624 LearningRate 0.0370 Epoch: 15 Global Step: 89100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:24:50,952-Speed 11047.01 samples/sec Loss 7.1079 LearningRate 0.0370 Epoch: 15 Global Step: 89110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:24:51,880-Speed 11047.76 samples/sec Loss 6.8946 LearningRate 0.0370 Epoch: 15 Global Step: 89120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:24:52,824-Speed 10849.36 samples/sec Loss 6.9211 LearningRate 0.0370 Epoch: 15 Global Step: 89130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:24:53,753-Speed 11036.18 samples/sec Loss 6.8979 LearningRate 0.0370 Epoch: 15 Global Step: 89140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:24:54,682-Speed 11030.55 samples/sec Loss 6.9458 LearningRate 0.0370 Epoch: 15 Global Step: 89150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:24:55,618-Speed 10953.93 samples/sec Loss 6.9219 LearningRate 0.0370 Epoch: 15 Global Step: 89160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:24:56,543-Speed 11074.21 samples/sec Loss 6.9754 LearningRate 0.0370 Epoch: 15 Global Step: 89170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:24:57,435-Speed 11496.30 samples/sec Loss 6.8013 LearningRate 0.0370 Epoch: 15 Global Step: 89180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:24:58,348-Speed 11213.37 samples/sec Loss 6.9170 LearningRate 0.0369 Epoch: 15 Global Step: 89190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 
2022-04-11 02:24:59,264-Speed 11193.60 samples/sec Loss 6.8732 LearningRate 0.0369 Epoch: 15 Global Step: 89200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:25:00,159-Speed 11488.81 samples/sec Loss 6.8663 LearningRate 0.0369 Epoch: 15 Global Step: 89210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:25:01,100-Speed 10882.44 samples/sec Loss 6.9534 LearningRate 0.0369 Epoch: 15 Global Step: 89220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:25:02,021-Speed 11134.22 samples/sec Loss 7.0990 LearningRate 0.0369 Epoch: 15 Global Step: 89230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:25:02,956-Speed 10948.48 samples/sec Loss 6.8031 LearningRate 0.0369 Epoch: 15 Global Step: 89240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:25:03,906-Speed 10795.70 samples/sec Loss 6.9047 LearningRate 0.0369 Epoch: 15 Global Step: 89250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:25:04,805-Speed 11391.92 samples/sec Loss 6.9995 LearningRate 0.0369 Epoch: 15 Global Step: 89260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:25:05,751-Speed 10827.45 samples/sec Loss 6.9524 LearningRate 0.0369 Epoch: 15 Global Step: 89270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:25:06,705-Speed 10743.93 samples/sec Loss 6.9007 LearningRate 0.0369 Epoch: 15 Global Step: 89280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:25:07,626-Speed 11126.17 samples/sec Loss 6.9541 LearningRate 0.0369 Epoch: 15 Global Step: 89290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:25:08,553-Speed 11053.93 samples/sec Loss 6.7355 LearningRate 0.0369 Epoch: 15 Global Step: 89300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:25:09,454-Speed 11373.40 samples/sec Loss 6.7233 LearningRate 0.0369 Epoch: 15 Global Step: 89310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:25:10,336-Speed 
11618.24 samples/sec Loss 6.9196 LearningRate 0.0369 Epoch: 15 Global Step: 89320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:25:11,256-Speed 11135.64 samples/sec Loss 6.9540 LearningRate 0.0369 Epoch: 15 Global Step: 89330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:25:12,161-Speed 11319.66 samples/sec Loss 6.9611 LearningRate 0.0369 Epoch: 15 Global Step: 89340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:25:13,071-Speed 11269.73 samples/sec Loss 6.8873 LearningRate 0.0369 Epoch: 15 Global Step: 89350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:25:14,003-Speed 10983.34 samples/sec Loss 6.8636 LearningRate 0.0369 Epoch: 15 Global Step: 89360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:25:14,922-Speed 11156.06 samples/sec Loss 6.8259 LearningRate 0.0369 Epoch: 15 Global Step: 89370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:25:15,872-Speed 10786.75 samples/sec Loss 6.9048 LearningRate 0.0368 Epoch: 15 Global Step: 89380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:25:16,813-Speed 10896.38 samples/sec Loss 6.9145 LearningRate 0.0368 Epoch: 15 Global Step: 89390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:25:17,728-Speed 11189.09 samples/sec Loss 6.7626 LearningRate 0.0368 Epoch: 15 Global Step: 89400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:25:18,629-Speed 11375.40 samples/sec Loss 7.0047 LearningRate 0.0368 Epoch: 15 Global Step: 89410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:25:19,551-Speed 11119.10 samples/sec Loss 6.8590 LearningRate 0.0368 Epoch: 15 Global Step: 89420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:25:20,425-Speed 11714.69 samples/sec Loss 6.9641 LearningRate 0.0368 Epoch: 15 Global Step: 89430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:25:21,319-Speed 11465.12 samples/sec Loss 
7.1151 LearningRate 0.0368 Epoch: 15 Global Step: 89440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:25:22,216-Speed 11421.36 samples/sec Loss 6.8644 LearningRate 0.0368 Epoch: 15 Global Step: 89450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:25:23,132-Speed 11192.48 samples/sec Loss 6.9158 LearningRate 0.0368 Epoch: 15 Global Step: 89460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:25:24,058-Speed 11081.65 samples/sec Loss 6.9547 LearningRate 0.0368 Epoch: 15 Global Step: 89470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:25:24,953-Speed 11448.54 samples/sec Loss 6.9105 LearningRate 0.0368 Epoch: 15 Global Step: 89480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:25:25,861-Speed 11282.88 samples/sec Loss 6.8889 LearningRate 0.0368 Epoch: 15 Global Step: 89490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:25:26,771-Speed 11258.78 samples/sec Loss 7.0124 LearningRate 0.0368 Epoch: 15 Global Step: 89500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:25:27,709-Speed 10931.74 samples/sec Loss 6.9591 LearningRate 0.0368 Epoch: 15 Global Step: 89510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:25:28,656-Speed 10812.22 samples/sec Loss 7.0915 LearningRate 0.0368 Epoch: 15 Global Step: 89520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:25:29,602-Speed 10835.94 samples/sec Loss 6.8778 LearningRate 0.0368 Epoch: 15 Global Step: 89530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:25:30,500-Speed 11413.21 samples/sec Loss 6.9744 LearningRate 0.0368 Epoch: 15 Global Step: 89540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:25:31,443-Speed 10865.70 samples/sec Loss 7.0235 LearningRate 0.0368 Epoch: 15 Global Step: 89550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:25:32,650-Speed 8491.34 samples/sec Loss 6.9308 LearningRate 0.0368 Epoch: 15 
Global Step: 89560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:25:33,925-Speed 8034.36 samples/sec Loss 6.8520 LearningRate 0.0367 Epoch: 15 Global Step: 89570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:25:34,822-Speed 11426.78 samples/sec Loss 7.0527 LearningRate 0.0367 Epoch: 15 Global Step: 89580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:25:36,125-Speed 7864.80 samples/sec Loss 6.9795 LearningRate 0.0367 Epoch: 15 Global Step: 89590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:25:37,694-Speed 6528.25 samples/sec Loss 6.8776 LearningRate 0.0367 Epoch: 15 Global Step: 89600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:25:38,629-Speed 10960.14 samples/sec Loss 6.9600 LearningRate 0.0367 Epoch: 15 Global Step: 89610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:25:39,949-Speed 7763.71 samples/sec Loss 6.8695 LearningRate 0.0367 Epoch: 15 Global Step: 89620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:25:41,181-Speed 8315.97 samples/sec Loss 6.8606 LearningRate 0.0367 Epoch: 15 Global Step: 89630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:25:42,080-Speed 11391.11 samples/sec Loss 6.7818 LearningRate 0.0367 Epoch: 15 Global Step: 89640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:25:43,004-Speed 11093.37 samples/sec Loss 7.0313 LearningRate 0.0367 Epoch: 15 Global Step: 89650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:25:43,942-Speed 10930.17 samples/sec Loss 6.9190 LearningRate 0.0367 Epoch: 15 Global Step: 89660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:25:44,831-Speed 11522.54 samples/sec Loss 6.9902 LearningRate 0.0367 Epoch: 15 Global Step: 89670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:25:45,721-Speed 11515.83 samples/sec Loss 7.0055 LearningRate 0.0367 Epoch: 15 Global Step: 89680 Fp16 Grad Scale: 131072 
Required: 5 hours Training: 2022-04-11 02:25:46,620-Speed 11396.10 samples/sec Loss 6.8706 LearningRate 0.0367 Epoch: 15 Global Step: 89690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:25:47,544-Speed 11082.14 samples/sec Loss 6.8607 LearningRate 0.0367 Epoch: 15 Global Step: 89700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:25:48,424-Speed 11652.80 samples/sec Loss 6.9339 LearningRate 0.0367 Epoch: 15 Global Step: 89710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:25:49,356-Speed 10990.36 samples/sec Loss 6.8923 LearningRate 0.0367 Epoch: 15 Global Step: 89720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:25:50,309-Speed 10752.21 samples/sec Loss 6.9972 LearningRate 0.0367 Epoch: 15 Global Step: 89730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:25:51,243-Speed 10976.05 samples/sec Loss 7.0200 LearningRate 0.0367 Epoch: 15 Global Step: 89740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:25:52,141-Speed 11409.13 samples/sec Loss 6.7800 LearningRate 0.0366 Epoch: 15 Global Step: 89750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:25:53,036-Speed 11444.72 samples/sec Loss 7.0886 LearningRate 0.0366 Epoch: 15 Global Step: 89760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:25:53,967-Speed 11009.58 samples/sec Loss 6.9460 LearningRate 0.0366 Epoch: 15 Global Step: 89770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:25:54,890-Speed 11101.08 samples/sec Loss 6.8110 LearningRate 0.0366 Epoch: 15 Global Step: 89780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:25:55,808-Speed 11156.56 samples/sec Loss 6.9469 LearningRate 0.0366 Epoch: 15 Global Step: 89790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:25:56,706-Speed 11414.55 samples/sec Loss 6.9654 LearningRate 0.0366 Epoch: 15 Global Step: 89800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 
2022-04-11 02:25:57,594-Speed 11540.30 samples/sec Loss 6.8389 LearningRate 0.0366 Epoch: 15 Global Step: 89810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:25:58,493-Speed 11398.10 samples/sec Loss 6.8110 LearningRate 0.0366 Epoch: 15 Global Step: 89820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:25:59,441-Speed 10816.54 samples/sec Loss 6.7969 LearningRate 0.0366 Epoch: 15 Global Step: 89830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:26:00,338-Speed 11418.82 samples/sec Loss 6.9786 LearningRate 0.0366 Epoch: 15 Global Step: 89840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:26:01,300-Speed 10655.82 samples/sec Loss 6.9922 LearningRate 0.0366 Epoch: 15 Global Step: 89850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:26:02,290-Speed 10350.24 samples/sec Loss 6.8899 LearningRate 0.0366 Epoch: 15 Global Step: 89860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:26:03,195-Speed 11319.32 samples/sec Loss 6.8530 LearningRate 0.0366 Epoch: 15 Global Step: 89870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:26:04,131-Speed 10951.46 samples/sec Loss 6.9044 LearningRate 0.0366 Epoch: 15 Global Step: 89880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:26:05,083-Speed 10764.06 samples/sec Loss 6.8405 LearningRate 0.0366 Epoch: 15 Global Step: 89890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:26:05,995-Speed 11232.28 samples/sec Loss 6.9293 LearningRate 0.0366 Epoch: 15 Global Step: 89900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:26:06,907-Speed 11234.38 samples/sec Loss 6.8560 LearningRate 0.0366 Epoch: 15 Global Step: 89910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:26:07,811-Speed 11333.09 samples/sec Loss 6.8888 LearningRate 0.0366 Epoch: 15 Global Step: 89920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:26:08,708-Speed 11418.55 
samples/sec Loss 6.8250 LearningRate 0.0366 Epoch: 15 Global Step: 89930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:26:09,596-Speed 11548.88 samples/sec Loss 6.8632 LearningRate 0.0365 Epoch: 15 Global Step: 89940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:26:10,473-Speed 11683.95 samples/sec Loss 6.9741 LearningRate 0.0365 Epoch: 15 Global Step: 89950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:26:11,350-Speed 11682.51 samples/sec Loss 7.0622 LearningRate 0.0365 Epoch: 15 Global Step: 89960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:26:12,276-Speed 11063.57 samples/sec Loss 6.9981 LearningRate 0.0365 Epoch: 15 Global Step: 89970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:26:13,201-Speed 11078.92 samples/sec Loss 6.7324 LearningRate 0.0365 Epoch: 15 Global Step: 89980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:26:14,074-Speed 11730.85 samples/sec Loss 6.9918 LearningRate 0.0365 Epoch: 15 Global Step: 89990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:26:14,951-Speed 11695.97 samples/sec Loss 6.8743 LearningRate 0.0365 Epoch: 15 Global Step: 90000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:26:36,228-[lfw][90000]XNorm: 11.501626 Training: 2022-04-11 02:26:36,229-[lfw][90000]Accuracy-Flip: 0.99467+-0.00371 Training: 2022-04-11 02:26:36,229-[lfw][90000]Accuracy-Highest: 0.99617 Training: 2022-04-11 02:27:00,881-[cfp_fp][90000]XNorm: 9.646237 Training: 2022-04-11 02:27:00,881-[cfp_fp][90000]Accuracy-Flip: 0.94386+-0.01010 Training: 2022-04-11 02:27:00,882-[cfp_fp][90000]Accuracy-Highest: 0.94386 Training: 2022-04-11 02:27:22,063-[agedb_30][90000]XNorm: 11.295019 Training: 2022-04-11 02:27:22,063-[agedb_30][90000]Accuracy-Flip: 0.96483+-0.01015 Training: 2022-04-11 02:27:22,064-[agedb_30][90000]Accuracy-Highest: 0.96483 Training: 2022-04-11 02:27:22,968-Speed 150.55 samples/sec Loss 7.0566 
LearningRate 0.0365 Epoch: 15 Global Step: 90010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:27:23,867-Speed 11404.34 samples/sec Loss 7.0390 LearningRate 0.0365 Epoch: 15 Global Step: 90020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:27:24,767-Speed 11378.35 samples/sec Loss 6.9739 LearningRate 0.0365 Epoch: 15 Global Step: 90030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:27:25,656-Speed 11526.58 samples/sec Loss 6.7676 LearningRate 0.0365 Epoch: 15 Global Step: 90040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:27:26,545-Speed 11527.69 samples/sec Loss 6.8254 LearningRate 0.0365 Epoch: 15 Global Step: 90050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:27:27,455-Speed 11263.82 samples/sec Loss 6.9008 LearningRate 0.0365 Epoch: 15 Global Step: 90060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:27:28,407-Speed 10764.41 samples/sec Loss 7.1234 LearningRate 0.0365 Epoch: 15 Global Step: 90070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:27:29,324-Speed 11177.37 samples/sec Loss 6.8990 LearningRate 0.0365 Epoch: 15 Global Step: 90080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:27:30,258-Speed 10971.71 samples/sec Loss 7.0205 LearningRate 0.0365 Epoch: 15 Global Step: 90090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:27:31,173-Speed 11192.09 samples/sec Loss 7.0293 LearningRate 0.0365 Epoch: 15 Global Step: 90100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:27:32,088-Speed 11201.46 samples/sec Loss 6.9894 LearningRate 0.0365 Epoch: 15 Global Step: 90110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:27:32,964-Speed 11700.08 samples/sec Loss 6.9217 LearningRate 0.0365 Epoch: 15 Global Step: 90120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:27:33,854-Speed 11514.54 samples/sec Loss 6.9767 LearningRate 0.0364 Epoch: 15 Global 
Step: 90130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:27:34,732-Speed 11677.02 samples/sec Loss 6.8421 LearningRate 0.0364 Epoch: 15 Global Step: 90140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:27:35,673-Speed 10889.60 samples/sec Loss 7.0739 LearningRate 0.0364 Epoch: 15 Global Step: 90150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:27:36,558-Speed 11571.32 samples/sec Loss 6.8773 LearningRate 0.0364 Epoch: 15 Global Step: 90160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:27:37,490-Speed 10994.51 samples/sec Loss 7.0309 LearningRate 0.0364 Epoch: 15 Global Step: 90170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:27:38,388-Speed 11414.87 samples/sec Loss 6.7868 LearningRate 0.0364 Epoch: 15 Global Step: 90180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:27:39,266-Speed 11663.12 samples/sec Loss 6.9097 LearningRate 0.0364 Epoch: 15 Global Step: 90190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:27:40,167-Speed 11372.36 samples/sec Loss 6.6996 LearningRate 0.0364 Epoch: 15 Global Step: 90200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:27:41,093-Speed 11064.44 samples/sec Loss 6.8534 LearningRate 0.0364 Epoch: 15 Global Step: 90210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:27:42,018-Speed 11086.76 samples/sec Loss 6.9952 LearningRate 0.0364 Epoch: 15 Global Step: 90220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:27:42,922-Speed 11334.09 samples/sec Loss 6.8920 LearningRate 0.0364 Epoch: 15 Global Step: 90230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:27:43,804-Speed 11609.82 samples/sec Loss 7.0200 LearningRate 0.0364 Epoch: 15 Global Step: 90240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:27:44,686-Speed 11629.17 samples/sec Loss 6.8091 LearningRate 0.0364 Epoch: 15 Global Step: 90250 Fp16 Grad Scale: 65536 
Required: 5 hours Training: 2022-04-11 02:27:45,563-Speed 11680.44 samples/sec Loss 6.8012 LearningRate 0.0364 Epoch: 15 Global Step: 90260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:27:46,435-Speed 11746.93 samples/sec Loss 6.9814 LearningRate 0.0364 Epoch: 15 Global Step: 90270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:27:47,321-Speed 11567.75 samples/sec Loss 6.9864 LearningRate 0.0364 Epoch: 15 Global Step: 90280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:27:48,223-Speed 11363.71 samples/sec Loss 6.8350 LearningRate 0.0364 Epoch: 15 Global Step: 90290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:27:49,141-Speed 11159.94 samples/sec Loss 6.9491 LearningRate 0.0364 Epoch: 15 Global Step: 90300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:27:50,041-Speed 11391.45 samples/sec Loss 6.9946 LearningRate 0.0364 Epoch: 15 Global Step: 90310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:27:50,978-Speed 10937.30 samples/sec Loss 6.8121 LearningRate 0.0363 Epoch: 15 Global Step: 90320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:27:51,905-Speed 11044.16 samples/sec Loss 6.9088 LearningRate 0.0363 Epoch: 15 Global Step: 90330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:27:52,806-Speed 11376.96 samples/sec Loss 6.9595 LearningRate 0.0363 Epoch: 15 Global Step: 90340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:27:53,698-Speed 11491.45 samples/sec Loss 6.9911 LearningRate 0.0363 Epoch: 15 Global Step: 90350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:27:54,573-Speed 11711.48 samples/sec Loss 6.9331 LearningRate 0.0363 Epoch: 15 Global Step: 90360 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 02:27:55,447-Speed 11727.00 samples/sec Loss 6.9065 LearningRate 0.0363 Epoch: 15 Global Step: 90370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 
2022-04-11 02:27:56,339-Speed 11489.02 samples/sec Loss 6.9611 LearningRate 0.0363 Epoch: 15 Global Step: 90380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:27:57,237-Speed 11409.17 samples/sec Loss 6.8068 LearningRate 0.0363 Epoch: 15 Global Step: 90390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:27:58,113-Speed 11696.60 samples/sec Loss 6.8675 LearningRate 0.0363 Epoch: 15 Global Step: 90400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:27:58,992-Speed 11650.05 samples/sec Loss 6.9430 LearningRate 0.0363 Epoch: 15 Global Step: 90410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:27:59,890-Speed 11417.41 samples/sec Loss 7.0064 LearningRate 0.0363 Epoch: 15 Global Step: 90420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:28:00,849-Speed 10686.17 samples/sec Loss 6.9496 LearningRate 0.0363 Epoch: 15 Global Step: 90430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:28:01,728-Speed 11655.11 samples/sec Loss 6.8188 LearningRate 0.0363 Epoch: 15 Global Step: 90440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:28:02,645-Speed 11170.80 samples/sec Loss 6.9643 LearningRate 0.0363 Epoch: 15 Global Step: 90450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:28:03,532-Speed 11564.37 samples/sec Loss 6.9368 LearningRate 0.0363 Epoch: 15 Global Step: 90460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:28:04,427-Speed 11449.36 samples/sec Loss 6.9949 LearningRate 0.0363 Epoch: 15 Global Step: 90470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:28:05,330-Speed 11340.79 samples/sec Loss 6.9325 LearningRate 0.0363 Epoch: 15 Global Step: 90480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:28:06,204-Speed 11729.73 samples/sec Loss 6.9157 LearningRate 0.0363 Epoch: 15 Global Step: 90490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:28:07,096-Speed 11492.40 
samples/sec Loss 6.9260 LearningRate 0.0363 Epoch: 15 Global Step: 90500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:28:07,978-Speed 11605.82 samples/sec Loss 6.9652 LearningRate 0.0362 Epoch: 15 Global Step: 90510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:28:08,862-Speed 11598.13 samples/sec Loss 6.9493 LearningRate 0.0362 Epoch: 15 Global Step: 90520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:28:09,744-Speed 11617.46 samples/sec Loss 6.8660 LearningRate 0.0362 Epoch: 15 Global Step: 90530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:28:10,636-Speed 11483.88 samples/sec Loss 6.9148 LearningRate 0.0362 Epoch: 15 Global Step: 90540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:28:11,546-Speed 11266.22 samples/sec Loss 6.9167 LearningRate 0.0362 Epoch: 15 Global Step: 90550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:28:12,469-Speed 11103.10 samples/sec Loss 7.0179 LearningRate 0.0362 Epoch: 15 Global Step: 90560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:28:13,380-Speed 11252.32 samples/sec Loss 6.8702 LearningRate 0.0362 Epoch: 15 Global Step: 90570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:28:14,317-Speed 10939.02 samples/sec Loss 6.9661 LearningRate 0.0362 Epoch: 15 Global Step: 90580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:28:15,276-Speed 10679.27 samples/sec Loss 6.9548 LearningRate 0.0362 Epoch: 15 Global Step: 90590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:28:16,172-Speed 11430.27 samples/sec Loss 6.9850 LearningRate 0.0362 Epoch: 15 Global Step: 90600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:28:17,107-Speed 10969.89 samples/sec Loss 7.0085 LearningRate 0.0362 Epoch: 15 Global Step: 90610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:28:18,068-Speed 10665.00 samples/sec Loss 6.9089 
LearningRate 0.0362 Epoch: 15 Global Step: 90620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:28:19,029-Speed 10661.91 samples/sec Loss 6.8027 LearningRate 0.0362 Epoch: 15 Global Step: 90630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:28:19,971-Speed 10874.04 samples/sec Loss 6.9705 LearningRate 0.0362 Epoch: 15 Global Step: 90640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:28:20,861-Speed 11519.39 samples/sec Loss 6.9492 LearningRate 0.0362 Epoch: 15 Global Step: 90650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:28:21,733-Speed 11760.53 samples/sec Loss 6.9366 LearningRate 0.0362 Epoch: 15 Global Step: 90660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:28:22,614-Speed 11635.10 samples/sec Loss 6.8779 LearningRate 0.0362 Epoch: 15 Global Step: 90670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:28:23,511-Speed 11426.11 samples/sec Loss 6.9324 LearningRate 0.0362 Epoch: 15 Global Step: 90680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:28:24,416-Speed 11316.59 samples/sec Loss 6.9046 LearningRate 0.0362 Epoch: 15 Global Step: 90690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:28:25,290-Speed 11731.63 samples/sec Loss 6.9239 LearningRate 0.0361 Epoch: 15 Global Step: 90700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:28:26,170-Speed 11645.23 samples/sec Loss 6.8938 LearningRate 0.0361 Epoch: 15 Global Step: 90710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:28:27,101-Speed 10998.05 samples/sec Loss 6.9515 LearningRate 0.0361 Epoch: 15 Global Step: 90720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:28:28,057-Speed 10723.75 samples/sec Loss 6.9989 LearningRate 0.0361 Epoch: 15 Global Step: 90730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:28:28,932-Speed 11709.07 samples/sec Loss 6.8212 LearningRate 0.0361 Epoch: 15 
Global Step: 90740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:28:29,834-Speed 11366.38 samples/sec Loss 7.0543 LearningRate 0.0361 Epoch: 15 Global Step: 90750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:28:30,785-Speed 10772.37 samples/sec Loss 6.8840 LearningRate 0.0361 Epoch: 15 Global Step: 90760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:28:31,691-Speed 11313.86 samples/sec Loss 7.0069 LearningRate 0.0361 Epoch: 15 Global Step: 90770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:28:32,579-Speed 11529.53 samples/sec Loss 6.9107 LearningRate 0.0361 Epoch: 15 Global Step: 90780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:28:33,463-Speed 11602.24 samples/sec Loss 6.7675 LearningRate 0.0361 Epoch: 15 Global Step: 90790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:28:34,338-Speed 11709.44 samples/sec Loss 6.9738 LearningRate 0.0361 Epoch: 15 Global Step: 90800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:28:35,249-Speed 11238.37 samples/sec Loss 6.8685 LearningRate 0.0361 Epoch: 15 Global Step: 90810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:28:36,149-Speed 11385.71 samples/sec Loss 6.9108 LearningRate 0.0361 Epoch: 15 Global Step: 90820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:28:37,081-Speed 11003.33 samples/sec Loss 6.9541 LearningRate 0.0361 Epoch: 15 Global Step: 90830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:28:38,011-Speed 11010.23 samples/sec Loss 6.8020 LearningRate 0.0361 Epoch: 15 Global Step: 90840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:28:38,920-Speed 11280.26 samples/sec Loss 7.0038 LearningRate 0.0361 Epoch: 15 Global Step: 90850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:28:39,833-Speed 11213.64 samples/sec Loss 6.9782 LearningRate 0.0361 Epoch: 15 Global Step: 90860 Fp16 Grad Scale: 
131072 Required: 5 hours Training: 2022-04-11 02:28:40,712-Speed 11667.93 samples/sec Loss 7.0089 LearningRate 0.0361 Epoch: 15 Global Step: 90870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:28:41,591-Speed 11656.19 samples/sec Loss 6.8176 LearningRate 0.0361 Epoch: 15 Global Step: 90880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:28:42,528-Speed 10929.08 samples/sec Loss 6.9802 LearningRate 0.0360 Epoch: 15 Global Step: 90890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:28:43,422-Speed 11462.76 samples/sec Loss 7.1094 LearningRate 0.0360 Epoch: 15 Global Step: 90900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:28:44,351-Speed 11030.78 samples/sec Loss 6.7471 LearningRate 0.0360 Epoch: 15 Global Step: 90910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:28:45,236-Speed 11589.57 samples/sec Loss 6.9245 LearningRate 0.0360 Epoch: 15 Global Step: 90920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:28:46,116-Speed 11634.35 samples/sec Loss 7.0680 LearningRate 0.0360 Epoch: 15 Global Step: 90930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:28:47,009-Speed 11482.00 samples/sec Loss 7.0242 LearningRate 0.0360 Epoch: 15 Global Step: 90940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:28:47,885-Speed 11702.50 samples/sec Loss 6.9193 LearningRate 0.0360 Epoch: 15 Global Step: 90950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:28:48,793-Speed 11282.70 samples/sec Loss 6.8859 LearningRate 0.0360 Epoch: 15 Global Step: 90960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:28:49,792-Speed 10258.35 samples/sec Loss 6.8709 LearningRate 0.0360 Epoch: 15 Global Step: 90970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:28:59,984-Speed 1004.82 samples/sec Loss 6.6948 LearningRate 0.0360 Epoch: 16 Global Step: 90980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 
2022-04-11 02:29:00,936-Speed 10761.71 samples/sec Loss 6.2757 LearningRate 0.0360 Epoch: 16 Global Step: 90990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:29:01,913-Speed 10488.18 samples/sec Loss 6.1586 LearningRate 0.0360 Epoch: 16 Global Step: 91000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:29:02,870-Speed 10711.29 samples/sec Loss 6.2608 LearningRate 0.0360 Epoch: 16 Global Step: 91010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:29:03,911-Speed 9845.49 samples/sec Loss 6.2982 LearningRate 0.0360 Epoch: 16 Global Step: 91020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:29:04,848-Speed 10930.59 samples/sec Loss 6.1239 LearningRate 0.0360 Epoch: 16 Global Step: 91030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:29:05,788-Speed 10911.97 samples/sec Loss 6.2750 LearningRate 0.0360 Epoch: 16 Global Step: 91040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:29:06,751-Speed 10634.00 samples/sec Loss 6.1587 LearningRate 0.0360 Epoch: 16 Global Step: 91050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:29:07,685-Speed 10976.17 samples/sec Loss 6.2443 LearningRate 0.0360 Epoch: 16 Global Step: 91060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:29:08,603-Speed 11164.98 samples/sec Loss 6.2233 LearningRate 0.0360 Epoch: 16 Global Step: 91070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:29:09,570-Speed 10593.26 samples/sec Loss 6.2186 LearningRate 0.0359 Epoch: 16 Global Step: 91080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:29:10,468-Speed 11419.53 samples/sec Loss 6.1014 LearningRate 0.0359 Epoch: 16 Global Step: 91090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:29:11,343-Speed 11708.68 samples/sec Loss 6.2335 LearningRate 0.0359 Epoch: 16 Global Step: 91100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:29:12,347-Speed 10209.05 
samples/sec Loss 6.1968 LearningRate 0.0359 Epoch: 16 Global Step: 91110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:29:13,286-Speed 10911.13 samples/sec Loss 6.2527 LearningRate 0.0359 Epoch: 16 Global Step: 91120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:29:14,193-Speed 11299.93 samples/sec Loss 6.3386 LearningRate 0.0359 Epoch: 16 Global Step: 91130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:29:15,107-Speed 11203.23 samples/sec Loss 6.2104 LearningRate 0.0359 Epoch: 16 Global Step: 91140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:29:16,023-Speed 11189.37 samples/sec Loss 6.2471 LearningRate 0.0359 Epoch: 16 Global Step: 91150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:29:16,958-Speed 10958.82 samples/sec Loss 6.2230 LearningRate 0.0359 Epoch: 16 Global Step: 91160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:29:17,864-Speed 11311.57 samples/sec Loss 6.2355 LearningRate 0.0359 Epoch: 16 Global Step: 91170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:29:18,784-Speed 11132.15 samples/sec Loss 6.2041 LearningRate 0.0359 Epoch: 16 Global Step: 91180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:29:19,743-Speed 10696.03 samples/sec Loss 6.3525 LearningRate 0.0359 Epoch: 16 Global Step: 91190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:29:20,673-Speed 11016.86 samples/sec Loss 6.3208 LearningRate 0.0359 Epoch: 16 Global Step: 91200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:29:21,546-Speed 11739.20 samples/sec Loss 6.4079 LearningRate 0.0359 Epoch: 16 Global Step: 91210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:29:22,426-Speed 11644.38 samples/sec Loss 6.1804 LearningRate 0.0359 Epoch: 16 Global Step: 91220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:29:23,340-Speed 11214.59 samples/sec Loss 6.2444 
LearningRate 0.0359 Epoch: 16 Global Step: 91230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:29:24,226-Speed 11565.81 samples/sec Loss 6.5517 LearningRate 0.0359 Epoch: 16 Global Step: 91240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:29:25,115-Speed 11523.25 samples/sec Loss 6.4474 LearningRate 0.0359 Epoch: 16 Global Step: 91250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:29:26,046-Speed 11013.06 samples/sec Loss 6.3440 LearningRate 0.0359 Epoch: 16 Global Step: 91260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:29:26,988-Speed 10877.15 samples/sec Loss 6.2428 LearningRate 0.0358 Epoch: 16 Global Step: 91270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:29:27,906-Speed 11165.72 samples/sec Loss 6.2778 LearningRate 0.0358 Epoch: 16 Global Step: 91280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:29:28,793-Speed 11560.53 samples/sec Loss 6.3175 LearningRate 0.0358 Epoch: 16 Global Step: 91290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:29:29,702-Speed 11273.07 samples/sec Loss 6.3588 LearningRate 0.0358 Epoch: 16 Global Step: 91300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:29:30,639-Speed 10940.84 samples/sec Loss 6.4207 LearningRate 0.0358 Epoch: 16 Global Step: 91310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:29:31,571-Speed 10986.19 samples/sec Loss 6.5113 LearningRate 0.0358 Epoch: 16 Global Step: 91320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:29:32,470-Speed 11406.55 samples/sec Loss 6.3820 LearningRate 0.0358 Epoch: 16 Global Step: 91330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:29:33,374-Speed 11323.00 samples/sec Loss 6.4254 LearningRate 0.0358 Epoch: 16 Global Step: 91340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:29:34,251-Speed 11690.30 samples/sec Loss 6.3983 LearningRate 0.0358 Epoch: 16 Global 
Step: 91350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:29:35,136-Speed 11573.60 samples/sec Loss 6.3308 LearningRate 0.0358 Epoch: 16 Global Step: 91360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:29:36,069-Speed 10988.27 samples/sec Loss 6.3717 LearningRate 0.0358 Epoch: 16 Global Step: 91370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:29:36,978-Speed 11279.59 samples/sec Loss 6.3834 LearningRate 0.0358 Epoch: 16 Global Step: 91380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:29:37,876-Speed 11415.11 samples/sec Loss 6.4008 LearningRate 0.0358 Epoch: 16 Global Step: 91390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:29:38,796-Speed 11129.23 samples/sec Loss 6.3820 LearningRate 0.0358 Epoch: 16 Global Step: 91400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:29:39,727-Speed 11011.27 samples/sec Loss 6.4834 LearningRate 0.0358 Epoch: 16 Global Step: 91410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:29:40,651-Speed 11082.59 samples/sec Loss 6.5033 LearningRate 0.0358 Epoch: 16 Global Step: 91420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:29:41,621-Speed 10571.40 samples/sec Loss 6.5364 LearningRate 0.0358 Epoch: 16 Global Step: 91430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:29:42,533-Speed 11235.93 samples/sec Loss 6.4522 LearningRate 0.0358 Epoch: 16 Global Step: 91440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:29:43,455-Speed 11113.98 samples/sec Loss 6.3224 LearningRate 0.0358 Epoch: 16 Global Step: 91450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:29:44,375-Speed 11136.47 samples/sec Loss 6.3491 LearningRate 0.0357 Epoch: 16 Global Step: 91460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:29:45,265-Speed 11515.74 samples/sec Loss 6.4925 LearningRate 0.0357 Epoch: 16 Global Step: 91470 Fp16 Grad Scale: 65536 
Required: 5 hours Training: 2022-04-11 02:29:46,188-Speed 11093.43 samples/sec Loss 6.4527 LearningRate 0.0357 Epoch: 16 Global Step: 91480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:29:47,070-Speed 11627.46 samples/sec Loss 6.4099 LearningRate 0.0357 Epoch: 16 Global Step: 91490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:29:47,990-Speed 11131.18 samples/sec Loss 6.5824 LearningRate 0.0357 Epoch: 16 Global Step: 91500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:29:48,910-Speed 11134.96 samples/sec Loss 6.4051 LearningRate 0.0357 Epoch: 16 Global Step: 91510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:29:49,837-Speed 11057.74 samples/sec Loss 6.4951 LearningRate 0.0357 Epoch: 16 Global Step: 91520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:29:50,798-Speed 10665.67 samples/sec Loss 6.4288 LearningRate 0.0357 Epoch: 16 Global Step: 91530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:29:51,713-Speed 11199.12 samples/sec Loss 6.4673 LearningRate 0.0357 Epoch: 16 Global Step: 91540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:29:52,609-Speed 11429.91 samples/sec Loss 6.4622 LearningRate 0.0357 Epoch: 16 Global Step: 91550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:29:53,569-Speed 10673.25 samples/sec Loss 6.4842 LearningRate 0.0357 Epoch: 16 Global Step: 91560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:29:54,497-Speed 11044.54 samples/sec Loss 6.4182 LearningRate 0.0357 Epoch: 16 Global Step: 91570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:29:55,400-Speed 11351.67 samples/sec Loss 6.5300 LearningRate 0.0357 Epoch: 16 Global Step: 91580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:29:56,302-Speed 11363.07 samples/sec Loss 6.4949 LearningRate 0.0357 Epoch: 16 Global Step: 91590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 
02:29:57,192-Speed 11511.04 samples/sec Loss 6.4936 LearningRate 0.0357 Epoch: 16 Global Step: 91600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:29:58,076-Speed 11595.57 samples/sec Loss 6.5367 LearningRate 0.0357 Epoch: 16 Global Step: 91610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:29:58,953-Speed 11680.34 samples/sec Loss 6.4196 LearningRate 0.0357 Epoch: 16 Global Step: 91620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:29:59,878-Speed 11079.02 samples/sec Loss 6.4003 LearningRate 0.0357 Epoch: 16 Global Step: 91630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:30:00,809-Speed 11011.07 samples/sec Loss 6.5867 LearningRate 0.0357 Epoch: 16 Global Step: 91640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:30:01,710-Speed 11370.95 samples/sec Loss 6.6367 LearningRate 0.0356 Epoch: 16 Global Step: 91650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:30:02,658-Speed 10799.85 samples/sec Loss 6.4898 LearningRate 0.0356 Epoch: 16 Global Step: 91660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:30:03,560-Speed 11370.74 samples/sec Loss 6.4949 LearningRate 0.0356 Epoch: 16 Global Step: 91670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:30:04,489-Speed 11028.82 samples/sec Loss 6.3831 LearningRate 0.0356 Epoch: 16 Global Step: 91680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:30:05,377-Speed 11539.74 samples/sec Loss 6.7210 LearningRate 0.0356 Epoch: 16 Global Step: 91690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:30:06,255-Speed 11672.52 samples/sec Loss 6.6014 LearningRate 0.0356 Epoch: 16 Global Step: 91700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:30:07,174-Speed 11148.35 samples/sec Loss 6.4955 LearningRate 0.0356 Epoch: 16 Global Step: 91710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:30:08,089-Speed 11193.04 samples/sec 
Loss 6.4128 LearningRate 0.0356 Epoch: 16 Global Step: 91720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:30:08,968-Speed 11663.11 samples/sec Loss 6.5088 LearningRate 0.0356 Epoch: 16 Global Step: 91730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:30:09,856-Speed 11537.83 samples/sec Loss 6.5946 LearningRate 0.0356 Epoch: 16 Global Step: 91740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:30:10,729-Speed 11734.98 samples/sec Loss 6.6198 LearningRate 0.0356 Epoch: 16 Global Step: 91750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:30:11,606-Speed 11683.03 samples/sec Loss 6.6548 LearningRate 0.0356 Epoch: 16 Global Step: 91760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:30:12,505-Speed 11399.77 samples/sec Loss 6.4888 LearningRate 0.0356 Epoch: 16 Global Step: 91770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:30:13,434-Speed 11025.74 samples/sec Loss 6.4538 LearningRate 0.0356 Epoch: 16 Global Step: 91780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:30:14,346-Speed 11243.29 samples/sec Loss 6.4729 LearningRate 0.0356 Epoch: 16 Global Step: 91790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:30:15,291-Speed 10839.92 samples/sec Loss 6.5369 LearningRate 0.0356 Epoch: 16 Global Step: 91800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:30:16,226-Speed 10955.97 samples/sec Loss 6.7190 LearningRate 0.0356 Epoch: 16 Global Step: 91810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:30:17,180-Speed 10748.30 samples/sec Loss 6.5627 LearningRate 0.0356 Epoch: 16 Global Step: 91820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:30:18,099-Speed 11149.85 samples/sec Loss 6.5312 LearningRate 0.0356 Epoch: 16 Global Step: 91830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:30:19,019-Speed 11142.04 samples/sec Loss 6.5239 LearningRate 0.0355 
Epoch: 16 Global Step: 91840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:30:19,979-Speed 10673.55 samples/sec Loss 6.4210 LearningRate 0.0355 Epoch: 16 Global Step: 91850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:30:20,870-Speed 11498.90 samples/sec Loss 6.5970 LearningRate 0.0355 Epoch: 16 Global Step: 91860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:30:21,733-Speed 11867.81 samples/sec Loss 6.6707 LearningRate 0.0355 Epoch: 16 Global Step: 91870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:30:22,623-Speed 11520.66 samples/sec Loss 6.4638 LearningRate 0.0355 Epoch: 16 Global Step: 91880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:30:23,500-Speed 11683.90 samples/sec Loss 6.6301 LearningRate 0.0355 Epoch: 16 Global Step: 91890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:30:24,417-Speed 11172.04 samples/sec Loss 6.5601 LearningRate 0.0355 Epoch: 16 Global Step: 91900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:30:25,316-Speed 11398.15 samples/sec Loss 6.6126 LearningRate 0.0355 Epoch: 16 Global Step: 91910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:30:26,240-Speed 11097.35 samples/sec Loss 6.6443 LearningRate 0.0355 Epoch: 16 Global Step: 91920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:30:27,157-Speed 11162.64 samples/sec Loss 6.5183 LearningRate 0.0355 Epoch: 16 Global Step: 91930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:30:28,092-Speed 10960.08 samples/sec Loss 6.5856 LearningRate 0.0355 Epoch: 16 Global Step: 91940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:30:29,041-Speed 10798.69 samples/sec Loss 6.5596 LearningRate 0.0355 Epoch: 16 Global Step: 91950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:30:29,984-Speed 10873.61 samples/sec Loss 6.6907 LearningRate 0.0355 Epoch: 16 Global Step: 91960 Fp16 Grad 
Scale: 65536 Required: 5 hours
Training: 2022-04-11 02:30:30,878-Speed 11457.29 samples/sec Loss 6.5849 LearningRate 0.0355 Epoch: 16 Global Step: 91970 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-11 02:30:31,822-Speed 10850.31 samples/sec Loss 6.5536 LearningRate 0.0355 Epoch: 16 Global Step: 91980 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-11 02:30:32,725-Speed 11349.24 samples/sec Loss 6.7093 LearningRate 0.0355 Epoch: 16 Global Step: 91990 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-11 02:30:33,615-Speed 11515.71 samples/sec Loss 6.7177 LearningRate 0.0355 Epoch: 16 Global Step: 92000 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-11 02:30:54,954-[lfw][92000]XNorm: 11.501438
Training: 2022-04-11 02:30:54,955-[lfw][92000]Accuracy-Flip: 0.99600+-0.00382
Training: 2022-04-11 02:30:54,955-[lfw][92000]Accuracy-Highest: 0.99617
Training: 2022-04-11 02:31:19,513-[cfp_fp][92000]XNorm: 9.622866
Training: 2022-04-11 02:31:19,514-[cfp_fp][92000]Accuracy-Flip: 0.94357+-0.01177
Training: 2022-04-11 02:31:19,514-[cfp_fp][92000]Accuracy-Highest: 0.94386
Training: 2022-04-11 02:31:40,820-[agedb_30][92000]XNorm: 11.230503
Training: 2022-04-11 02:31:40,820-[agedb_30][92000]Accuracy-Flip: 0.96267+-0.00857
Training: 2022-04-11 02:31:40,820-[agedb_30][92000]Accuracy-Highest: 0.96483
Training: 2022-04-11 02:31:41,739-Speed 150.31 samples/sec Loss 6.4999 LearningRate 0.0355 Epoch: 16 Global Step: 92010 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-11 02:31:42,666-Speed 11055.31 samples/sec Loss 6.5205 LearningRate 0.0355 Epoch: 16 Global Step: 92020 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-11 02:31:43,563-Speed 11428.59 samples/sec Loss 6.5239 LearningRate 0.0354 Epoch: 16 Global Step: 92030 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-11 02:31:44,482-Speed 11143.46 samples/sec Loss 6.5704 LearningRate 0.0354 Epoch: 16 Global Step: 92040 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-11 02:31:45,383-Speed 11382.49 samples/sec Loss 6.6009 LearningRate 0.0354 Epoch: 16 Global Step: 92050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:31:46,263-Speed 11641.54 samples/sec Loss 6.5019 LearningRate 0.0354 Epoch: 16 Global Step: 92060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:31:47,141-Speed 11662.23 samples/sec Loss 6.5273 LearningRate 0.0354 Epoch: 16 Global Step: 92070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:31:48,069-Speed 11057.88 samples/sec Loss 6.4695 LearningRate 0.0354 Epoch: 16 Global Step: 92080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:31:48,976-Speed 11291.08 samples/sec Loss 6.5640 LearningRate 0.0354 Epoch: 16 Global Step: 92090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:31:49,899-Speed 11108.19 samples/sec Loss 6.6744 LearningRate 0.0354 Epoch: 16 Global Step: 92100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:31:50,815-Speed 11187.16 samples/sec Loss 6.5933 LearningRate 0.0354 Epoch: 16 Global Step: 92110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:31:51,780-Speed 10617.24 samples/sec Loss 6.6010 LearningRate 0.0354 Epoch: 16 Global Step: 92120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:31:52,721-Speed 10886.25 samples/sec Loss 6.7734 LearningRate 0.0354 Epoch: 16 Global Step: 92130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:31:53,649-Speed 11040.37 samples/sec Loss 6.5609 LearningRate 0.0354 Epoch: 16 Global Step: 92140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:31:54,563-Speed 11210.30 samples/sec Loss 6.6733 LearningRate 0.0354 Epoch: 16 Global Step: 92150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:31:55,452-Speed 11524.84 samples/sec Loss 6.6320 LearningRate 0.0354 Epoch: 16 Global Step: 92160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 
02:31:56,345-Speed 11480.82 samples/sec Loss 6.6813 LearningRate 0.0354 Epoch: 16 Global Step: 92170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:31:57,224-Speed 11650.02 samples/sec Loss 6.4931 LearningRate 0.0354 Epoch: 16 Global Step: 92180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:31:58,122-Speed 11417.46 samples/sec Loss 6.6410 LearningRate 0.0354 Epoch: 16 Global Step: 92190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:31:59,000-Speed 11662.33 samples/sec Loss 6.5458 LearningRate 0.0354 Epoch: 16 Global Step: 92200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:31:59,893-Speed 11475.45 samples/sec Loss 6.6420 LearningRate 0.0354 Epoch: 16 Global Step: 92210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:00,793-Speed 11385.35 samples/sec Loss 6.7573 LearningRate 0.0353 Epoch: 16 Global Step: 92220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:01,753-Speed 10682.74 samples/sec Loss 6.4760 LearningRate 0.0353 Epoch: 16 Global Step: 92230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:02,631-Speed 11666.04 samples/sec Loss 6.8260 LearningRate 0.0353 Epoch: 16 Global Step: 92240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:32:03,570-Speed 10905.72 samples/sec Loss 6.6702 LearningRate 0.0353 Epoch: 16 Global Step: 92250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:32:04,490-Speed 11150.71 samples/sec Loss 6.6294 LearningRate 0.0353 Epoch: 16 Global Step: 92260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:05,420-Speed 11021.65 samples/sec Loss 6.5710 LearningRate 0.0353 Epoch: 16 Global Step: 92270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:06,351-Speed 10998.39 samples/sec Loss 6.4836 LearningRate 0.0353 Epoch: 16 Global Step: 92280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:07,264-Speed 11227.69 samples/sec 
Loss 6.6084 LearningRate 0.0353 Epoch: 16 Global Step: 92290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:08,179-Speed 11194.41 samples/sec Loss 6.7424 LearningRate 0.0353 Epoch: 16 Global Step: 92300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:09,077-Speed 11418.90 samples/sec Loss 6.6054 LearningRate 0.0353 Epoch: 16 Global Step: 92310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:09,969-Speed 11487.83 samples/sec Loss 6.6240 LearningRate 0.0353 Epoch: 16 Global Step: 92320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:10,871-Speed 11361.55 samples/sec Loss 6.7922 LearningRate 0.0353 Epoch: 16 Global Step: 92330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:11,798-Speed 11051.00 samples/sec Loss 6.5516 LearningRate 0.0353 Epoch: 16 Global Step: 92340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:12,737-Speed 10906.47 samples/sec Loss 6.5514 LearningRate 0.0353 Epoch: 16 Global Step: 92350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:13,692-Speed 10737.91 samples/sec Loss 6.5595 LearningRate 0.0353 Epoch: 16 Global Step: 92360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:32:14,578-Speed 11565.29 samples/sec Loss 6.7423 LearningRate 0.0353 Epoch: 16 Global Step: 92370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:15,504-Speed 11056.83 samples/sec Loss 6.7374 LearningRate 0.0353 Epoch: 16 Global Step: 92380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:16,450-Speed 10829.40 samples/sec Loss 6.7661 LearningRate 0.0353 Epoch: 16 Global Step: 92390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:17,372-Speed 11114.05 samples/sec Loss 6.7208 LearningRate 0.0353 Epoch: 16 Global Step: 92400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:18,299-Speed 11057.51 samples/sec Loss 6.7554 LearningRate 0.0352 Epoch: 16 
Global Step: 92410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:19,214-Speed 11200.25 samples/sec Loss 6.7988 LearningRate 0.0352 Epoch: 16 Global Step: 92420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:20,146-Speed 10990.52 samples/sec Loss 6.5606 LearningRate 0.0352 Epoch: 16 Global Step: 92430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:21,033-Speed 11557.63 samples/sec Loss 6.6324 LearningRate 0.0352 Epoch: 16 Global Step: 92440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:21,918-Speed 11579.28 samples/sec Loss 6.7523 LearningRate 0.0352 Epoch: 16 Global Step: 92450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:22,807-Speed 11524.77 samples/sec Loss 6.7062 LearningRate 0.0352 Epoch: 16 Global Step: 92460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:23,714-Speed 11293.68 samples/sec Loss 6.6557 LearningRate 0.0352 Epoch: 16 Global Step: 92470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:32:24,623-Speed 11273.27 samples/sec Loss 6.7425 LearningRate 0.0352 Epoch: 16 Global Step: 92480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:32:25,554-Speed 11006.35 samples/sec Loss 6.8569 LearningRate 0.0352 Epoch: 16 Global Step: 92490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:32:26,456-Speed 11367.91 samples/sec Loss 6.7649 LearningRate 0.0352 Epoch: 16 Global Step: 92500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:27,367-Speed 11247.14 samples/sec Loss 6.6205 LearningRate 0.0352 Epoch: 16 Global Step: 92510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:28,251-Speed 11585.29 samples/sec Loss 6.6406 LearningRate 0.0352 Epoch: 16 Global Step: 92520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:29,161-Speed 11270.79 samples/sec Loss 6.5266 LearningRate 0.0352 Epoch: 16 Global Step: 92530 Fp16 Grad Scale: 
65536 Required: 5 hours Training: 2022-04-11 02:32:30,068-Speed 11292.19 samples/sec Loss 6.8199 LearningRate 0.0352 Epoch: 16 Global Step: 92540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:30,994-Speed 11074.15 samples/sec Loss 6.6829 LearningRate 0.0352 Epoch: 16 Global Step: 92550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:31,866-Speed 11744.60 samples/sec Loss 6.6437 LearningRate 0.0352 Epoch: 16 Global Step: 92560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:32,756-Speed 11508.45 samples/sec Loss 6.7984 LearningRate 0.0352 Epoch: 16 Global Step: 92570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:33,649-Speed 11479.45 samples/sec Loss 6.7004 LearningRate 0.0352 Epoch: 16 Global Step: 92580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:34,554-Speed 11319.69 samples/sec Loss 6.6713 LearningRate 0.0352 Epoch: 16 Global Step: 92590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:35,468-Speed 11211.62 samples/sec Loss 6.5854 LearningRate 0.0351 Epoch: 16 Global Step: 92600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:32:36,383-Speed 11200.94 samples/sec Loss 6.7223 LearningRate 0.0351 Epoch: 16 Global Step: 92610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:32:37,257-Speed 11729.58 samples/sec Loss 6.6647 LearningRate 0.0351 Epoch: 16 Global Step: 92620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:38,184-Speed 11052.97 samples/sec Loss 6.7667 LearningRate 0.0351 Epoch: 16 Global Step: 92630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:39,088-Speed 11330.73 samples/sec Loss 6.5570 LearningRate 0.0351 Epoch: 16 Global Step: 92640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:40,040-Speed 10773.33 samples/sec Loss 6.6949 LearningRate 0.0351 Epoch: 16 Global Step: 92650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 
2022-04-11 02:32:40,961-Speed 11126.44 samples/sec Loss 6.8440 LearningRate 0.0351 Epoch: 16 Global Step: 92660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:41,893-Speed 10996.03 samples/sec Loss 6.7156 LearningRate 0.0351 Epoch: 16 Global Step: 92670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:42,808-Speed 11192.32 samples/sec Loss 6.6069 LearningRate 0.0351 Epoch: 16 Global Step: 92680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:43,716-Speed 11289.23 samples/sec Loss 6.7529 LearningRate 0.0351 Epoch: 16 Global Step: 92690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:44,625-Speed 11277.51 samples/sec Loss 6.7291 LearningRate 0.0351 Epoch: 16 Global Step: 92700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:45,523-Speed 11403.25 samples/sec Loss 6.7167 LearningRate 0.0351 Epoch: 16 Global Step: 92710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:46,410-Speed 11565.25 samples/sec Loss 6.7641 LearningRate 0.0351 Epoch: 16 Global Step: 92720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:32:47,291-Speed 11630.10 samples/sec Loss 6.7060 LearningRate 0.0351 Epoch: 16 Global Step: 92730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:32:48,236-Speed 10834.46 samples/sec Loss 6.7590 LearningRate 0.0351 Epoch: 16 Global Step: 92740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:32:49,164-Speed 11041.06 samples/sec Loss 6.5698 LearningRate 0.0351 Epoch: 16 Global Step: 92750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:32:50,053-Speed 11528.30 samples/sec Loss 6.7263 LearningRate 0.0351 Epoch: 16 Global Step: 92760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:50,944-Speed 11502.42 samples/sec Loss 6.6556 LearningRate 0.0351 Epoch: 16 Global Step: 92770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:51,821-Speed 11687.93 
samples/sec Loss 6.6887 LearningRate 0.0351 Epoch: 16 Global Step: 92780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:52,721-Speed 11385.99 samples/sec Loss 6.7766 LearningRate 0.0350 Epoch: 16 Global Step: 92790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:53,608-Speed 11557.60 samples/sec Loss 6.7274 LearningRate 0.0350 Epoch: 16 Global Step: 92800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:54,485-Speed 11678.68 samples/sec Loss 6.7324 LearningRate 0.0350 Epoch: 16 Global Step: 92810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:32:55,391-Speed 11314.34 samples/sec Loss 6.8982 LearningRate 0.0350 Epoch: 16 Global Step: 92820 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 02:32:56,268-Speed 11684.89 samples/sec Loss 6.7540 LearningRate 0.0350 Epoch: 16 Global Step: 92830 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 02:32:57,142-Speed 11719.38 samples/sec Loss 6.7935 LearningRate 0.0350 Epoch: 16 Global Step: 92840 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 02:32:58,024-Speed 11615.29 samples/sec Loss 6.6657 LearningRate 0.0350 Epoch: 16 Global Step: 92850 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 02:32:58,911-Speed 11561.56 samples/sec Loss 6.7406 LearningRate 0.0350 Epoch: 16 Global Step: 92860 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 02:32:59,805-Speed 11458.57 samples/sec Loss 6.6578 LearningRate 0.0350 Epoch: 16 Global Step: 92870 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 02:33:00,701-Speed 11434.32 samples/sec Loss 6.6485 LearningRate 0.0350 Epoch: 16 Global Step: 92880 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 02:33:01,644-Speed 10867.22 samples/sec Loss 6.6587 LearningRate 0.0350 Epoch: 16 Global Step: 92890 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 02:33:02,576-Speed 10999.64 samples/sec Loss 6.6341 LearningRate 0.0350 
Epoch: 16 Global Step: 92900 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 02:33:03,489-Speed 11216.90 samples/sec Loss 6.7930 LearningRate 0.0350 Epoch: 16 Global Step: 92910 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 02:33:04,368-Speed 11661.85 samples/sec Loss 6.6853 LearningRate 0.0350 Epoch: 16 Global Step: 92920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:05,266-Speed 11414.64 samples/sec Loss 6.7533 LearningRate 0.0350 Epoch: 16 Global Step: 92930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:06,168-Speed 11354.75 samples/sec Loss 6.7674 LearningRate 0.0350 Epoch: 16 Global Step: 92940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:07,116-Speed 10815.25 samples/sec Loss 6.8473 LearningRate 0.0350 Epoch: 16 Global Step: 92950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:08,016-Speed 11385.29 samples/sec Loss 6.7001 LearningRate 0.0350 Epoch: 16 Global Step: 92960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:08,898-Speed 11615.99 samples/sec Loss 6.8601 LearningRate 0.0350 Epoch: 16 Global Step: 92970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:09,795-Speed 11424.58 samples/sec Loss 6.8176 LearningRate 0.0350 Epoch: 16 Global Step: 92980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:10,698-Speed 11350.57 samples/sec Loss 6.7473 LearningRate 0.0349 Epoch: 16 Global Step: 92990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:11,580-Speed 11611.84 samples/sec Loss 6.7512 LearningRate 0.0349 Epoch: 16 Global Step: 93000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:12,465-Speed 11580.28 samples/sec Loss 6.7657 LearningRate 0.0349 Epoch: 16 Global Step: 93010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:13,343-Speed 11671.26 samples/sec Loss 6.7010 LearningRate 0.0349 Epoch: 16 Global Step: 93020 Fp16 Grad 
Scale: 131072 Required: 5 hours Training: 2022-04-11 02:33:14,221-Speed 11679.07 samples/sec Loss 6.6839 LearningRate 0.0349 Epoch: 16 Global Step: 93030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:33:15,116-Speed 11443.91 samples/sec Loss 6.6807 LearningRate 0.0349 Epoch: 16 Global Step: 93040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:16,018-Speed 11356.95 samples/sec Loss 6.7289 LearningRate 0.0349 Epoch: 16 Global Step: 93050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:16,895-Speed 11685.31 samples/sec Loss 6.8724 LearningRate 0.0349 Epoch: 16 Global Step: 93060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:17,787-Speed 11484.49 samples/sec Loss 6.7891 LearningRate 0.0349 Epoch: 16 Global Step: 93070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:18,725-Speed 10922.61 samples/sec Loss 6.7771 LearningRate 0.0349 Epoch: 16 Global Step: 93080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:19,679-Speed 10741.76 samples/sec Loss 6.7192 LearningRate 0.0349 Epoch: 16 Global Step: 93090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:20,550-Speed 11773.15 samples/sec Loss 6.7971 LearningRate 0.0349 Epoch: 16 Global Step: 93100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:21,436-Speed 11561.46 samples/sec Loss 6.6507 LearningRate 0.0349 Epoch: 16 Global Step: 93110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:22,322-Speed 11564.69 samples/sec Loss 6.6228 LearningRate 0.0349 Epoch: 16 Global Step: 93120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:23,197-Speed 11710.65 samples/sec Loss 6.7585 LearningRate 0.0349 Epoch: 16 Global Step: 93130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:24,081-Speed 11592.68 samples/sec Loss 6.8420 LearningRate 0.0349 Epoch: 16 Global Step: 93140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 
2022-04-11 02:33:24,941-Speed 11923.53 samples/sec Loss 6.7706 LearningRate 0.0349 Epoch: 16 Global Step: 93150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:25,852-Speed 11239.52 samples/sec Loss 6.5724 LearningRate 0.0349 Epoch: 16 Global Step: 93160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:26,767-Speed 11194.99 samples/sec Loss 6.8005 LearningRate 0.0349 Epoch: 16 Global Step: 93170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:27,647-Speed 11648.45 samples/sec Loss 6.7035 LearningRate 0.0348 Epoch: 16 Global Step: 93180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:28,569-Speed 11118.86 samples/sec Loss 6.7221 LearningRate 0.0348 Epoch: 16 Global Step: 93190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:29,488-Speed 11152.20 samples/sec Loss 6.7415 LearningRate 0.0348 Epoch: 16 Global Step: 93200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:30,385-Speed 11420.68 samples/sec Loss 6.7490 LearningRate 0.0348 Epoch: 16 Global Step: 93210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:31,300-Speed 11198.82 samples/sec Loss 6.9547 LearningRate 0.0348 Epoch: 16 Global Step: 93220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:32,185-Speed 11583.82 samples/sec Loss 6.7262 LearningRate 0.0348 Epoch: 16 Global Step: 93230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:33,064-Speed 11648.82 samples/sec Loss 6.6003 LearningRate 0.0348 Epoch: 16 Global Step: 93240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:33,937-Speed 11739.31 samples/sec Loss 6.8225 LearningRate 0.0348 Epoch: 16 Global Step: 93250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:33:34,819-Speed 11626.65 samples/sec Loss 6.7493 LearningRate 0.0348 Epoch: 16 Global Step: 93260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:33:35,719-Speed 11377.58 
samples/sec Loss 6.7724 LearningRate 0.0348 Epoch: 16 Global Step: 93270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:33:36,616-Speed 11423.76 samples/sec Loss 6.7445 LearningRate 0.0348 Epoch: 16 Global Step: 93280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:33:37,526-Speed 11264.52 samples/sec Loss 6.7824 LearningRate 0.0348 Epoch: 16 Global Step: 93290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:33:38,420-Speed 11462.82 samples/sec Loss 6.7444 LearningRate 0.0348 Epoch: 16 Global Step: 93300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:39,322-Speed 11355.28 samples/sec Loss 6.8622 LearningRate 0.0348 Epoch: 16 Global Step: 93310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:40,223-Speed 11374.27 samples/sec Loss 6.6827 LearningRate 0.0348 Epoch: 16 Global Step: 93320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:41,118-Speed 11452.74 samples/sec Loss 6.7279 LearningRate 0.0348 Epoch: 16 Global Step: 93330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:41,995-Speed 11689.25 samples/sec Loss 6.7573 LearningRate 0.0348 Epoch: 16 Global Step: 93340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:42,882-Speed 11545.04 samples/sec Loss 6.6471 LearningRate 0.0348 Epoch: 16 Global Step: 93350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:43,796-Speed 11210.27 samples/sec Loss 6.7311 LearningRate 0.0348 Epoch: 16 Global Step: 93360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:44,698-Speed 11361.12 samples/sec Loss 6.6952 LearningRate 0.0347 Epoch: 16 Global Step: 93370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:45,573-Speed 11708.78 samples/sec Loss 6.8674 LearningRate 0.0347 Epoch: 16 Global Step: 93380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:46,450-Speed 11687.79 samples/sec Loss 6.7066 LearningRate 
0.0347 Epoch: 16 Global Step: 93390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:47,336-Speed 11568.35 samples/sec Loss 6.6945 LearningRate 0.0347 Epoch: 16 Global Step: 93400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:33:48,255-Speed 11146.29 samples/sec Loss 6.6374 LearningRate 0.0347 Epoch: 16 Global Step: 93410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:33:49,190-Speed 10960.50 samples/sec Loss 6.7803 LearningRate 0.0347 Epoch: 16 Global Step: 93420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:33:50,121-Speed 11013.58 samples/sec Loss 6.7711 LearningRate 0.0347 Epoch: 16 Global Step: 93430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:51,044-Speed 11101.84 samples/sec Loss 6.7146 LearningRate 0.0347 Epoch: 16 Global Step: 93440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:51,965-Speed 11130.93 samples/sec Loss 6.8669 LearningRate 0.0347 Epoch: 16 Global Step: 93450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:52,872-Speed 11293.50 samples/sec Loss 6.7270 LearningRate 0.0347 Epoch: 16 Global Step: 93460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:53,819-Speed 10822.43 samples/sec Loss 6.8294 LearningRate 0.0347 Epoch: 16 Global Step: 93470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:54,722-Speed 11354.80 samples/sec Loss 6.8116 LearningRate 0.0347 Epoch: 16 Global Step: 93480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:55,617-Speed 11436.54 samples/sec Loss 6.7190 LearningRate 0.0347 Epoch: 16 Global Step: 93490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:56,526-Speed 11272.62 samples/sec Loss 6.7023 LearningRate 0.0347 Epoch: 16 Global Step: 93500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:57,412-Speed 11573.55 samples/sec Loss 6.7257 LearningRate 0.0347 Epoch: 16 Global Step: 93510 Fp16 
Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:58,312-Speed 11382.68 samples/sec Loss 6.8865 LearningRate 0.0347 Epoch: 16 Global Step: 93520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:33:59,198-Speed 11568.99 samples/sec Loss 6.9398 LearningRate 0.0347 Epoch: 16 Global Step: 93530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:34:00,077-Speed 11651.41 samples/sec Loss 6.6441 LearningRate 0.0347 Epoch: 16 Global Step: 93540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:34:01,000-Speed 11098.15 samples/sec Loss 6.7083 LearningRate 0.0347 Epoch: 16 Global Step: 93550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:34:01,964-Speed 10637.51 samples/sec Loss 6.8692 LearningRate 0.0346 Epoch: 16 Global Step: 93560 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:34:02,883-Speed 11143.92 samples/sec Loss 6.8805 LearningRate 0.0346 Epoch: 16 Global Step: 93570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:34:03,792-Speed 11282.08 samples/sec Loss 6.6997 LearningRate 0.0346 Epoch: 16 Global Step: 93580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:34:04,685-Speed 11474.21 samples/sec Loss 6.6524 LearningRate 0.0346 Epoch: 16 Global Step: 93590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:34:05,595-Speed 11259.01 samples/sec Loss 6.6935 LearningRate 0.0346 Epoch: 16 Global Step: 93600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:34:06,516-Speed 11122.44 samples/sec Loss 6.8095 LearningRate 0.0346 Epoch: 16 Global Step: 93610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:34:07,431-Speed 11199.30 samples/sec Loss 6.7143 LearningRate 0.0346 Epoch: 16 Global Step: 93620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:34:08,394-Speed 10634.67 samples/sec Loss 6.8564 LearningRate 0.0346 Epoch: 16 Global Step: 93630 Fp16 Grad Scale: 65536 Required: 5 
hours Training: 2022-04-11 02:34:09,284-Speed 11513.02 samples/sec Loss 6.7483 LearningRate 0.0346 Epoch: 16 Global Step: 93640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:34:10,177-Speed 11481.92 samples/sec Loss 6.7120 LearningRate 0.0346 Epoch: 16 Global Step: 93650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:34:11,057-Speed 11641.18 samples/sec Loss 6.7251 LearningRate 0.0346 Epoch: 16 Global Step: 93660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:34:11,983-Speed 11069.00 samples/sec Loss 6.6829 LearningRate 0.0346 Epoch: 16 Global Step: 93670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:34:12,864-Speed 11636.11 samples/sec Loss 6.8261 LearningRate 0.0346 Epoch: 16 Global Step: 93680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:34:13,816-Speed 10758.57 samples/sec Loss 6.9008 LearningRate 0.0346 Epoch: 16 Global Step: 93690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:34:14,779-Speed 10640.43 samples/sec Loss 6.7379 LearningRate 0.0346 Epoch: 16 Global Step: 93700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:34:15,666-Speed 11557.04 samples/sec Loss 6.7826 LearningRate 0.0346 Epoch: 16 Global Step: 93710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:34:16,536-Speed 11781.27 samples/sec Loss 6.6639 LearningRate 0.0346 Epoch: 16 Global Step: 93720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:34:17,412-Speed 11700.69 samples/sec Loss 6.7045 LearningRate 0.0346 Epoch: 16 Global Step: 93730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:34:18,351-Speed 10921.68 samples/sec Loss 6.7894 LearningRate 0.0346 Epoch: 16 Global Step: 93740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:34:19,241-Speed 11503.30 samples/sec Loss 6.7702 LearningRate 0.0346 Epoch: 16 Global Step: 93750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 
02:34:20,115-Speed 11726.68 samples/sec Loss 6.7620 LearningRate 0.0345 Epoch: 16 Global Step: 93760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:34:20,999-Speed 11589.10 samples/sec Loss 6.6835 LearningRate 0.0345 Epoch: 16 Global Step: 93770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:34:21,869-Speed 11778.69 samples/sec Loss 6.7407 LearningRate 0.0345 Epoch: 16 Global Step: 93780 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 02:34:22,759-Speed 11515.89 samples/sec Loss 6.6740 LearningRate 0.0345 Epoch: 16 Global Step: 93790 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 02:34:23,645-Speed 11576.83 samples/sec Loss 6.7178 LearningRate 0.0345 Epoch: 16 Global Step: 93800 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 02:34:24,554-Speed 11260.20 samples/sec Loss 6.8355 LearningRate 0.0345 Epoch: 16 Global Step: 93810 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 02:34:25,459-Speed 11331.48 samples/sec Loss 6.8891 LearningRate 0.0345 Epoch: 16 Global Step: 93820 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 02:34:26,329-Speed 11767.73 samples/sec Loss 6.8996 LearningRate 0.0345 Epoch: 16 Global Step: 93830 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 02:34:27,218-Speed 11535.37 samples/sec Loss 6.7858 LearningRate 0.0345 Epoch: 16 Global Step: 93840 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 02:34:28,116-Speed 11413.92 samples/sec Loss 6.8855 LearningRate 0.0345 Epoch: 16 Global Step: 93850 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 02:34:29,019-Speed 11346.70 samples/sec Loss 6.6650 LearningRate 0.0345 Epoch: 16 Global Step: 93860 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 02:34:29,903-Speed 11586.13 samples/sec Loss 6.7523 LearningRate 0.0345 Epoch: 16 Global Step: 93870 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 02:34:30,797-Speed 11465.09 samples/sec 
Loss 6.8314 LearningRate 0.0345 Epoch: 16 Global Step: 93880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:34:31,721-Speed 11080.96 samples/sec Loss 6.8796 LearningRate 0.0345 Epoch: 16 Global Step: 93890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:34:32,614-Speed 11479.89 samples/sec Loss 6.8320 LearningRate 0.0345 Epoch: 16 Global Step: 93900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:34:33,497-Speed 11601.42 samples/sec Loss 6.8392 LearningRate 0.0345 Epoch: 16 Global Step: 93910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:34:34,370-Speed 11744.75 samples/sec Loss 6.7844 LearningRate 0.0345 Epoch: 16 Global Step: 93920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:34:35,285-Speed 11203.26 samples/sec Loss 6.7417 LearningRate 0.0345 Epoch: 16 Global Step: 93930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:34:36,189-Speed 11332.82 samples/sec Loss 6.7403 LearningRate 0.0345 Epoch: 16 Global Step: 93940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:34:37,106-Speed 11172.27 samples/sec Loss 6.7011 LearningRate 0.0344 Epoch: 16 Global Step: 93950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:34:37,992-Speed 11565.06 samples/sec Loss 6.8913 LearningRate 0.0344 Epoch: 16 Global Step: 93960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:34:38,946-Speed 10745.93 samples/sec Loss 6.8327 LearningRate 0.0344 Epoch: 16 Global Step: 93970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:34:39,917-Speed 10561.89 samples/sec Loss 6.8192 LearningRate 0.0344 Epoch: 16 Global Step: 93980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:34:40,855-Speed 10923.31 samples/sec Loss 6.8907 LearningRate 0.0344 Epoch: 16 Global Step: 93990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:34:41,728-Speed 11735.58 samples/sec Loss 6.8622 LearningRate 0.0344 Epoch: 16 
Global Step: 94000 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-11 02:35:03,089-[lfw][94000]XNorm: 11.353341
Training: 2022-04-11 02:35:03,090-[lfw][94000]Accuracy-Flip: 0.99600+-0.00327
Training: 2022-04-11 02:35:03,090-[lfw][94000]Accuracy-Highest: 0.99617
Training: 2022-04-11 02:35:27,748-[cfp_fp][94000]XNorm: 9.539596
Training: 2022-04-11 02:35:27,749-[cfp_fp][94000]Accuracy-Flip: 0.93986+-0.01003
Training: 2022-04-11 02:35:27,749-[cfp_fp][94000]Accuracy-Highest: 0.94386
Training: 2022-04-11 02:35:48,930-[agedb_30][94000]XNorm: 11.171709
Training: 2022-04-11 02:35:48,930-[agedb_30][94000]Accuracy-Flip: 0.96467+-0.00918
Training: 2022-04-11 02:35:48,931-[agedb_30][94000]Accuracy-Highest: 0.96483
Training: 2022-04-11 02:35:49,847-Speed 150.33 samples/sec Loss 6.7510 LearningRate 0.0344 Epoch: 16 Global Step: 94010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:35:50,752-Speed 11327.67 samples/sec Loss 6.8755 LearningRate 0.0344 Epoch: 16 Global Step: 94020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:35:51,761-Speed 10155.98 samples/sec Loss 6.6728 LearningRate 0.0344 Epoch: 16 Global Step: 94030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:35:52,635-Speed 11722.02 samples/sec Loss 6.6061 LearningRate 0.0344 Epoch: 16 Global Step: 94040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:35:53,529-Speed 11459.43 samples/sec Loss 6.7510 LearningRate 0.0344 Epoch: 16 Global Step: 94050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:35:54,453-Speed 11086.16 samples/sec Loss 6.8575 LearningRate 0.0344 Epoch: 16 Global Step: 94060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:35:55,360-Speed 11300.01 samples/sec Loss 6.5712 LearningRate 0.0344 Epoch: 16 Global Step: 94070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:35:56,256-Speed 11435.35 samples/sec Loss 6.7159 LearningRate 0.0344 Epoch: 16 Global Step: 94080 Fp16 Grad Scale: 
65536 Required: 5 hours Training: 2022-04-11 02:35:57,134-Speed 11680.98 samples/sec Loss 6.7503 LearningRate 0.0344 Epoch: 16 Global Step: 94090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:35:58,018-Speed 11580.89 samples/sec Loss 6.7377 LearningRate 0.0344 Epoch: 16 Global Step: 94100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:35:58,898-Speed 11653.49 samples/sec Loss 6.6196 LearningRate 0.0344 Epoch: 16 Global Step: 94110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:35:59,800-Speed 11352.66 samples/sec Loss 6.7214 LearningRate 0.0344 Epoch: 16 Global Step: 94120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:00,675-Speed 11710.79 samples/sec Loss 6.7963 LearningRate 0.0344 Epoch: 16 Global Step: 94130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:01,575-Speed 11389.92 samples/sec Loss 6.7398 LearningRate 0.0343 Epoch: 16 Global Step: 94140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:02,496-Speed 11126.34 samples/sec Loss 6.8035 LearningRate 0.0343 Epoch: 16 Global Step: 94150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:03,408-Speed 11234.74 samples/sec Loss 6.5999 LearningRate 0.0343 Epoch: 16 Global Step: 94160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:04,301-Speed 11483.98 samples/sec Loss 6.7171 LearningRate 0.0343 Epoch: 16 Global Step: 94170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:05,246-Speed 10841.93 samples/sec Loss 6.7073 LearningRate 0.0343 Epoch: 16 Global Step: 94180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:06,178-Speed 10989.23 samples/sec Loss 6.8680 LearningRate 0.0343 Epoch: 16 Global Step: 94190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:07,076-Speed 11418.64 samples/sec Loss 6.7299 LearningRate 0.0343 Epoch: 16 Global Step: 94200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 
2022-04-11 02:36:08,014-Speed 10921.28 samples/sec Loss 6.6669 LearningRate 0.0343 Epoch: 16 Global Step: 94210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:36:08,910-Speed 11443.19 samples/sec Loss 6.9404 LearningRate 0.0343 Epoch: 16 Global Step: 94220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:36:09,793-Speed 11601.98 samples/sec Loss 6.8235 LearningRate 0.0343 Epoch: 16 Global Step: 94230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:36:10,667-Speed 11723.97 samples/sec Loss 6.6744 LearningRate 0.0343 Epoch: 16 Global Step: 94240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:36:11,546-Speed 11655.12 samples/sec Loss 6.8313 LearningRate 0.0343 Epoch: 16 Global Step: 94250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:36:12,453-Speed 11295.46 samples/sec Loss 6.6724 LearningRate 0.0343 Epoch: 16 Global Step: 94260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:36:13,401-Speed 10809.02 samples/sec Loss 6.7812 LearningRate 0.0343 Epoch: 16 Global Step: 94270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:36:14,331-Speed 11018.84 samples/sec Loss 6.7305 LearningRate 0.0343 Epoch: 16 Global Step: 94280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:36:15,278-Speed 10828.79 samples/sec Loss 6.7809 LearningRate 0.0343 Epoch: 16 Global Step: 94290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:36:16,190-Speed 11229.61 samples/sec Loss 6.6091 LearningRate 0.0343 Epoch: 16 Global Step: 94300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:36:17,104-Speed 11209.19 samples/sec Loss 6.8938 LearningRate 0.0343 Epoch: 16 Global Step: 94310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:18,030-Speed 11071.23 samples/sec Loss 6.8742 LearningRate 0.0343 Epoch: 16 Global Step: 94320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:18,947-Speed 
11177.56 samples/sec Loss 6.6344 LearningRate 0.0343 Epoch: 16 Global Step: 94330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:19,843-Speed 11431.06 samples/sec Loss 6.7477 LearningRate 0.0342 Epoch: 16 Global Step: 94340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:20,720-Speed 11691.44 samples/sec Loss 6.7222 LearningRate 0.0342 Epoch: 16 Global Step: 94350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:21,606-Speed 11557.59 samples/sec Loss 6.7843 LearningRate 0.0342 Epoch: 16 Global Step: 94360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:22,490-Speed 11604.21 samples/sec Loss 6.8390 LearningRate 0.0342 Epoch: 16 Global Step: 94370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:23,393-Speed 11346.10 samples/sec Loss 6.8314 LearningRate 0.0342 Epoch: 16 Global Step: 94380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:24,295-Speed 11361.30 samples/sec Loss 6.7112 LearningRate 0.0342 Epoch: 16 Global Step: 94390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:25,219-Speed 11085.04 samples/sec Loss 6.8937 LearningRate 0.0342 Epoch: 16 Global Step: 94400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:26,145-Speed 11067.22 samples/sec Loss 6.8574 LearningRate 0.0342 Epoch: 16 Global Step: 94410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:36:27,055-Speed 11259.32 samples/sec Loss 6.6741 LearningRate 0.0342 Epoch: 16 Global Step: 94420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:36:27,949-Speed 11464.24 samples/sec Loss 6.6805 LearningRate 0.0342 Epoch: 16 Global Step: 94430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:28,837-Speed 11533.71 samples/sec Loss 6.9323 LearningRate 0.0342 Epoch: 16 Global Step: 94440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:29,737-Speed 11383.07 samples/sec Loss 6.8618 
LearningRate 0.0342 Epoch: 16 Global Step: 94450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:30,624-Speed 11553.84 samples/sec Loss 6.8303 LearningRate 0.0342 Epoch: 16 Global Step: 94460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:31,540-Speed 11187.42 samples/sec Loss 6.8453 LearningRate 0.0342 Epoch: 16 Global Step: 94470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:32,425-Speed 11580.34 samples/sec Loss 6.9378 LearningRate 0.0342 Epoch: 16 Global Step: 94480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:33,302-Speed 11683.51 samples/sec Loss 6.9937 LearningRate 0.0342 Epoch: 16 Global Step: 94490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:34,188-Speed 11560.04 samples/sec Loss 6.6605 LearningRate 0.0342 Epoch: 16 Global Step: 94500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:35,070-Speed 11629.74 samples/sec Loss 6.7801 LearningRate 0.0342 Epoch: 16 Global Step: 94510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:35,968-Speed 11400.79 samples/sec Loss 6.6877 LearningRate 0.0342 Epoch: 16 Global Step: 94520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:36,917-Speed 10794.52 samples/sec Loss 6.7085 LearningRate 0.0341 Epoch: 16 Global Step: 94530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:36:37,876-Speed 10690.40 samples/sec Loss 6.7817 LearningRate 0.0341 Epoch: 16 Global Step: 94540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:38,804-Speed 11035.25 samples/sec Loss 6.7764 LearningRate 0.0341 Epoch: 16 Global Step: 94550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:39,744-Speed 10906.79 samples/sec Loss 7.0188 LearningRate 0.0341 Epoch: 16 Global Step: 94560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:40,726-Speed 10440.60 samples/sec Loss 6.6338 LearningRate 0.0341 Epoch: 16 Global Step: 
94570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:41,685-Speed 10690.15 samples/sec Loss 6.8078 LearningRate 0.0341 Epoch: 16 Global Step: 94580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:42,630-Speed 10841.37 samples/sec Loss 6.8336 LearningRate 0.0341 Epoch: 16 Global Step: 94590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:43,552-Speed 11105.01 samples/sec Loss 6.8075 LearningRate 0.0341 Epoch: 16 Global Step: 94600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:44,435-Speed 11611.72 samples/sec Loss 6.8709 LearningRate 0.0341 Epoch: 16 Global Step: 94610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:45,333-Speed 11414.27 samples/sec Loss 6.6833 LearningRate 0.0341 Epoch: 16 Global Step: 94620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:46,214-Speed 11626.64 samples/sec Loss 6.7230 LearningRate 0.0341 Epoch: 16 Global Step: 94630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:47,078-Speed 11861.79 samples/sec Loss 6.6916 LearningRate 0.0341 Epoch: 16 Global Step: 94640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:36:47,951-Speed 11736.71 samples/sec Loss 6.7474 LearningRate 0.0341 Epoch: 16 Global Step: 94650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:48,844-Speed 11477.54 samples/sec Loss 6.8034 LearningRate 0.0341 Epoch: 16 Global Step: 94660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:49,776-Speed 10990.79 samples/sec Loss 6.7200 LearningRate 0.0341 Epoch: 16 Global Step: 94670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:50,657-Speed 11625.75 samples/sec Loss 6.8619 LearningRate 0.0341 Epoch: 16 Global Step: 94680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:51,541-Speed 11597.68 samples/sec Loss 6.8133 LearningRate 0.0341 Epoch: 16 Global Step: 94690 Fp16 Grad Scale: 65536 Required: 5 
hours Training: 2022-04-11 02:36:52,441-Speed 11382.16 samples/sec Loss 6.7794 LearningRate 0.0341 Epoch: 16 Global Step: 94700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:53,367-Speed 11072.54 samples/sec Loss 6.7269 LearningRate 0.0341 Epoch: 16 Global Step: 94710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:54,331-Speed 10631.71 samples/sec Loss 6.7670 LearningRate 0.0341 Epoch: 16 Global Step: 94720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:55,271-Speed 10893.98 samples/sec Loss 6.7289 LearningRate 0.0340 Epoch: 16 Global Step: 94730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:56,227-Speed 10725.98 samples/sec Loss 6.8158 LearningRate 0.0340 Epoch: 16 Global Step: 94740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:36:57,111-Speed 11584.63 samples/sec Loss 6.6628 LearningRate 0.0340 Epoch: 16 Global Step: 94750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:36:58,002-Speed 11500.11 samples/sec Loss 6.8098 LearningRate 0.0340 Epoch: 16 Global Step: 94760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:36:58,919-Speed 11181.03 samples/sec Loss 6.7932 LearningRate 0.0340 Epoch: 16 Global Step: 94770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:36:59,845-Speed 11069.94 samples/sec Loss 6.8399 LearningRate 0.0340 Epoch: 16 Global Step: 94780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:37:00,740-Speed 11447.28 samples/sec Loss 6.7833 LearningRate 0.0340 Epoch: 16 Global Step: 94790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:37:01,670-Speed 11019.21 samples/sec Loss 6.8615 LearningRate 0.0340 Epoch: 16 Global Step: 94800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:37:02,629-Speed 10684.12 samples/sec Loss 6.8093 LearningRate 0.0340 Epoch: 16 Global Step: 94810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 
02:37:03,527-Speed 11405.02 samples/sec Loss 6.7123 LearningRate 0.0340 Epoch: 16 Global Step: 94820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:37:04,429-Speed 11374.12 samples/sec Loss 6.6795 LearningRate 0.0340 Epoch: 16 Global Step: 94830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:37:05,375-Speed 10822.09 samples/sec Loss 6.7846 LearningRate 0.0340 Epoch: 16 Global Step: 94840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:37:06,336-Speed 10663.30 samples/sec Loss 6.7298 LearningRate 0.0340 Epoch: 16 Global Step: 94850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:37:07,301-Speed 10616.41 samples/sec Loss 6.8100 LearningRate 0.0340 Epoch: 16 Global Step: 94860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:37:08,204-Speed 11356.34 samples/sec Loss 6.7310 LearningRate 0.0340 Epoch: 16 Global Step: 94870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:37:09,090-Speed 11573.43 samples/sec Loss 6.8947 LearningRate 0.0340 Epoch: 16 Global Step: 94880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:37:09,968-Speed 11673.23 samples/sec Loss 6.9472 LearningRate 0.0340 Epoch: 16 Global Step: 94890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:37:10,854-Speed 11560.13 samples/sec Loss 6.7834 LearningRate 0.0340 Epoch: 16 Global Step: 94900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:37:11,742-Speed 11545.43 samples/sec Loss 6.7194 LearningRate 0.0340 Epoch: 16 Global Step: 94910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:37:12,663-Speed 11118.65 samples/sec Loss 6.8077 LearningRate 0.0339 Epoch: 16 Global Step: 94920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:37:13,579-Speed 11188.89 samples/sec Loss 6.7171 LearningRate 0.0339 Epoch: 16 Global Step: 94930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:37:14,459-Speed 11650.34 samples/sec 
Loss 6.7543 LearningRate 0.0339 Epoch: 16 Global Step: 94940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:37:15,365-Speed 11301.77 samples/sec Loss 6.8091 LearningRate 0.0339 Epoch: 16 Global Step: 94950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:37:16,268-Speed 11358.54 samples/sec Loss 6.7979 LearningRate 0.0339 Epoch: 16 Global Step: 94960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:37:17,160-Speed 11484.43 samples/sec Loss 6.7057 LearningRate 0.0339 Epoch: 16 Global Step: 94970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:37:18,028-Speed 11800.41 samples/sec Loss 6.7890 LearningRate 0.0339 Epoch: 16 Global Step: 94980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:37:18,932-Speed 11331.96 samples/sec Loss 6.7097 LearningRate 0.0339 Epoch: 16 Global Step: 94990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:37:19,822-Speed 11522.53 samples/sec Loss 6.7465 LearningRate 0.0339 Epoch: 16 Global Step: 95000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:37:20,696-Speed 11714.27 samples/sec Loss 6.7224 LearningRate 0.0339 Epoch: 16 Global Step: 95010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:37:21,574-Speed 11675.59 samples/sec Loss 6.6316 LearningRate 0.0339 Epoch: 16 Global Step: 95020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:37:22,459-Speed 11581.10 samples/sec Loss 6.7755 LearningRate 0.0339 Epoch: 16 Global Step: 95030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:37:23,366-Speed 11304.47 samples/sec Loss 6.5668 LearningRate 0.0339 Epoch: 16 Global Step: 95040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:37:24,261-Speed 11449.66 samples/sec Loss 6.7373 LearningRate 0.0339 Epoch: 16 Global Step: 95050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:37:25,156-Speed 11448.81 samples/sec Loss 6.6231 LearningRate 0.0339 Epoch: 16 
Global Step: 95060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:37:26,080-Speed 11087.56 samples/sec Loss 6.7709 LearningRate 0.0339 Epoch: 16 Global Step: 95070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:37:26,993-Speed 11219.42 samples/sec Loss 6.7493 LearningRate 0.0339 Epoch: 16 Global Step: 95080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:37:27,945-Speed 10764.35 samples/sec Loss 6.8019 LearningRate 0.0339 Epoch: 16 Global Step: 95090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:37:28,869-Speed 11093.49 samples/sec Loss 6.6659 LearningRate 0.0339 Epoch: 16 Global Step: 95100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:37:29,808-Speed 10907.26 samples/sec Loss 6.8029 LearningRate 0.0339 Epoch: 16 Global Step: 95110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:37:30,742-Speed 10977.41 samples/sec Loss 6.5646 LearningRate 0.0338 Epoch: 16 Global Step: 95120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:37:31,681-Speed 10903.48 samples/sec Loss 6.7223 LearningRate 0.0338 Epoch: 16 Global Step: 95130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:37:32,590-Speed 11277.39 samples/sec Loss 6.7669 LearningRate 0.0338 Epoch: 16 Global Step: 95140 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 02:37:33,487-Speed 11423.04 samples/sec Loss 6.8834 LearningRate 0.0338 Epoch: 16 Global Step: 95150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:37:34,362-Speed 11710.82 samples/sec Loss 6.6932 LearningRate 0.0338 Epoch: 16 Global Step: 95160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:37:35,248-Speed 11570.68 samples/sec Loss 6.5591 LearningRate 0.0338 Epoch: 16 Global Step: 95170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:37:36,133-Speed 11584.12 samples/sec Loss 6.8550 LearningRate 0.0338 Epoch: 16 Global Step: 95180 Fp16 Grad 
Scale: 65536 Required: 5 hours Training: 2022-04-11 02:37:37,044-Speed 11249.52 samples/sec Loss 6.8407 LearningRate 0.0338 Epoch: 16 Global Step: 95190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:37:37,943-Speed 11397.44 samples/sec Loss 6.8445 LearningRate 0.0338 Epoch: 16 Global Step: 95200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:37:38,841-Speed 11406.70 samples/sec Loss 6.8190 LearningRate 0.0338 Epoch: 16 Global Step: 95210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:37:39,789-Speed 10816.16 samples/sec Loss 6.8796 LearningRate 0.0338 Epoch: 16 Global Step: 95220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:37:40,679-Speed 11508.22 samples/sec Loss 6.8139 LearningRate 0.0338 Epoch: 16 Global Step: 95230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:37:41,589-Speed 11261.00 samples/sec Loss 6.7007 LearningRate 0.0338 Epoch: 16 Global Step: 95240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:37:42,462-Speed 11736.54 samples/sec Loss 6.8885 LearningRate 0.0338 Epoch: 16 Global Step: 95250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:37:43,351-Speed 11528.20 samples/sec Loss 6.8736 LearningRate 0.0338 Epoch: 16 Global Step: 95260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:37:44,228-Speed 11682.50 samples/sec Loss 6.8169 LearningRate 0.0338 Epoch: 16 Global Step: 95270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:37:45,112-Speed 11590.49 samples/sec Loss 6.5883 LearningRate 0.0338 Epoch: 16 Global Step: 95280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:37:45,991-Speed 11664.41 samples/sec Loss 6.8424 LearningRate 0.0338 Epoch: 16 Global Step: 95290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:37:46,887-Speed 11431.28 samples/sec Loss 6.6640 LearningRate 0.0338 Epoch: 16 Global Step: 95300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 
2022-04-11 02:37:47,787-Speed 11384.96 samples/sec Loss 6.7843 LearningRate 0.0337 Epoch: 16 Global Step: 95310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:37:48,702-Speed 11201.18 samples/sec Loss 6.7660 LearningRate 0.0337 Epoch: 16 Global Step: 95320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:37:49,596-Speed 11455.60 samples/sec Loss 6.6200 LearningRate 0.0337 Epoch: 16 Global Step: 95330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:37:50,515-Speed 11156.94 samples/sec Loss 6.8008 LearningRate 0.0337 Epoch: 16 Global Step: 95340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:37:51,413-Speed 11406.72 samples/sec Loss 6.8968 LearningRate 0.0337 Epoch: 16 Global Step: 95350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:37:52,351-Speed 10929.12 samples/sec Loss 6.7601 LearningRate 0.0337 Epoch: 16 Global Step: 95360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:37:53,313-Speed 10653.45 samples/sec Loss 6.7725 LearningRate 0.0337 Epoch: 16 Global Step: 95370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:37:54,265-Speed 10758.91 samples/sec Loss 6.6692 LearningRate 0.0337 Epoch: 16 Global Step: 95380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:37:55,182-Speed 11175.96 samples/sec Loss 6.8313 LearningRate 0.0337 Epoch: 16 Global Step: 95390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:37:56,122-Speed 10899.19 samples/sec Loss 6.7115 LearningRate 0.0337 Epoch: 16 Global Step: 95400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:37:57,006-Speed 11592.62 samples/sec Loss 6.7789 LearningRate 0.0337 Epoch: 16 Global Step: 95410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:37:57,881-Speed 11717.38 samples/sec Loss 6.7854 LearningRate 0.0337 Epoch: 16 Global Step: 95420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:37:58,765-Speed 11587.88 
samples/sec Loss 6.7286 LearningRate 0.0337 Epoch: 16 Global Step: 95430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:37:59,650-Speed 11581.90 samples/sec Loss 6.7630 LearningRate 0.0337 Epoch: 16 Global Step: 95440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:38:00,544-Speed 11465.75 samples/sec Loss 6.8059 LearningRate 0.0337 Epoch: 16 Global Step: 95450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:38:01,426-Speed 11608.16 samples/sec Loss 6.8760 LearningRate 0.0337 Epoch: 16 Global Step: 95460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:38:02,338-Speed 11244.38 samples/sec Loss 6.7771 LearningRate 0.0337 Epoch: 16 Global Step: 95470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:38:03,254-Speed 11177.88 samples/sec Loss 6.7246 LearningRate 0.0337 Epoch: 16 Global Step: 95480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:38:04,154-Speed 11396.65 samples/sec Loss 6.7963 LearningRate 0.0337 Epoch: 16 Global Step: 95490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:38:05,096-Speed 10869.43 samples/sec Loss 6.7942 LearningRate 0.0337 Epoch: 16 Global Step: 95500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:38:05,976-Speed 11650.74 samples/sec Loss 6.5903 LearningRate 0.0336 Epoch: 16 Global Step: 95510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:38:06,872-Speed 11428.20 samples/sec Loss 6.8357 LearningRate 0.0336 Epoch: 16 Global Step: 95520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:38:07,833-Speed 10668.42 samples/sec Loss 6.8868 LearningRate 0.0336 Epoch: 16 Global Step: 95530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:38:08,727-Speed 11457.37 samples/sec Loss 6.8138 LearningRate 0.0336 Epoch: 16 Global Step: 95540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:38:09,605-Speed 11669.94 samples/sec Loss 6.9354 LearningRate 
0.0336 Epoch: 16 Global Step: 95550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:38:10,479-Speed 11737.86 samples/sec Loss 6.8099 LearningRate 0.0336 Epoch: 16 Global Step: 95560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:38:11,349-Speed 11774.20 samples/sec Loss 6.7620 LearningRate 0.0336 Epoch: 16 Global Step: 95570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:38:12,236-Speed 11549.17 samples/sec Loss 6.8651 LearningRate 0.0336 Epoch: 16 Global Step: 95580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:38:13,157-Speed 11124.86 samples/sec Loss 6.6325 LearningRate 0.0336 Epoch: 16 Global Step: 95590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:38:14,075-Speed 11159.60 samples/sec Loss 6.6514 LearningRate 0.0336 Epoch: 16 Global Step: 95600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:38:15,046-Speed 10562.79 samples/sec Loss 6.8076 LearningRate 0.0336 Epoch: 16 Global Step: 95610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:38:15,942-Speed 11424.83 samples/sec Loss 6.6620 LearningRate 0.0336 Epoch: 16 Global Step: 95620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:38:16,894-Speed 10771.81 samples/sec Loss 6.8138 LearningRate 0.0336 Epoch: 16 Global Step: 95630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:38:17,824-Speed 11013.46 samples/sec Loss 6.6178 LearningRate 0.0336 Epoch: 16 Global Step: 95640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:38:18,729-Speed 11319.98 samples/sec Loss 6.7319 LearningRate 0.0336 Epoch: 16 Global Step: 95650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:38:19,619-Speed 11519.82 samples/sec Loss 6.7443 LearningRate 0.0336 Epoch: 16 Global Step: 95660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:38:20,503-Speed 11581.91 samples/sec Loss 6.8626 LearningRate 0.0336 Epoch: 16 Global Step: 95670 Fp16 
Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:38:21,384-Speed 11639.37 samples/sec Loss 6.9418 LearningRate 0.0336 Epoch: 16 Global Step: 95680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:38:22,297-Speed 11218.45 samples/sec Loss 6.8459 LearningRate 0.0336 Epoch: 16 Global Step: 95690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:38:23,242-Speed 10845.13 samples/sec Loss 6.8077 LearningRate 0.0336 Epoch: 16 Global Step: 95700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:38:24,174-Speed 10993.04 samples/sec Loss 6.8748 LearningRate 0.0335 Epoch: 16 Global Step: 95710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:38:25,072-Speed 11422.30 samples/sec Loss 6.8979 LearningRate 0.0335 Epoch: 16 Global Step: 95720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:38:26,043-Speed 10552.53 samples/sec Loss 6.7917 LearningRate 0.0335 Epoch: 16 Global Step: 95730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:38:26,981-Speed 10919.71 samples/sec Loss 6.6879 LearningRate 0.0335 Epoch: 16 Global Step: 95740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:38:27,925-Speed 10856.34 samples/sec Loss 6.7685 LearningRate 0.0335 Epoch: 16 Global Step: 95750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:38:28,850-Speed 11071.50 samples/sec Loss 6.7565 LearningRate 0.0335 Epoch: 16 Global Step: 95760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:38:29,795-Speed 10853.06 samples/sec Loss 6.8635 LearningRate 0.0335 Epoch: 16 Global Step: 95770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:38:30,682-Speed 11540.87 samples/sec Loss 6.9109 LearningRate 0.0335 Epoch: 16 Global Step: 95780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:38:31,642-Speed 10676.67 samples/sec Loss 6.7977 LearningRate 0.0335 Epoch: 16 Global Step: 95790 Fp16 Grad Scale: 65536 Required: 5 hours 
Training: 2022-04-11 02:38:32,556-Speed 11218.81 samples/sec Loss 6.6982 LearningRate 0.0335 Epoch: 16 Global Step: 95800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:38:33,425-Speed 11785.08 samples/sec Loss 6.8575 LearningRate 0.0335 Epoch: 16 Global Step: 95810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:38:34,301-Speed 11700.07 samples/sec Loss 6.7429 LearningRate 0.0335 Epoch: 16 Global Step: 95820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:38:35,205-Speed 11337.94 samples/sec Loss 6.8996 LearningRate 0.0335 Epoch: 16 Global Step: 95830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:38:36,122-Speed 11173.58 samples/sec Loss 6.7402 LearningRate 0.0335 Epoch: 16 Global Step: 95840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:38:37,044-Speed 11116.45 samples/sec Loss 6.8697 LearningRate 0.0335 Epoch: 16 Global Step: 95850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:38:37,966-Speed 11111.44 samples/sec Loss 6.7632 LearningRate 0.0335 Epoch: 16 Global Step: 95860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:38:38,885-Speed 11146.57 samples/sec Loss 6.7173 LearningRate 0.0335 Epoch: 16 Global Step: 95870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:38:39,802-Speed 11174.97 samples/sec Loss 6.9248 LearningRate 0.0335 Epoch: 16 Global Step: 95880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:38:40,787-Speed 10402.11 samples/sec Loss 6.7753 LearningRate 0.0335 Epoch: 16 Global Step: 95890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:38:41,693-Speed 11308.45 samples/sec Loss 6.8239 LearningRate 0.0334 Epoch: 16 Global Step: 95900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:38:42,570-Speed 11693.80 samples/sec Loss 6.8210 LearningRate 0.0334 Epoch: 16 Global Step: 95910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 
02:38:43,501-Speed 11002.92 samples/sec Loss 6.6108 LearningRate 0.0334 Epoch: 16 Global Step: 95920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:38:44,440-Speed 10916.60 samples/sec Loss 6.8482 LearningRate 0.0334 Epoch: 16 Global Step: 95930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:38:45,324-Speed 11589.63 samples/sec Loss 6.8337 LearningRate 0.0334 Epoch: 16 Global Step: 95940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:38:46,202-Speed 11664.68 samples/sec Loss 6.8085 LearningRate 0.0334 Epoch: 16 Global Step: 95950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:38:47,103-Speed 11379.99 samples/sec Loss 6.8421 LearningRate 0.0334 Epoch: 16 Global Step: 95960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:38:48,048-Speed 10834.70 samples/sec Loss 6.6781 LearningRate 0.0334 Epoch: 16 Global Step: 95970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:38:48,930-Speed 11621.21 samples/sec Loss 6.7563 LearningRate 0.0334 Epoch: 16 Global Step: 95980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:38:49,867-Speed 10938.77 samples/sec Loss 6.7169 LearningRate 0.0334 Epoch: 16 Global Step: 95990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:38:50,779-Speed 11227.52 samples/sec Loss 6.6753 LearningRate 0.0334 Epoch: 16 Global Step: 96000 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-11 02:39:12,191-[lfw][96000]XNorm: 11.045544
Training: 2022-04-11 02:39:12,191-[lfw][96000]Accuracy-Flip: 0.99600+-0.00281
Training: 2022-04-11 02:39:12,192-[lfw][96000]Accuracy-Highest: 0.99617
Training: 2022-04-11 02:39:36,963-[cfp_fp][96000]XNorm: 9.288370
Training: 2022-04-11 02:39:36,963-[cfp_fp][96000]Accuracy-Flip: 0.94443+-0.01196
Training: 2022-04-11 02:39:36,964-[cfp_fp][96000]Accuracy-Highest: 0.94443
Training: 2022-04-11 02:39:58,197-[agedb_30][96000]XNorm: 10.833511
Training: 2022-04-11 02:39:58,198-[agedb_30][96000]Accuracy-Flip: 0.96083+-0.00952
Training: 2022-04-11 02:39:58,198-[agedb_30][96000]Accuracy-Highest: 0.96483
Training: 2022-04-11 02:39:59,111-Speed 149.86 samples/sec Loss 6.7860 LearningRate 0.0334 Epoch: 16 Global Step: 96010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:40:00,009-Speed 11409.64 samples/sec Loss 6.6608 LearningRate 0.0334 Epoch: 16 Global Step: 96020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:40:00,889-Speed 11638.14 samples/sec Loss 6.8094 LearningRate 0.0334 Epoch: 16 Global Step: 96030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:40:01,781-Speed 11492.01 samples/sec Loss 6.7816 LearningRate 0.0334 Epoch: 16 Global Step: 96040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:40:02,675-Speed 11461.56 samples/sec Loss 6.7412 LearningRate 0.0334 Epoch: 16 Global Step: 96050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:40:03,561-Speed 11574.01 samples/sec Loss 6.6274 LearningRate 0.0334 Epoch: 16 Global Step: 96060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:40:04,487-Speed 11061.16 samples/sec Loss 6.9121 LearningRate 0.0334 Epoch: 16 Global Step: 96070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:40:05,395-Speed 11283.76 samples/sec Loss 6.7579 LearningRate 0.0334 Epoch: 16 Global Step: 96080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:40:06,296-Speed 11369.27 samples/sec Loss 6.7803 LearningRate 0.0334 Epoch: 16 Global Step: 96090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:40:07,180-Speed 11588.52 samples/sec Loss 6.6644 LearningRate 0.0333 Epoch: 16 Global Step: 96100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:40:08,094-Speed 11212.55 samples/sec Loss 6.6232 LearningRate 0.0333 Epoch: 16 Global Step: 96110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:40:08,983-Speed 11529.23 samples/sec Loss
6.8387 LearningRate 0.0333 Epoch: 16 Global Step: 96120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:40:09,858-Speed 11705.75 samples/sec Loss 6.8705 LearningRate 0.0333 Epoch: 16 Global Step: 96130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:40:10,765-Speed 11304.90 samples/sec Loss 6.7425 LearningRate 0.0333 Epoch: 16 Global Step: 96140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:40:11,686-Speed 11131.50 samples/sec Loss 6.7056 LearningRate 0.0333 Epoch: 16 Global Step: 96150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:40:12,620-Speed 10960.56 samples/sec Loss 6.7564 LearningRate 0.0333 Epoch: 16 Global Step: 96160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:40:13,522-Speed 11359.26 samples/sec Loss 6.7923 LearningRate 0.0333 Epoch: 16 Global Step: 96170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:40:14,476-Speed 10747.05 samples/sec Loss 6.6639 LearningRate 0.0333 Epoch: 16 Global Step: 96180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:40:15,403-Speed 11053.21 samples/sec Loss 6.8665 LearningRate 0.0333 Epoch: 16 Global Step: 96190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:40:16,367-Speed 10630.58 samples/sec Loss 6.7542 LearningRate 0.0333 Epoch: 16 Global Step: 96200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:40:17,290-Speed 11097.34 samples/sec Loss 6.7097 LearningRate 0.0333 Epoch: 16 Global Step: 96210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:40:18,192-Speed 11371.88 samples/sec Loss 6.8147 LearningRate 0.0333 Epoch: 16 Global Step: 96220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:40:19,129-Speed 10930.78 samples/sec Loss 6.6311 LearningRate 0.0333 Epoch: 16 Global Step: 96230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:40:20,048-Speed 11147.50 samples/sec Loss 6.6659 LearningRate 0.0333 Epoch: 16 
Global Step: 96240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:40:20,932-Speed 11600.26 samples/sec Loss 6.7883 LearningRate 0.0333 Epoch: 16 Global Step: 96250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:40:21,806-Speed 11724.63 samples/sec Loss 6.7080 LearningRate 0.0333 Epoch: 16 Global Step: 96260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:40:22,701-Speed 11441.52 samples/sec Loss 6.8143 LearningRate 0.0333 Epoch: 16 Global Step: 96270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:40:23,584-Speed 11601.10 samples/sec Loss 6.4578 LearningRate 0.0333 Epoch: 16 Global Step: 96280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:40:24,469-Speed 11580.43 samples/sec Loss 6.6225 LearningRate 0.0333 Epoch: 16 Global Step: 96290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:40:25,377-Speed 11286.64 samples/sec Loss 6.8224 LearningRate 0.0332 Epoch: 16 Global Step: 96300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:40:26,311-Speed 10972.28 samples/sec Loss 6.7177 LearningRate 0.0332 Epoch: 16 Global Step: 96310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:40:27,233-Speed 11117.12 samples/sec Loss 6.8680 LearningRate 0.0332 Epoch: 16 Global Step: 96320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:40:28,198-Speed 10615.50 samples/sec Loss 6.7880 LearningRate 0.0332 Epoch: 16 Global Step: 96330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:40:29,151-Speed 10749.00 samples/sec Loss 6.9626 LearningRate 0.0332 Epoch: 16 Global Step: 96340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:40:30,102-Speed 10785.20 samples/sec Loss 6.8577 LearningRate 0.0332 Epoch: 16 Global Step: 96350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:40:31,009-Speed 11299.50 samples/sec Loss 6.8016 LearningRate 0.0332 Epoch: 16 Global Step: 96360 Fp16 Grad Scale: 65536 
Required: 5 hours Training: 2022-04-11 02:40:31,887-Speed 11664.62 samples/sec Loss 6.8735 LearningRate 0.0332 Epoch: 16 Global Step: 96370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:40:32,772-Speed 11574.99 samples/sec Loss 6.7578 LearningRate 0.0332 Epoch: 16 Global Step: 96380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:40:33,657-Speed 11580.23 samples/sec Loss 6.8371 LearningRate 0.0332 Epoch: 16 Global Step: 96390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:40:34,549-Speed 11496.53 samples/sec Loss 6.7475 LearningRate 0.0332 Epoch: 16 Global Step: 96400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:40:35,426-Speed 11682.78 samples/sec Loss 6.7657 LearningRate 0.0332 Epoch: 16 Global Step: 96410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:40:36,297-Speed 11760.36 samples/sec Loss 6.8285 LearningRate 0.0332 Epoch: 16 Global Step: 96420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:40:37,218-Speed 11129.53 samples/sec Loss 6.6518 LearningRate 0.0332 Epoch: 16 Global Step: 96430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:40:38,115-Speed 11420.27 samples/sec Loss 6.8053 LearningRate 0.0332 Epoch: 16 Global Step: 96440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:40:39,074-Speed 10679.43 samples/sec Loss 6.8011 LearningRate 0.0332 Epoch: 16 Global Step: 96450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:40:39,991-Speed 11179.81 samples/sec Loss 6.7815 LearningRate 0.0332 Epoch: 16 Global Step: 96460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:40:40,927-Speed 10950.50 samples/sec Loss 6.6841 LearningRate 0.0332 Epoch: 16 Global Step: 96470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:40:41,836-Speed 11272.56 samples/sec Loss 6.7687 LearningRate 0.0332 Epoch: 16 Global Step: 96480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 
2022-04-11 02:40:42,798-Speed 10658.32 samples/sec Loss 6.7001 LearningRate 0.0331 Epoch: 16 Global Step: 96490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:40:43,716-Speed 11151.57 samples/sec Loss 6.8327 LearningRate 0.0331 Epoch: 16 Global Step: 96500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:40:44,642-Speed 11067.05 samples/sec Loss 6.8314 LearningRate 0.0331 Epoch: 16 Global Step: 96510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:40:45,516-Speed 11728.54 samples/sec Loss 6.7241 LearningRate 0.0331 Epoch: 16 Global Step: 96520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:40:46,403-Speed 11547.69 samples/sec Loss 6.8121 LearningRate 0.0331 Epoch: 16 Global Step: 96530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:40:47,290-Speed 11554.96 samples/sec Loss 6.8045 LearningRate 0.0331 Epoch: 16 Global Step: 96540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:40:48,193-Speed 11351.43 samples/sec Loss 6.7164 LearningRate 0.0331 Epoch: 16 Global Step: 96550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:40:49,098-Speed 11316.92 samples/sec Loss 6.8426 LearningRate 0.0331 Epoch: 16 Global Step: 96560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:40:49,981-Speed 11613.28 samples/sec Loss 6.7349 LearningRate 0.0331 Epoch: 16 Global Step: 96570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:40:50,863-Speed 11621.86 samples/sec Loss 6.7849 LearningRate 0.0331 Epoch: 16 Global Step: 96580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:40:51,810-Speed 10814.62 samples/sec Loss 6.7713 LearningRate 0.0331 Epoch: 16 Global Step: 96590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:40:52,758-Speed 10817.88 samples/sec Loss 6.8138 LearningRate 0.0331 Epoch: 16 Global Step: 96600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:40:53,690-Speed 
10993.46 samples/sec Loss 6.6433 LearningRate 0.0331 Epoch: 16 Global Step: 96610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:40:54,622-Speed 10986.69 samples/sec Loss 6.6298 LearningRate 0.0331 Epoch: 16 Global Step: 96620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:40:55,543-Speed 11124.38 samples/sec Loss 6.7483 LearningRate 0.0331 Epoch: 16 Global Step: 96630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:40:56,470-Speed 11059.13 samples/sec Loss 6.8256 LearningRate 0.0331 Epoch: 16 Global Step: 96640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:40:57,349-Speed 11657.55 samples/sec Loss 6.7707 LearningRate 0.0331 Epoch: 16 Global Step: 96650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:40:58,272-Speed 11096.89 samples/sec Loss 6.6741 LearningRate 0.0331 Epoch: 16 Global Step: 96660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:41:09,201-Speed 937.05 samples/sec Loss 6.2888 LearningRate 0.0331 Epoch: 17 Global Step: 96670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:41:10,138-Speed 10940.74 samples/sec Loss 6.0890 LearningRate 0.0331 Epoch: 17 Global Step: 96680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:41:11,106-Speed 10584.15 samples/sec Loss 6.0048 LearningRate 0.0330 Epoch: 17 Global Step: 96690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:41:12,302-Speed 8575.35 samples/sec Loss 6.0789 LearningRate 0.0330 Epoch: 17 Global Step: 96700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:41:13,278-Speed 10494.87 samples/sec Loss 6.0709 LearningRate 0.0330 Epoch: 17 Global Step: 96710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:41:14,187-Speed 11270.54 samples/sec Loss 6.0193 LearningRate 0.0330 Epoch: 17 Global Step: 96720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:41:15,175-Speed 10374.42 samples/sec Loss 5.8774 LearningRate 
0.0330 Epoch: 17 Global Step: 96730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:41:16,101-Speed 11068.29 samples/sec Loss 6.0517 LearningRate 0.0330 Epoch: 17 Global Step: 96740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:41:17,077-Speed 10495.68 samples/sec Loss 6.0121 LearningRate 0.0330 Epoch: 17 Global Step: 96750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:41:17,949-Speed 11762.05 samples/sec Loss 6.0607 LearningRate 0.0330 Epoch: 17 Global Step: 96760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:41:18,853-Speed 11328.04 samples/sec Loss 6.0160 LearningRate 0.0330 Epoch: 17 Global Step: 96770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:41:19,761-Speed 11297.60 samples/sec Loss 6.1459 LearningRate 0.0330 Epoch: 17 Global Step: 96780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:41:20,708-Speed 10818.92 samples/sec Loss 6.0559 LearningRate 0.0330 Epoch: 17 Global Step: 96790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:41:21,583-Speed 11700.05 samples/sec Loss 6.0492 LearningRate 0.0330 Epoch: 17 Global Step: 96800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:41:22,460-Speed 11696.23 samples/sec Loss 6.1041 LearningRate 0.0330 Epoch: 17 Global Step: 96810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:41:23,355-Speed 11441.44 samples/sec Loss 6.0693 LearningRate 0.0330 Epoch: 17 Global Step: 96820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:41:24,306-Speed 10776.75 samples/sec Loss 6.2178 LearningRate 0.0330 Epoch: 17 Global Step: 96830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:41:25,228-Speed 11107.72 samples/sec Loss 6.1780 LearningRate 0.0330 Epoch: 17 Global Step: 96840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:41:26,146-Speed 11173.48 samples/sec Loss 6.2013 LearningRate 0.0330 Epoch: 17 Global Step: 96850 
Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:41:27,061-Speed 11197.97 samples/sec Loss 6.2056 LearningRate 0.0330 Epoch: 17 Global Step: 96860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:41:27,941-Speed 11642.11 samples/sec Loss 6.1885 LearningRate 0.0330 Epoch: 17 Global Step: 96870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:41:28,837-Speed 11447.28 samples/sec Loss 6.1819 LearningRate 0.0330 Epoch: 17 Global Step: 96880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:41:29,775-Speed 10919.40 samples/sec Loss 6.0488 LearningRate 0.0329 Epoch: 17 Global Step: 96890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:41:30,672-Speed 11421.66 samples/sec Loss 6.1964 LearningRate 0.0329 Epoch: 17 Global Step: 96900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:41:31,555-Speed 11604.69 samples/sec Loss 6.2382 LearningRate 0.0329 Epoch: 17 Global Step: 96910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:41:32,482-Speed 11061.27 samples/sec Loss 6.2326 LearningRate 0.0329 Epoch: 17 Global Step: 96920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:41:33,358-Speed 11694.68 samples/sec Loss 6.1682 LearningRate 0.0329 Epoch: 17 Global Step: 96930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:41:34,236-Speed 11669.02 samples/sec Loss 6.1116 LearningRate 0.0329 Epoch: 17 Global Step: 96940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:41:35,130-Speed 11466.21 samples/sec Loss 6.2444 LearningRate 0.0329 Epoch: 17 Global Step: 96950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:41:35,998-Speed 11799.36 samples/sec Loss 6.2207 LearningRate 0.0329 Epoch: 17 Global Step: 96960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:41:36,928-Speed 11019.15 samples/sec Loss 6.2092 LearningRate 0.0329 Epoch: 17 Global Step: 96970 Fp16 Grad Scale: 131072 Required: 5 
hours Training: 2022-04-11 02:41:37,819-Speed 11501.89 samples/sec Loss 6.2715 LearningRate 0.0329 Epoch: 17 Global Step: 96980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:41:38,761-Speed 10881.17 samples/sec Loss 6.0923 LearningRate 0.0329 Epoch: 17 Global Step: 96990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:41:39,646-Speed 11578.65 samples/sec Loss 6.2753 LearningRate 0.0329 Epoch: 17 Global Step: 97000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:41:40,555-Speed 11265.06 samples/sec Loss 6.1982 LearningRate 0.0329 Epoch: 17 Global Step: 97010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:41:41,482-Speed 11060.78 samples/sec Loss 6.2013 LearningRate 0.0329 Epoch: 17 Global Step: 97020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:41:42,424-Speed 10873.29 samples/sec Loss 6.3796 LearningRate 0.0329 Epoch: 17 Global Step: 97030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:41:43,360-Speed 10961.44 samples/sec Loss 6.2655 LearningRate 0.0329 Epoch: 17 Global Step: 97040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:41:44,275-Speed 11198.61 samples/sec Loss 6.3191 LearningRate 0.0329 Epoch: 17 Global Step: 97050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:41:45,167-Speed 11487.47 samples/sec Loss 6.3134 LearningRate 0.0329 Epoch: 17 Global Step: 97060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:41:46,041-Speed 11725.22 samples/sec Loss 6.2144 LearningRate 0.0329 Epoch: 17 Global Step: 97070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:41:46,933-Speed 11479.15 samples/sec Loss 6.3226 LearningRate 0.0329 Epoch: 17 Global Step: 97080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:41:47,824-Speed 11503.16 samples/sec Loss 6.2578 LearningRate 0.0328 Epoch: 17 Global Step: 97090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 
02:41:48,705-Speed 11628.23 samples/sec Loss 6.3065 LearningRate 0.0328 Epoch: 17 Global Step: 97100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:41:49,596-Speed 11509.76 samples/sec Loss 6.1026 LearningRate 0.0328 Epoch: 17 Global Step: 97110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:41:50,537-Speed 10886.11 samples/sec Loss 6.3022 LearningRate 0.0328 Epoch: 17 Global Step: 97120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:41:51,480-Speed 10858.32 samples/sec Loss 6.3441 LearningRate 0.0328 Epoch: 17 Global Step: 97130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:41:52,364-Speed 11594.07 samples/sec Loss 6.3676 LearningRate 0.0328 Epoch: 17 Global Step: 97140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:41:53,246-Speed 11623.84 samples/sec Loss 6.3206 LearningRate 0.0328 Epoch: 17 Global Step: 97150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:41:54,150-Speed 11329.33 samples/sec Loss 6.4399 LearningRate 0.0328 Epoch: 17 Global Step: 97160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:41:55,037-Speed 11559.59 samples/sec Loss 6.3926 LearningRate 0.0328 Epoch: 17 Global Step: 97170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:41:56,108-Speed 9558.54 samples/sec Loss 6.3964 LearningRate 0.0328 Epoch: 17 Global Step: 97180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:41:57,017-Speed 11276.67 samples/sec Loss 6.3848 LearningRate 0.0328 Epoch: 17 Global Step: 97190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:41:57,894-Speed 11678.22 samples/sec Loss 6.2750 LearningRate 0.0328 Epoch: 17 Global Step: 97200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:41:58,969-Speed 9531.35 samples/sec Loss 6.1301 LearningRate 0.0328 Epoch: 17 Global Step: 97210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:41:59,906-Speed 10942.06 samples/sec 
Loss 6.2842 LearningRate 0.0328 Epoch: 17 Global Step: 97220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:42:00,789-Speed 11614.12 samples/sec Loss 6.3623 LearningRate 0.0328 Epoch: 17 Global Step: 97230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:42:01,806-Speed 10075.51 samples/sec Loss 6.2550 LearningRate 0.0328 Epoch: 17 Global Step: 97240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:42:02,780-Speed 10512.77 samples/sec Loss 6.3969 LearningRate 0.0328 Epoch: 17 Global Step: 97250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:42:03,743-Speed 10640.56 samples/sec Loss 6.3341 LearningRate 0.0328 Epoch: 17 Global Step: 97260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:42:04,640-Speed 11424.33 samples/sec Loss 6.3024 LearningRate 0.0328 Epoch: 17 Global Step: 97270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:42:05,566-Speed 11069.87 samples/sec Loss 6.2850 LearningRate 0.0328 Epoch: 17 Global Step: 97280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:42:06,449-Speed 11606.86 samples/sec Loss 6.3795 LearningRate 0.0327 Epoch: 17 Global Step: 97290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:42:07,356-Speed 11293.26 samples/sec Loss 6.1992 LearningRate 0.0327 Epoch: 17 Global Step: 97300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:42:08,224-Speed 11809.58 samples/sec Loss 6.2430 LearningRate 0.0327 Epoch: 17 Global Step: 97310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:42:09,110-Speed 11559.23 samples/sec Loss 6.2960 LearningRate 0.0327 Epoch: 17 Global Step: 97320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:42:09,983-Speed 11742.20 samples/sec Loss 6.3514 LearningRate 0.0327 Epoch: 17 Global Step: 97330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:42:10,882-Speed 11397.63 samples/sec Loss 6.5314 LearningRate 0.0327 
Epoch: 17 Global Step: 97340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:42:11,796-Speed 11206.82 samples/sec Loss 6.2289 LearningRate 0.0327 Epoch: 17 Global Step: 97350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:42:12,706-Speed 11266.81 samples/sec Loss 6.4836 LearningRate 0.0327 Epoch: 17 Global Step: 97360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:42:13,593-Speed 11550.84 samples/sec Loss 6.4151 LearningRate 0.0327 Epoch: 17 Global Step: 97370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:42:14,506-Speed 11225.51 samples/sec Loss 6.3774 LearningRate 0.0327 Epoch: 17 Global Step: 97380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:42:15,458-Speed 10760.03 samples/sec Loss 6.2739 LearningRate 0.0327 Epoch: 17 Global Step: 97390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:42:16,356-Speed 11404.90 samples/sec Loss 6.4171 LearningRate 0.0327 Epoch: 17 Global Step: 97400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:42:17,275-Speed 11156.27 samples/sec Loss 6.4049 LearningRate 0.0327 Epoch: 17 Global Step: 97410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:42:18,178-Speed 11349.88 samples/sec Loss 6.3700 LearningRate 0.0327 Epoch: 17 Global Step: 97420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:42:19,110-Speed 10999.37 samples/sec Loss 6.3875 LearningRate 0.0327 Epoch: 17 Global Step: 97430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:42:20,000-Speed 11511.61 samples/sec Loss 6.4378 LearningRate 0.0327 Epoch: 17 Global Step: 97440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:42:20,873-Speed 11766.66 samples/sec Loss 6.4457 LearningRate 0.0327 Epoch: 17 Global Step: 97450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:42:21,753-Speed 11638.86 samples/sec Loss 6.3636 LearningRate 0.0327 Epoch: 17 Global Step: 97460 Fp16 Grad 
Scale: 65536 Required: 5 hours Training: 2022-04-11 02:42:22,630-Speed 11689.77 samples/sec Loss 6.4113 LearningRate 0.0327 Epoch: 17 Global Step: 97470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:42:23,523-Speed 11466.16 samples/sec Loss 6.4443 LearningRate 0.0327 Epoch: 17 Global Step: 97480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:42:24,400-Speed 11694.90 samples/sec Loss 6.3913 LearningRate 0.0326 Epoch: 17 Global Step: 97490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:42:25,304-Speed 11332.76 samples/sec Loss 6.3910 LearningRate 0.0326 Epoch: 17 Global Step: 97500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:42:26,192-Speed 11538.23 samples/sec Loss 6.4455 LearningRate 0.0326 Epoch: 17 Global Step: 97510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:42:27,133-Speed 10888.27 samples/sec Loss 6.3406 LearningRate 0.0326 Epoch: 17 Global Step: 97520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:42:28,071-Speed 10924.70 samples/sec Loss 6.3220 LearningRate 0.0326 Epoch: 17 Global Step: 97530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:42:29,024-Speed 10746.47 samples/sec Loss 6.4107 LearningRate 0.0326 Epoch: 17 Global Step: 97540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:42:29,921-Speed 11424.52 samples/sec Loss 6.4000 LearningRate 0.0326 Epoch: 17 Global Step: 97550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:42:30,822-Speed 11382.44 samples/sec Loss 6.4093 LearningRate 0.0326 Epoch: 17 Global Step: 97560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:42:31,728-Speed 11303.52 samples/sec Loss 6.4438 LearningRate 0.0326 Epoch: 17 Global Step: 97570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:42:32,618-Speed 11513.69 samples/sec Loss 6.4668 LearningRate 0.0326 Epoch: 17 Global Step: 97580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 
2022-04-11 02:42:33,486-Speed 11805.73 samples/sec Loss 6.4422 LearningRate 0.0326 Epoch: 17 Global Step: 97590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:42:34,372-Speed 11566.08 samples/sec Loss 6.3781 LearningRate 0.0326 Epoch: 17 Global Step: 97600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:42:35,250-Speed 11663.76 samples/sec Loss 6.4044 LearningRate 0.0326 Epoch: 17 Global Step: 97610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:42:36,169-Speed 11159.06 samples/sec Loss 6.2531 LearningRate 0.0326 Epoch: 17 Global Step: 97620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:42:37,055-Speed 11561.14 samples/sec Loss 6.3674 LearningRate 0.0326 Epoch: 17 Global Step: 97630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:42:37,941-Speed 11568.36 samples/sec Loss 6.4154 LearningRate 0.0326 Epoch: 17 Global Step: 97640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:42:38,859-Speed 11163.77 samples/sec Loss 6.4990 LearningRate 0.0326 Epoch: 17 Global Step: 97650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:42:39,764-Speed 11319.67 samples/sec Loss 6.3466 LearningRate 0.0326 Epoch: 17 Global Step: 97660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:42:40,691-Speed 11048.14 samples/sec Loss 6.3342 LearningRate 0.0326 Epoch: 17 Global Step: 97670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:42:41,593-Speed 11365.78 samples/sec Loss 6.5923 LearningRate 0.0325 Epoch: 17 Global Step: 97680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:42:42,521-Speed 11049.78 samples/sec Loss 6.4636 LearningRate 0.0325 Epoch: 17 Global Step: 97690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:42:43,443-Speed 11119.76 samples/sec Loss 6.3346 LearningRate 0.0325 Epoch: 17 Global Step: 97700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:42:44,380-Speed 10936.11 
samples/sec Loss 6.5125 LearningRate 0.0325 Epoch: 17 Global Step: 97710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:42:45,266-Speed 11555.18 samples/sec Loss 6.3373 LearningRate 0.0325 Epoch: 17 Global Step: 97720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:42:46,144-Speed 11680.44 samples/sec Loss 6.4396 LearningRate 0.0325 Epoch: 17 Global Step: 97730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:42:47,043-Speed 11399.70 samples/sec Loss 6.2347 LearningRate 0.0325 Epoch: 17 Global Step: 97740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:42:47,932-Speed 11519.47 samples/sec Loss 6.4223 LearningRate 0.0325 Epoch: 17 Global Step: 97750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:42:48,822-Speed 11513.25 samples/sec Loss 6.4176 LearningRate 0.0325 Epoch: 17 Global Step: 97760 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 02:42:49,699-Speed 11686.80 samples/sec Loss 6.5576 LearningRate 0.0325 Epoch: 17 Global Step: 97770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:42:50,629-Speed 11021.15 samples/sec Loss 6.5129 LearningRate 0.0325 Epoch: 17 Global Step: 97780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:42:51,547-Speed 11157.49 samples/sec Loss 6.5216 LearningRate 0.0325 Epoch: 17 Global Step: 97790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:42:52,445-Speed 11409.66 samples/sec Loss 6.3772 LearningRate 0.0325 Epoch: 17 Global Step: 97800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:42:53,396-Speed 10778.96 samples/sec Loss 6.4731 LearningRate 0.0325 Epoch: 17 Global Step: 97810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:42:54,310-Speed 11202.35 samples/sec Loss 6.4868 LearningRate 0.0325 Epoch: 17 Global Step: 97820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:42:55,227-Speed 11182.72 samples/sec Loss 6.3099 
LearningRate 0.0325 Epoch: 17 Global Step: 97830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:42:56,150-Speed 11099.91 samples/sec Loss 6.4951 LearningRate 0.0325 Epoch: 17 Global Step: 97840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:42:57,052-Speed 11356.66 samples/sec Loss 6.6024 LearningRate 0.0325 Epoch: 17 Global Step: 97850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:42:57,931-Speed 11654.36 samples/sec Loss 6.4652 LearningRate 0.0325 Epoch: 17 Global Step: 97860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:42:58,810-Speed 11665.76 samples/sec Loss 6.3841 LearningRate 0.0325 Epoch: 17 Global Step: 97870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:42:59,706-Speed 11437.54 samples/sec Loss 6.4826 LearningRate 0.0324 Epoch: 17 Global Step: 97880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:43:00,618-Speed 11233.14 samples/sec Loss 6.4639 LearningRate 0.0324 Epoch: 17 Global Step: 97890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:43:01,537-Speed 11155.65 samples/sec Loss 6.4572 LearningRate 0.0324 Epoch: 17 Global Step: 97900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:43:02,450-Speed 11220.35 samples/sec Loss 6.4039 LearningRate 0.0324 Epoch: 17 Global Step: 97910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:43:03,379-Speed 11033.08 samples/sec Loss 6.4729 LearningRate 0.0324 Epoch: 17 Global Step: 97920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:43:04,292-Speed 11220.59 samples/sec Loss 6.5096 LearningRate 0.0324 Epoch: 17 Global Step: 97930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:43:05,168-Speed 11694.32 samples/sec Loss 6.5976 LearningRate 0.0324 Epoch: 17 Global Step: 97940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:43:06,080-Speed 11235.89 samples/sec Loss 6.4265 LearningRate 0.0324 Epoch: 17 Global 
Step: 97950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:43:07,035-Speed 10738.12 samples/sec Loss 6.3816 LearningRate 0.0324 Epoch: 17 Global Step: 97960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:43:07,988-Speed 10744.08 samples/sec Loss 6.5982 LearningRate 0.0324 Epoch: 17 Global Step: 97970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:43:08,869-Speed 11638.37 samples/sec Loss 6.4277 LearningRate 0.0324 Epoch: 17 Global Step: 97980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:43:09,745-Speed 11689.58 samples/sec Loss 6.5117 LearningRate 0.0324 Epoch: 17 Global Step: 97990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:43:10,639-Speed 11464.08 samples/sec Loss 6.5054 LearningRate 0.0324 Epoch: 17 Global Step: 98000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:43:32,027-[lfw][98000]XNorm: 11.045378 Training: 2022-04-11 02:43:32,028-[lfw][98000]Accuracy-Flip: 0.99483+-0.00369 Training: 2022-04-11 02:43:32,028-[lfw][98000]Accuracy-Highest: 0.99617 Training: 2022-04-11 02:43:56,784-[cfp_fp][98000]XNorm: 9.322960 Training: 2022-04-11 02:43:56,785-[cfp_fp][98000]Accuracy-Flip: 0.94000+-0.01116 Training: 2022-04-11 02:43:56,785-[cfp_fp][98000]Accuracy-Highest: 0.94443 Training: 2022-04-11 02:44:18,039-[agedb_30][98000]XNorm: 10.832189 Training: 2022-04-11 02:44:18,039-[agedb_30][98000]Accuracy-Flip: 0.96250+-0.00834 Training: 2022-04-11 02:44:18,040-[agedb_30][98000]Accuracy-Highest: 0.96483 Training: 2022-04-11 02:44:18,943-Speed 149.92 samples/sec Loss 6.5452 LearningRate 0.0324 Epoch: 17 Global Step: 98010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:44:19,862-Speed 11141.98 samples/sec Loss 6.5436 LearningRate 0.0324 Epoch: 17 Global Step: 98020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:44:20,735-Speed 11742.76 samples/sec Loss 6.4743 LearningRate 0.0324 Epoch: 17 Global Step: 98030 Fp16 Grad Scale: 
131072 Required: 5 hours Training: 2022-04-11 02:44:21,605-Speed 11773.27 samples/sec Loss 6.4254 LearningRate 0.0324 Epoch: 17 Global Step: 98040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:44:22,488-Speed 11605.75 samples/sec Loss 6.4978 LearningRate 0.0324 Epoch: 17 Global Step: 98050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:44:23,386-Speed 11414.90 samples/sec Loss 6.5140 LearningRate 0.0324 Epoch: 17 Global Step: 98060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:44:24,264-Speed 11673.11 samples/sec Loss 6.6588 LearningRate 0.0324 Epoch: 17 Global Step: 98070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:44:25,146-Speed 11608.82 samples/sec Loss 6.5650 LearningRate 0.0323 Epoch: 17 Global Step: 98080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:44:26,062-Speed 11189.94 samples/sec Loss 6.5205 LearningRate 0.0323 Epoch: 17 Global Step: 98090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:44:26,967-Speed 11328.21 samples/sec Loss 6.4767 LearningRate 0.0323 Epoch: 17 Global Step: 98100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:44:27,866-Speed 11396.45 samples/sec Loss 6.6382 LearningRate 0.0323 Epoch: 17 Global Step: 98110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:44:28,744-Speed 11669.75 samples/sec Loss 6.5637 LearningRate 0.0323 Epoch: 17 Global Step: 98120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:44:29,657-Speed 11240.17 samples/sec Loss 6.5874 LearningRate 0.0323 Epoch: 17 Global Step: 98130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:44:30,545-Speed 11544.65 samples/sec Loss 6.4107 LearningRate 0.0323 Epoch: 17 Global Step: 98140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:44:31,442-Speed 11427.66 samples/sec Loss 6.4324 LearningRate 0.0323 Epoch: 17 Global Step: 98150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 
2022-04-11 02:44:32,338-Speed 11431.78 samples/sec Loss 6.6225 LearningRate 0.0323 Epoch: 17 Global Step: 98160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:44:33,214-Speed 11700.19 samples/sec Loss 6.5006 LearningRate 0.0323 Epoch: 17 Global Step: 98170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:44:34,113-Speed 11406.24 samples/sec Loss 6.4451 LearningRate 0.0323 Epoch: 17 Global Step: 98180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:44:35,000-Speed 11553.98 samples/sec Loss 6.6074 LearningRate 0.0323 Epoch: 17 Global Step: 98190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:44:35,889-Speed 11528.34 samples/sec Loss 6.5837 LearningRate 0.0323 Epoch: 17 Global Step: 98200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:44:36,777-Speed 11538.80 samples/sec Loss 6.4567 LearningRate 0.0323 Epoch: 17 Global Step: 98210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:44:37,681-Speed 11329.58 samples/sec Loss 6.5887 LearningRate 0.0323 Epoch: 17 Global Step: 98220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:44:38,610-Speed 11036.87 samples/sec Loss 6.3936 LearningRate 0.0323 Epoch: 17 Global Step: 98230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:44:39,565-Speed 10726.38 samples/sec Loss 6.4338 LearningRate 0.0323 Epoch: 17 Global Step: 98240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:44:40,451-Speed 11559.51 samples/sec Loss 6.5379 LearningRate 0.0323 Epoch: 17 Global Step: 98250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:44:41,370-Speed 11151.81 samples/sec Loss 6.5063 LearningRate 0.0323 Epoch: 17 Global Step: 98260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:44:42,254-Speed 11587.39 samples/sec Loss 6.4834 LearningRate 0.0323 Epoch: 17 Global Step: 98270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:44:43,152-Speed 11417.94 
samples/sec Loss 6.4559 LearningRate 0.0322 Epoch: 17 Global Step: 98280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:44:44,085-Speed 10989.01 samples/sec Loss 6.6508 LearningRate 0.0322 Epoch: 17 Global Step: 98290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:44:44,952-Speed 11818.26 samples/sec Loss 6.4146 LearningRate 0.0322 Epoch: 17 Global Step: 98300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:44:45,856-Speed 11335.10 samples/sec Loss 6.5390 LearningRate 0.0322 Epoch: 17 Global Step: 98310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:44:46,760-Speed 11326.13 samples/sec Loss 6.5452 LearningRate 0.0322 Epoch: 17 Global Step: 98320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:44:47,638-Speed 11678.35 samples/sec Loss 6.6079 LearningRate 0.0322 Epoch: 17 Global Step: 98330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:44:48,532-Speed 11460.90 samples/sec Loss 6.6670 LearningRate 0.0322 Epoch: 17 Global Step: 98340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:44:49,466-Speed 10968.33 samples/sec Loss 6.4738 LearningRate 0.0322 Epoch: 17 Global Step: 98350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:44:50,415-Speed 10804.52 samples/sec Loss 6.5667 LearningRate 0.0322 Epoch: 17 Global Step: 98360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:44:51,331-Speed 11186.63 samples/sec Loss 6.5022 LearningRate 0.0322 Epoch: 17 Global Step: 98370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:44:52,287-Speed 10711.33 samples/sec Loss 6.5565 LearningRate 0.0322 Epoch: 17 Global Step: 98380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:44:53,234-Speed 10824.90 samples/sec Loss 6.7521 LearningRate 0.0322 Epoch: 17 Global Step: 98390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:44:54,187-Speed 10748.82 samples/sec Loss 6.5683 LearningRate 
0.0322 Epoch: 17 Global Step: 98400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:44:55,063-Speed 11693.71 samples/sec Loss 6.6311 LearningRate 0.0322 Epoch: 17 Global Step: 98410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:44:56,010-Speed 10826.13 samples/sec Loss 6.6799 LearningRate 0.0322 Epoch: 17 Global Step: 98420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:44:56,901-Speed 11496.46 samples/sec Loss 6.4393 LearningRate 0.0322 Epoch: 17 Global Step: 98430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:44:57,785-Speed 11589.82 samples/sec Loss 6.5311 LearningRate 0.0322 Epoch: 17 Global Step: 98440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:44:58,674-Speed 11533.27 samples/sec Loss 6.3862 LearningRate 0.0322 Epoch: 17 Global Step: 98450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:44:59,562-Speed 11535.24 samples/sec Loss 6.7089 LearningRate 0.0322 Epoch: 17 Global Step: 98460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:45:00,501-Speed 10915.96 samples/sec Loss 6.5286 LearningRate 0.0322 Epoch: 17 Global Step: 98470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:45:01,404-Speed 11344.97 samples/sec Loss 6.5640 LearningRate 0.0321 Epoch: 17 Global Step: 98480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:45:02,401-Speed 10284.19 samples/sec Loss 6.5226 LearningRate 0.0321 Epoch: 17 Global Step: 98490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:45:03,318-Speed 11177.62 samples/sec Loss 6.6110 LearningRate 0.0321 Epoch: 17 Global Step: 98500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:45:04,216-Speed 11412.07 samples/sec Loss 6.5727 LearningRate 0.0321 Epoch: 17 Global Step: 98510 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:45:05,132-Speed 11190.69 samples/sec Loss 6.5521 LearningRate 0.0321 Epoch: 17 Global Step: 
98520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:45:06,066-Speed 10971.69 samples/sec Loss 6.6797 LearningRate 0.0321 Epoch: 17 Global Step: 98530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:45:06,976-Speed 11256.30 samples/sec Loss 6.4151 LearningRate 0.0321 Epoch: 17 Global Step: 98540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:45:07,898-Speed 11114.26 samples/sec Loss 6.5288 LearningRate 0.0321 Epoch: 17 Global Step: 98550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:45:08,789-Speed 11509.21 samples/sec Loss 6.6553 LearningRate 0.0321 Epoch: 17 Global Step: 98560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:45:09,668-Speed 11670.95 samples/sec Loss 6.7400 LearningRate 0.0321 Epoch: 17 Global Step: 98570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:45:10,536-Speed 11801.73 samples/sec Loss 6.5091 LearningRate 0.0321 Epoch: 17 Global Step: 98580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:45:11,418-Speed 11608.56 samples/sec Loss 6.6273 LearningRate 0.0321 Epoch: 17 Global Step: 98590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:45:12,299-Speed 11634.02 samples/sec Loss 6.6332 LearningRate 0.0321 Epoch: 17 Global Step: 98600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:45:13,205-Speed 11318.00 samples/sec Loss 6.7137 LearningRate 0.0321 Epoch: 17 Global Step: 98610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:45:14,106-Speed 11366.76 samples/sec Loss 6.5680 LearningRate 0.0321 Epoch: 17 Global Step: 98620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:45:15,005-Speed 11396.41 samples/sec Loss 6.6090 LearningRate 0.0321 Epoch: 17 Global Step: 98630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:45:15,899-Speed 11459.50 samples/sec Loss 6.4339 LearningRate 0.0321 Epoch: 17 Global Step: 98640 Fp16 Grad Scale: 65536 Required: 5 
hours Training: 2022-04-11 02:45:16,787-Speed 11542.42 samples/sec Loss 6.5123 LearningRate 0.0321 Epoch: 17 Global Step: 98650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:45:17,694-Speed 11290.49 samples/sec Loss 6.6684 LearningRate 0.0321 Epoch: 17 Global Step: 98660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:45:18,614-Speed 11140.99 samples/sec Loss 6.6376 LearningRate 0.0321 Epoch: 17 Global Step: 98670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:45:19,503-Speed 11536.57 samples/sec Loss 6.6396 LearningRate 0.0321 Epoch: 17 Global Step: 98680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:45:20,401-Speed 11412.89 samples/sec Loss 6.6698 LearningRate 0.0320 Epoch: 17 Global Step: 98690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:45:21,287-Speed 11560.59 samples/sec Loss 6.6712 LearningRate 0.0320 Epoch: 17 Global Step: 98700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:45:22,163-Speed 11698.30 samples/sec Loss 6.6671 LearningRate 0.0320 Epoch: 17 Global Step: 98710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:45:23,060-Speed 11421.49 samples/sec Loss 6.5990 LearningRate 0.0320 Epoch: 17 Global Step: 98720 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:45:23,956-Speed 11432.16 samples/sec Loss 6.5873 LearningRate 0.0320 Epoch: 17 Global Step: 98730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:45:24,854-Speed 11419.07 samples/sec Loss 6.5509 LearningRate 0.0320 Epoch: 17 Global Step: 98740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:45:25,747-Speed 11468.91 samples/sec Loss 6.4983 LearningRate 0.0320 Epoch: 17 Global Step: 98750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:45:26,665-Speed 11172.84 samples/sec Loss 6.5923 LearningRate 0.0320 Epoch: 17 Global Step: 98760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 
02:45:27,562-Speed 11421.11 samples/sec Loss 6.6313 LearningRate 0.0320 Epoch: 17 Global Step: 98770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:45:28,471-Speed 11271.74 samples/sec Loss 6.5943 LearningRate 0.0320 Epoch: 17 Global Step: 98780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:45:29,376-Speed 11317.74 samples/sec Loss 6.5590 LearningRate 0.0320 Epoch: 17 Global Step: 98790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:45:30,299-Speed 11101.46 samples/sec Loss 6.5523 LearningRate 0.0320 Epoch: 17 Global Step: 98800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:45:31,183-Speed 11607.81 samples/sec Loss 6.8903 LearningRate 0.0320 Epoch: 17 Global Step: 98810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:45:32,081-Speed 11403.09 samples/sec Loss 6.6998 LearningRate 0.0320 Epoch: 17 Global Step: 98820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:45:32,959-Speed 11673.79 samples/sec Loss 6.7100 LearningRate 0.0320 Epoch: 17 Global Step: 98830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:45:33,834-Speed 11706.31 samples/sec Loss 6.4857 LearningRate 0.0320 Epoch: 17 Global Step: 98840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:45:34,733-Speed 11402.54 samples/sec Loss 6.6711 LearningRate 0.0320 Epoch: 17 Global Step: 98850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:45:35,621-Speed 11532.22 samples/sec Loss 6.5546 LearningRate 0.0320 Epoch: 17 Global Step: 98860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:45:36,568-Speed 10824.88 samples/sec Loss 6.6090 LearningRate 0.0320 Epoch: 17 Global Step: 98870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:45:37,473-Speed 11330.26 samples/sec Loss 6.5148 LearningRate 0.0320 Epoch: 17 Global Step: 98880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:45:38,389-Speed 11177.43 samples/sec 
Loss 6.5606 LearningRate 0.0319 Epoch: 17 Global Step: 98890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:45:39,313-Speed 11092.72 samples/sec Loss 6.6281 LearningRate 0.0319 Epoch: 17 Global Step: 98900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:45:40,196-Speed 11602.53 samples/sec Loss 6.6049 LearningRate 0.0319 Epoch: 17 Global Step: 98910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:45:41,079-Speed 11608.45 samples/sec Loss 6.6872 LearningRate 0.0319 Epoch: 17 Global Step: 98920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:45:42,031-Speed 10764.52 samples/sec Loss 6.5980 LearningRate 0.0319 Epoch: 17 Global Step: 98930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:45:42,962-Speed 10998.75 samples/sec Loss 6.5846 LearningRate 0.0319 Epoch: 17 Global Step: 98940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:45:43,872-Speed 11263.02 samples/sec Loss 6.5003 LearningRate 0.0319 Epoch: 17 Global Step: 98950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:45:44,755-Speed 11607.90 samples/sec Loss 6.6498 LearningRate 0.0319 Epoch: 17 Global Step: 98960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:45:45,632-Speed 11679.54 samples/sec Loss 6.6374 LearningRate 0.0319 Epoch: 17 Global Step: 98970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:45:46,519-Speed 11545.54 samples/sec Loss 6.4800 LearningRate 0.0319 Epoch: 17 Global Step: 98980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:45:47,465-Speed 10838.42 samples/sec Loss 6.5434 LearningRate 0.0319 Epoch: 17 Global Step: 98990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:45:48,354-Speed 11531.44 samples/sec Loss 6.5830 LearningRate 0.0319 Epoch: 17 Global Step: 99000 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:45:49,306-Speed 10760.12 samples/sec Loss 6.6322 LearningRate 0.0319 
Epoch: 17 Global Step: 99010 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:45:50,214-Speed 11287.69 samples/sec Loss 6.5997 LearningRate 0.0319 Epoch: 17 Global Step: 99020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:45:51,111-Speed 11427.59 samples/sec Loss 6.6164 LearningRate 0.0319 Epoch: 17 Global Step: 99030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:45:52,018-Speed 11293.94 samples/sec Loss 6.4577 LearningRate 0.0319 Epoch: 17 Global Step: 99040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:45:52,973-Speed 10731.89 samples/sec Loss 6.6231 LearningRate 0.0319 Epoch: 17 Global Step: 99050 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 02:45:53,901-Speed 11045.97 samples/sec Loss 6.5630 LearningRate 0.0319 Epoch: 17 Global Step: 99060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:45:54,796-Speed 11442.36 samples/sec Loss 6.6592 LearningRate 0.0319 Epoch: 17 Global Step: 99070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:45:55,726-Speed 11029.21 samples/sec Loss 6.5217 LearningRate 0.0319 Epoch: 17 Global Step: 99080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:45:56,635-Speed 11268.94 samples/sec Loss 6.4924 LearningRate 0.0318 Epoch: 17 Global Step: 99090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:45:57,514-Speed 11655.14 samples/sec Loss 6.4184 LearningRate 0.0318 Epoch: 17 Global Step: 99100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:45:58,394-Speed 11642.27 samples/sec Loss 6.5280 LearningRate 0.0318 Epoch: 17 Global Step: 99110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:45:59,265-Speed 11758.79 samples/sec Loss 6.7071 LearningRate 0.0318 Epoch: 17 Global Step: 99120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:46:00,145-Speed 11648.82 samples/sec Loss 6.7715 LearningRate 0.0318 Epoch: 17 Global Step: 99130 
Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:46:01,032-Speed 11549.72 samples/sec Loss 6.6004 LearningRate 0.0318 Epoch: 17 Global Step: 99140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:46:01,959-Speed 11058.27 samples/sec Loss 6.5342 LearningRate 0.0318 Epoch: 17 Global Step: 99150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:46:02,870-Speed 11245.54 samples/sec Loss 6.6483 LearningRate 0.0318 Epoch: 17 Global Step: 99160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:46:03,797-Speed 11054.55 samples/sec Loss 6.6383 LearningRate 0.0318 Epoch: 17 Global Step: 99170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:46:04,700-Speed 11353.77 samples/sec Loss 6.5206 LearningRate 0.0318 Epoch: 17 Global Step: 99180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:46:05,627-Speed 11047.38 samples/sec Loss 6.5664 LearningRate 0.0318 Epoch: 17 Global Step: 99190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:46:06,517-Speed 11521.78 samples/sec Loss 6.6130 LearningRate 0.0318 Epoch: 17 Global Step: 99200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:46:07,457-Speed 10898.00 samples/sec Loss 6.5098 LearningRate 0.0318 Epoch: 17 Global Step: 99210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:46:08,375-Speed 11166.47 samples/sec Loss 6.6194 LearningRate 0.0318 Epoch: 17 Global Step: 99220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:46:09,268-Speed 11474.55 samples/sec Loss 6.6804 LearningRate 0.0318 Epoch: 17 Global Step: 99230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:46:10,141-Speed 11730.82 samples/sec Loss 6.7447 LearningRate 0.0318 Epoch: 17 Global Step: 99240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:46:11,052-Speed 11259.58 samples/sec Loss 6.6122 LearningRate 0.0318 Epoch: 17 Global Step: 99250 Fp16 Grad Scale: 131072 
Required: 5 hours Training: 2022-04-11 02:46:11,968-Speed 11181.61 samples/sec Loss 6.6135 LearningRate 0.0318 Epoch: 17 Global Step: 99260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:46:12,872-Speed 11349.30 samples/sec Loss 6.5400 LearningRate 0.0318 Epoch: 17 Global Step: 99270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:46:13,795-Speed 11107.93 samples/sec Loss 6.5114 LearningRate 0.0318 Epoch: 17 Global Step: 99280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:46:14,683-Speed 11534.74 samples/sec Loss 6.6670 LearningRate 0.0317 Epoch: 17 Global Step: 99290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:46:15,587-Speed 11341.22 samples/sec Loss 6.6151 LearningRate 0.0317 Epoch: 17 Global Step: 99300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:46:16,504-Speed 11169.57 samples/sec Loss 6.5257 LearningRate 0.0317 Epoch: 17 Global Step: 99310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:46:17,455-Speed 10780.31 samples/sec Loss 6.5526 LearningRate 0.0317 Epoch: 17 Global Step: 99320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:46:18,404-Speed 10792.56 samples/sec Loss 6.4780 LearningRate 0.0317 Epoch: 17 Global Step: 99330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:46:19,322-Speed 11163.40 samples/sec Loss 6.6347 LearningRate 0.0317 Epoch: 17 Global Step: 99340 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:46:20,205-Speed 11610.79 samples/sec Loss 6.6991 LearningRate 0.0317 Epoch: 17 Global Step: 99350 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:46:21,095-Speed 11522.56 samples/sec Loss 6.5136 LearningRate 0.0317 Epoch: 17 Global Step: 99360 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 02:46:21,977-Speed 11614.42 samples/sec Loss 6.5732 LearningRate 0.0317 Epoch: 17 Global Step: 99370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 
2022-04-11 02:46:22,863-Speed 11558.80 samples/sec Loss 6.5225 LearningRate 0.0317 Epoch: 17 Global Step: 99380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:46:23,763-Speed 11390.39 samples/sec Loss 6.5108 LearningRate 0.0317 Epoch: 17 Global Step: 99390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:46:24,659-Speed 11437.51 samples/sec Loss 6.6698 LearningRate 0.0317 Epoch: 17 Global Step: 99400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:46:25,541-Speed 11621.61 samples/sec Loss 6.6790 LearningRate 0.0317 Epoch: 17 Global Step: 99410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:46:26,454-Speed 11216.89 samples/sec Loss 6.6507 LearningRate 0.0317 Epoch: 17 Global Step: 99420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:46:27,386-Speed 10999.55 samples/sec Loss 6.6583 LearningRate 0.0317 Epoch: 17 Global Step: 99430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:46:28,285-Speed 11395.56 samples/sec Loss 6.6344 LearningRate 0.0317 Epoch: 17 Global Step: 99440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:46:29,187-Speed 11358.06 samples/sec Loss 6.6205 LearningRate 0.0317 Epoch: 17 Global Step: 99450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:46:30,092-Speed 11321.49 samples/sec Loss 6.6150 LearningRate 0.0317 Epoch: 17 Global Step: 99460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:46:31,026-Speed 10984.30 samples/sec Loss 6.5254 LearningRate 0.0317 Epoch: 17 Global Step: 99470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:46:31,916-Speed 11504.65 samples/sec Loss 6.6738 LearningRate 0.0317 Epoch: 17 Global Step: 99480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:46:32,805-Speed 11534.32 samples/sec Loss 6.6943 LearningRate 0.0316 Epoch: 17 Global Step: 99490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:46:33,694-Speed 11521.42 
samples/sec Loss 6.7565 LearningRate 0.0316 Epoch: 17 Global Step: 99500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:46:34,610-Speed 11190.34 samples/sec Loss 6.6483 LearningRate 0.0316 Epoch: 17 Global Step: 99510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:46:35,507-Speed 11427.58 samples/sec Loss 6.7298 LearningRate 0.0316 Epoch: 17 Global Step: 99520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:46:36,409-Speed 11352.53 samples/sec Loss 6.6505 LearningRate 0.0316 Epoch: 17 Global Step: 99530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:46:37,340-Speed 11011.11 samples/sec Loss 6.7369 LearningRate 0.0316 Epoch: 17 Global Step: 99540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:46:38,243-Speed 11353.51 samples/sec Loss 6.6661 LearningRate 0.0316 Epoch: 17 Global Step: 99550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:46:39,126-Speed 11597.74 samples/sec Loss 6.4750 LearningRate 0.0316 Epoch: 17 Global Step: 99560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:46:40,037-Speed 11247.26 samples/sec Loss 6.5523 LearningRate 0.0316 Epoch: 17 Global Step: 99570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:46:40,931-Speed 11456.55 samples/sec Loss 6.6531 LearningRate 0.0316 Epoch: 17 Global Step: 99580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:46:41,876-Speed 10852.26 samples/sec Loss 6.5819 LearningRate 0.0316 Epoch: 17 Global Step: 99590 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:46:42,817-Speed 10886.62 samples/sec Loss 6.7114 LearningRate 0.0316 Epoch: 17 Global Step: 99600 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:46:43,748-Speed 11008.54 samples/sec Loss 6.6709 LearningRate 0.0316 Epoch: 17 Global Step: 99610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:46:44,648-Speed 11384.96 samples/sec Loss 6.4816 LearningRate 
0.0316 Epoch: 17 Global Step: 99620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:46:45,523-Speed 11704.68 samples/sec Loss 6.5205 LearningRate 0.0316 Epoch: 17 Global Step: 99630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:46:46,424-Speed 11370.52 samples/sec Loss 6.4873 LearningRate 0.0316 Epoch: 17 Global Step: 99640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:46:47,319-Speed 11445.13 samples/sec Loss 6.6767 LearningRate 0.0316 Epoch: 17 Global Step: 99650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:46:48,231-Speed 11237.02 samples/sec Loss 6.4729 LearningRate 0.0316 Epoch: 17 Global Step: 99660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:46:49,144-Speed 11223.09 samples/sec Loss 6.7222 LearningRate 0.0316 Epoch: 17 Global Step: 99670 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:46:50,028-Speed 11602.50 samples/sec Loss 6.6698 LearningRate 0.0316 Epoch: 17 Global Step: 99680 Fp16 Grad Scale: 262144 Required: 5 hours Training: 2022-04-11 02:46:50,899-Speed 11765.18 samples/sec Loss 6.5319 LearningRate 0.0315 Epoch: 17 Global Step: 99690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:46:51,798-Speed 11388.82 samples/sec Loss 6.5468 LearningRate 0.0315 Epoch: 17 Global Step: 99700 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:46:52,691-Speed 11486.49 samples/sec Loss 6.5605 LearningRate 0.0315 Epoch: 17 Global Step: 99710 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:46:53,586-Speed 11447.94 samples/sec Loss 6.5790 LearningRate 0.0315 Epoch: 17 Global Step: 99720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:46:54,482-Speed 11430.74 samples/sec Loss 6.4970 LearningRate 0.0315 Epoch: 17 Global Step: 99730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:46:55,361-Speed 11660.98 samples/sec Loss 6.5451 LearningRate 0.0315 Epoch: 17 Global Step: 
99740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:46:56,273-Speed 11238.99 samples/sec Loss 6.6455 LearningRate 0.0315 Epoch: 17 Global Step: 99750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:46:57,152-Speed 11661.57 samples/sec Loss 6.7195 LearningRate 0.0315 Epoch: 17 Global Step: 99760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:46:58,028-Speed 11688.95 samples/sec Loss 6.4200 LearningRate 0.0315 Epoch: 17 Global Step: 99770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:46:58,964-Speed 10946.80 samples/sec Loss 6.6662 LearningRate 0.0315 Epoch: 17 Global Step: 99780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:46:59,849-Speed 11578.17 samples/sec Loss 6.5679 LearningRate 0.0315 Epoch: 17 Global Step: 99790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:47:00,743-Speed 11471.14 samples/sec Loss 6.4313 LearningRate 0.0315 Epoch: 17 Global Step: 99800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:47:01,675-Speed 10986.18 samples/sec Loss 6.7544 LearningRate 0.0315 Epoch: 17 Global Step: 99810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:47:02,570-Speed 11458.72 samples/sec Loss 6.5724 LearningRate 0.0315 Epoch: 17 Global Step: 99820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:47:03,462-Speed 11480.52 samples/sec Loss 6.7644 LearningRate 0.0315 Epoch: 17 Global Step: 99830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:47:04,423-Speed 10666.45 samples/sec Loss 6.7144 LearningRate 0.0315 Epoch: 17 Global Step: 99840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:47:05,368-Speed 10841.49 samples/sec Loss 6.5256 LearningRate 0.0315 Epoch: 17 Global Step: 99850 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:47:06,333-Speed 10614.46 samples/sec Loss 6.6606 LearningRate 0.0315 Epoch: 17 Global Step: 99860 Fp16 Grad Scale: 65536 Required: 
5 hours Training: 2022-04-11 02:47:07,286-Speed 10752.53 samples/sec Loss 6.7440 LearningRate 0.0315 Epoch: 17 Global Step: 99870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:47:08,221-Speed 10964.69 samples/sec Loss 6.7872 LearningRate 0.0315 Epoch: 17 Global Step: 99880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:47:09,121-Speed 11378.83 samples/sec Loss 6.5734 LearningRate 0.0315 Epoch: 17 Global Step: 99890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:47:10,008-Speed 11558.71 samples/sec Loss 6.6730 LearningRate 0.0314 Epoch: 17 Global Step: 99900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:47:10,908-Speed 11387.08 samples/sec Loss 6.6016 LearningRate 0.0314 Epoch: 17 Global Step: 99910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:47:11,813-Speed 11319.50 samples/sec Loss 6.5668 LearningRate 0.0314 Epoch: 17 Global Step: 99920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:47:12,708-Speed 11453.03 samples/sec Loss 6.7011 LearningRate 0.0314 Epoch: 17 Global Step: 99930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:47:13,591-Speed 11610.39 samples/sec Loss 6.6092 LearningRate 0.0314 Epoch: 17 Global Step: 99940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:47:14,470-Speed 11649.99 samples/sec Loss 6.6775 LearningRate 0.0314 Epoch: 17 Global Step: 99950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:47:15,416-Speed 10827.16 samples/sec Loss 6.5174 LearningRate 0.0314 Epoch: 17 Global Step: 99960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:47:16,339-Speed 11110.89 samples/sec Loss 6.6102 LearningRate 0.0314 Epoch: 17 Global Step: 99970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:47:17,289-Speed 10776.79 samples/sec Loss 6.6286 LearningRate 0.0314 Epoch: 17 Global Step: 99980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 
02:47:18,222-Speed 10981.55 samples/sec Loss 6.7009 LearningRate 0.0314 Epoch: 17 Global Step: 99990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:47:19,122-Speed 11395.39 samples/sec Loss 6.4704 LearningRate 0.0314 Epoch: 17 Global Step: 100000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:47:40,440-[lfw][100000]XNorm: 11.109147 Training: 2022-04-11 02:47:40,440-[lfw][100000]Accuracy-Flip: 0.99483+-0.00293 Training: 2022-04-11 02:47:40,441-[lfw][100000]Accuracy-Highest: 0.99617 Training: 2022-04-11 02:48:05,106-[cfp_fp][100000]XNorm: 9.274375 Training: 2022-04-11 02:48:05,107-[cfp_fp][100000]Accuracy-Flip: 0.94200+-0.01132 Training: 2022-04-11 02:48:05,107-[cfp_fp][100000]Accuracy-Highest: 0.94443 Training: 2022-04-11 02:48:26,320-[agedb_30][100000]XNorm: 10.862005 Training: 2022-04-11 02:48:26,320-[agedb_30][100000]Accuracy-Flip: 0.96167+-0.01043 Training: 2022-04-11 02:48:26,321-[agedb_30][100000]Accuracy-Highest: 0.96483 Training: 2022-04-11 02:48:27,221-Speed 150.37 samples/sec Loss 6.8235 LearningRate 0.0314 Epoch: 17 Global Step: 100010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:48:28,097-Speed 11691.34 samples/sec Loss 6.5842 LearningRate 0.0314 Epoch: 17 Global Step: 100020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:48:29,028-Speed 11012.15 samples/sec Loss 6.5859 LearningRate 0.0314 Epoch: 17 Global Step: 100030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:48:29,945-Speed 11168.59 samples/sec Loss 6.6232 LearningRate 0.0314 Epoch: 17 Global Step: 100040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:48:30,840-Speed 11458.72 samples/sec Loss 6.5996 LearningRate 0.0314 Epoch: 17 Global Step: 100050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:48:31,760-Speed 11133.30 samples/sec Loss 6.6256 LearningRate 0.0314 Epoch: 17 Global Step: 100060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:48:32,654-Speed 
11464.47 samples/sec Loss 6.6076 LearningRate 0.0314 Epoch: 17 Global Step: 100070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:48:33,530-Speed 11696.66 samples/sec Loss 6.7280 LearningRate 0.0314 Epoch: 17 Global Step: 100080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:48:34,423-Speed 11473.76 samples/sec Loss 6.6111 LearningRate 0.0314 Epoch: 17 Global Step: 100090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:48:35,318-Speed 11453.98 samples/sec Loss 6.7613 LearningRate 0.0313 Epoch: 17 Global Step: 100100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:48:36,217-Speed 11394.52 samples/sec Loss 6.6270 LearningRate 0.0313 Epoch: 17 Global Step: 100110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:48:37,116-Speed 11399.96 samples/sec Loss 6.5749 LearningRate 0.0313 Epoch: 17 Global Step: 100120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:48:37,999-Speed 11598.62 samples/sec Loss 6.5586 LearningRate 0.0313 Epoch: 17 Global Step: 100130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:48:38,887-Speed 11541.08 samples/sec Loss 6.6733 LearningRate 0.0313 Epoch: 17 Global Step: 100140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:48:39,791-Speed 11344.22 samples/sec Loss 6.6858 LearningRate 0.0313 Epoch: 17 Global Step: 100150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:48:40,664-Speed 11733.06 samples/sec Loss 6.5846 LearningRate 0.0313 Epoch: 17 Global Step: 100160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:48:41,572-Speed 11278.64 samples/sec Loss 6.6357 LearningRate 0.0313 Epoch: 17 Global Step: 100170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:48:42,472-Speed 11390.52 samples/sec Loss 6.5743 LearningRate 0.0313 Epoch: 17 Global Step: 100180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:48:43,426-Speed 10739.76 samples/sec Loss 
6.6616 LearningRate 0.0313 Epoch: 17 Global Step: 100190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:48:44,353-Speed 11049.15 samples/sec Loss 6.6739 LearningRate 0.0313 Epoch: 17 Global Step: 100200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:48:45,245-Speed 11494.79 samples/sec Loss 6.5800 LearningRate 0.0313 Epoch: 17 Global Step: 100210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:48:46,122-Speed 11687.67 samples/sec Loss 6.6560 LearningRate 0.0313 Epoch: 17 Global Step: 100220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:48:47,022-Speed 11386.57 samples/sec Loss 6.7500 LearningRate 0.0313 Epoch: 17 Global Step: 100230 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:48:47,920-Speed 11417.94 samples/sec Loss 6.5204 LearningRate 0.0313 Epoch: 17 Global Step: 100240 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:48:48,840-Speed 11133.09 samples/sec Loss 6.7731 LearningRate 0.0313 Epoch: 17 Global Step: 100250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:48:49,752-Speed 11241.37 samples/sec Loss 6.5621 LearningRate 0.0313 Epoch: 17 Global Step: 100260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:48:50,704-Speed 10765.73 samples/sec Loss 6.6445 LearningRate 0.0313 Epoch: 17 Global Step: 100270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:48:51,641-Speed 10935.11 samples/sec Loss 6.4817 LearningRate 0.0313 Epoch: 17 Global Step: 100280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:48:52,594-Speed 10747.17 samples/sec Loss 6.7495 LearningRate 0.0313 Epoch: 17 Global Step: 100290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:48:53,498-Speed 11331.38 samples/sec Loss 6.7960 LearningRate 0.0312 Epoch: 17 Global Step: 100300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:48:54,396-Speed 11415.21 samples/sec Loss 6.7323 LearningRate 
0.0312 Epoch: 17 Global Step: 100310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:48:55,347-Speed 10772.87 samples/sec Loss 6.5406 LearningRate 0.0312 Epoch: 17 Global Step: 100320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:48:56,228-Speed 11630.88 samples/sec Loss 6.6432 LearningRate 0.0312 Epoch: 17 Global Step: 100330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:48:57,107-Speed 11663.09 samples/sec Loss 6.7928 LearningRate 0.0312 Epoch: 17 Global Step: 100340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:48:57,976-Speed 11784.65 samples/sec Loss 6.6514 LearningRate 0.0312 Epoch: 17 Global Step: 100350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:48:58,896-Speed 11145.79 samples/sec Loss 6.7102 LearningRate 0.0312 Epoch: 17 Global Step: 100360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:48:59,789-Speed 11469.63 samples/sec Loss 6.6196 LearningRate 0.0312 Epoch: 17 Global Step: 100370 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:49:00,749-Speed 10679.75 samples/sec Loss 6.7446 LearningRate 0.0312 Epoch: 17 Global Step: 100380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:49:01,638-Speed 11527.10 samples/sec Loss 6.7675 LearningRate 0.0312 Epoch: 17 Global Step: 100390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:49:02,578-Speed 10899.26 samples/sec Loss 6.7172 LearningRate 0.0312 Epoch: 17 Global Step: 100400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:49:03,518-Speed 10901.29 samples/sec Loss 6.9035 LearningRate 0.0312 Epoch: 17 Global Step: 100410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:49:04,446-Speed 11039.64 samples/sec Loss 6.5881 LearningRate 0.0312 Epoch: 17 Global Step: 100420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:49:05,374-Speed 11039.42 samples/sec Loss 6.7354 LearningRate 0.0312 Epoch: 17 Global 
Step: 100430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:49:06,299-Speed 11083.18 samples/sec Loss 6.6250 LearningRate 0.0312 Epoch: 17 Global Step: 100440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:49:07,182-Speed 11606.46 samples/sec Loss 6.7796 LearningRate 0.0312 Epoch: 17 Global Step: 100450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:49:08,059-Speed 11683.09 samples/sec Loss 6.6551 LearningRate 0.0312 Epoch: 17 Global Step: 100460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:49:08,960-Speed 11365.14 samples/sec Loss 6.6651 LearningRate 0.0312 Epoch: 17 Global Step: 100470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 02:49:09,825-Speed 11859.80 samples/sec Loss 6.7266 LearningRate 0.0312 Epoch: 17 Global Step: 100480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:49:10,706-Speed 11625.64 samples/sec Loss 6.6763 LearningRate 0.0312 Epoch: 17 Global Step: 100490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:49:11,600-Speed 11467.11 samples/sec Loss 6.7094 LearningRate 0.0312 Epoch: 17 Global Step: 100500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:49:12,499-Speed 11404.73 samples/sec Loss 6.5463 LearningRate 0.0311 Epoch: 17 Global Step: 100510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:49:13,429-Speed 11011.09 samples/sec Loss 6.6495 LearningRate 0.0311 Epoch: 17 Global Step: 100520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:49:14,365-Speed 10950.74 samples/sec Loss 6.6996 LearningRate 0.0311 Epoch: 17 Global Step: 100530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:49:15,265-Speed 11383.31 samples/sec Loss 6.6089 LearningRate 0.0311 Epoch: 17 Global Step: 100540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:49:16,154-Speed 11529.26 samples/sec Loss 6.6032 LearningRate 0.0311 Epoch: 17 Global Step: 100550 Fp16 Grad 
Scale: 65536 Required: 5 hours Training: 2022-04-11 02:49:17,042-Speed 11538.92 samples/sec Loss 6.6044 LearningRate 0.0311 Epoch: 17 Global Step: 100560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 02:49:17,943-Speed 11382.05 samples/sec Loss 6.6509 LearningRate 0.0311 Epoch: 17 Global Step: 100570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:49:18,887-Speed 10844.86 samples/sec Loss 6.6273 LearningRate 0.0311 Epoch: 17 Global Step: 100580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:49:19,801-Speed 11212.43 samples/sec Loss 6.5856 LearningRate 0.0311 Epoch: 17 Global Step: 100590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:49:20,679-Speed 11670.74 samples/sec Loss 6.5471 LearningRate 0.0311 Epoch: 17 Global Step: 100600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:49:21,561-Speed 11627.44 samples/sec Loss 6.7133 LearningRate 0.0311 Epoch: 17 Global Step: 100610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:49:22,445-Speed 11590.22 samples/sec Loss 6.6445 LearningRate 0.0311 Epoch: 17 Global Step: 100620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:49:23,352-Speed 11292.20 samples/sec Loss 6.6162 LearningRate 0.0311 Epoch: 17 Global Step: 100630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:49:24,262-Speed 11259.63 samples/sec Loss 6.6362 LearningRate 0.0311 Epoch: 17 Global Step: 100640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:49:25,174-Speed 11240.34 samples/sec Loss 6.7485 LearningRate 0.0311 Epoch: 17 Global Step: 100650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:49:26,072-Speed 11413.70 samples/sec Loss 6.6174 LearningRate 0.0311 Epoch: 17 Global Step: 100660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:49:26,951-Speed 11661.20 samples/sec Loss 6.5528 LearningRate 0.0311 Epoch: 17 Global Step: 100670 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-04-11 02:49:27,858-Speed 11295.98 samples/sec Loss 6.5615 LearningRate 0.0311 Epoch: 17 Global Step: 100680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:49:28,785-Speed 11050.92 samples/sec Loss 6.5121 LearningRate 0.0311 Epoch: 17 Global Step: 100690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:49:29,679-Speed 11459.14 samples/sec Loss 6.5113 LearningRate 0.0311 Epoch: 17 Global Step: 100700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:49:30,593-Speed 11208.52 samples/sec Loss 6.6448 LearningRate 0.0310 Epoch: 17 Global Step: 100710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:49:31,498-Speed 11324.07 samples/sec Loss 6.7068 LearningRate 0.0310 Epoch: 17 Global Step: 100720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:49:32,392-Speed 11464.07 samples/sec Loss 6.7526 LearningRate 0.0310 Epoch: 17 Global Step: 100730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:49:33,282-Speed 11524.46 samples/sec Loss 6.5017 LearningRate 0.0310 Epoch: 17 Global Step: 100740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:49:34,158-Speed 11691.56 samples/sec Loss 6.6083 LearningRate 0.0310 Epoch: 17 Global Step: 100750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:49:35,039-Speed 11635.65 samples/sec Loss 6.7917 LearningRate 0.0310 Epoch: 17 Global Step: 100760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:49:35,952-Speed 11218.63 samples/sec Loss 6.7233 LearningRate 0.0310 Epoch: 17 Global Step: 100770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:49:36,852-Speed 11382.46 samples/sec Loss 6.7794 LearningRate 0.0310 Epoch: 17 Global Step: 100780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:49:37,717-Speed 11853.11 samples/sec Loss 6.7363 LearningRate 0.0310 Epoch: 17 Global Step: 100790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
02:49:38,610-Speed 11475.32 samples/sec Loss 6.5924 LearningRate 0.0310 Epoch: 17 Global Step: 100800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:49:39,535-Speed 11074.57 samples/sec Loss 6.6145 LearningRate 0.0310 Epoch: 17 Global Step: 100810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:49:40,447-Speed 11235.95 samples/sec Loss 6.5659 LearningRate 0.0310 Epoch: 17 Global Step: 100820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:49:41,374-Speed 11062.39 samples/sec Loss 6.6089 LearningRate 0.0310 Epoch: 17 Global Step: 100830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:49:42,267-Speed 11473.79 samples/sec Loss 6.7010 LearningRate 0.0310 Epoch: 17 Global Step: 100840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:49:43,154-Speed 11553.14 samples/sec Loss 6.7479 LearningRate 0.0310 Epoch: 17 Global Step: 100850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:49:44,106-Speed 10753.76 samples/sec Loss 6.5390 LearningRate 0.0310 Epoch: 17 Global Step: 100860 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:49:44,985-Speed 11663.87 samples/sec Loss 6.6659 LearningRate 0.0310 Epoch: 17 Global Step: 100870 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:49:45,862-Speed 11687.49 samples/sec Loss 6.6731 LearningRate 0.0310 Epoch: 17 Global Step: 100880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:49:46,768-Speed 11301.67 samples/sec Loss 6.6356 LearningRate 0.0310 Epoch: 17 Global Step: 100890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:49:47,687-Speed 11161.47 samples/sec Loss 6.6634 LearningRate 0.0310 Epoch: 17 Global Step: 100900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:49:48,627-Speed 10893.41 samples/sec Loss 6.6132 LearningRate 0.0309 Epoch: 17 Global Step: 100910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:49:49,591-Speed 10635.64 
samples/sec Loss 6.6630 LearningRate 0.0309 Epoch: 17 Global Step: 100920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:49:50,554-Speed 10632.60 samples/sec Loss 6.5706 LearningRate 0.0309 Epoch: 17 Global Step: 100930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:49:51,487-Speed 10990.54 samples/sec Loss 6.6516 LearningRate 0.0309 Epoch: 17 Global Step: 100940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:49:52,397-Speed 11256.70 samples/sec Loss 6.7789 LearningRate 0.0309 Epoch: 17 Global Step: 100950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:49:53,329-Speed 10991.10 samples/sec Loss 6.6505 LearningRate 0.0309 Epoch: 17 Global Step: 100960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:49:54,256-Speed 11065.03 samples/sec Loss 6.6782 LearningRate 0.0309 Epoch: 17 Global Step: 100970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:49:55,156-Speed 11393.59 samples/sec Loss 6.5321 LearningRate 0.0309 Epoch: 17 Global Step: 100980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:49:56,030-Speed 11723.19 samples/sec Loss 6.5787 LearningRate 0.0309 Epoch: 17 Global Step: 100990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:49:56,913-Speed 11615.22 samples/sec Loss 6.5791 LearningRate 0.0309 Epoch: 17 Global Step: 101000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:49:57,779-Speed 11824.96 samples/sec Loss 6.6160 LearningRate 0.0309 Epoch: 17 Global Step: 101010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:49:58,652-Speed 11741.92 samples/sec Loss 6.5827 LearningRate 0.0309 Epoch: 17 Global Step: 101020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:49:59,535-Speed 11598.69 samples/sec Loss 6.5940 LearningRate 0.0309 Epoch: 17 Global Step: 101030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:50:00,425-Speed 11517.77 samples/sec Loss 6.6585 
LearningRate 0.0309 Epoch: 17 Global Step: 101040 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:50:01,322-Speed 11431.02 samples/sec Loss 6.7896 LearningRate 0.0309 Epoch: 17 Global Step: 101050 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:50:02,205-Speed 11599.16 samples/sec Loss 6.6027 LearningRate 0.0309 Epoch: 17 Global Step: 101060 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:50:03,131-Speed 11066.47 samples/sec Loss 6.5833 LearningRate 0.0309 Epoch: 17 Global Step: 101070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:04,010-Speed 11663.66 samples/sec Loss 6.5479 LearningRate 0.0309 Epoch: 17 Global Step: 101080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:04,902-Speed 11477.58 samples/sec Loss 6.6503 LearningRate 0.0309 Epoch: 17 Global Step: 101090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:05,779-Speed 11685.48 samples/sec Loss 6.6809 LearningRate 0.0309 Epoch: 17 Global Step: 101100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:06,678-Speed 11402.31 samples/sec Loss 6.6663 LearningRate 0.0309 Epoch: 17 Global Step: 101110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:07,577-Speed 11400.67 samples/sec Loss 6.7019 LearningRate 0.0308 Epoch: 17 Global Step: 101120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:08,489-Speed 11233.77 samples/sec Loss 6.6633 LearningRate 0.0308 Epoch: 17 Global Step: 101130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:09,380-Speed 11509.76 samples/sec Loss 6.5449 LearningRate 0.0308 Epoch: 17 Global Step: 101140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:10,267-Speed 11558.99 samples/sec Loss 6.7031 LearningRate 0.0308 Epoch: 17 Global Step: 101150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:11,180-Speed 11224.16 samples/sec Loss 6.6440 LearningRate 0.0308 Epoch: 
17 Global Step: 101160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:12,057-Speed 11685.07 samples/sec Loss 6.5600 LearningRate 0.0308 Epoch: 17 Global Step: 101170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:50:12,964-Speed 11297.90 samples/sec Loss 6.5496 LearningRate 0.0308 Epoch: 17 Global Step: 101180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:50:13,857-Speed 11469.02 samples/sec Loss 6.6287 LearningRate 0.0308 Epoch: 17 Global Step: 101190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:50:14,745-Speed 11538.19 samples/sec Loss 6.6733 LearningRate 0.0308 Epoch: 17 Global Step: 101200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:15,615-Speed 11793.34 samples/sec Loss 6.5494 LearningRate 0.0308 Epoch: 17 Global Step: 101210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:16,498-Speed 11599.87 samples/sec Loss 6.5810 LearningRate 0.0308 Epoch: 17 Global Step: 101220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:17,437-Speed 10915.99 samples/sec Loss 6.5070 LearningRate 0.0308 Epoch: 17 Global Step: 101230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:18,345-Speed 11284.35 samples/sec Loss 6.7335 LearningRate 0.0308 Epoch: 17 Global Step: 101240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:19,267-Speed 11114.69 samples/sec Loss 6.6015 LearningRate 0.0308 Epoch: 17 Global Step: 101250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:20,177-Speed 11262.67 samples/sec Loss 6.6998 LearningRate 0.0308 Epoch: 17 Global Step: 101260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:21,076-Speed 11395.44 samples/sec Loss 6.5081 LearningRate 0.0308 Epoch: 17 Global Step: 101270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:21,983-Speed 11295.55 samples/sec Loss 6.6351 LearningRate 0.0308 Epoch: 17 Global Step: 101280 Fp16 
Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:22,893-Speed 11259.55 samples/sec Loss 6.6456 LearningRate 0.0308 Epoch: 17 Global Step: 101290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:23,786-Speed 11475.29 samples/sec Loss 6.6573 LearningRate 0.0308 Epoch: 17 Global Step: 101300 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:50:24,667-Speed 11632.03 samples/sec Loss 6.6126 LearningRate 0.0308 Epoch: 17 Global Step: 101310 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:50:25,574-Speed 11305.29 samples/sec Loss 6.6015 LearningRate 0.0307 Epoch: 17 Global Step: 101320 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:50:26,473-Speed 11389.31 samples/sec Loss 6.7070 LearningRate 0.0307 Epoch: 17 Global Step: 101330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:50:27,403-Speed 11023.61 samples/sec Loss 6.5457 LearningRate 0.0307 Epoch: 17 Global Step: 101340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:50:28,297-Speed 11463.11 samples/sec Loss 6.6260 LearningRate 0.0307 Epoch: 17 Global Step: 101350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:29,214-Speed 11175.33 samples/sec Loss 6.4997 LearningRate 0.0307 Epoch: 17 Global Step: 101360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:30,102-Speed 11536.90 samples/sec Loss 6.6092 LearningRate 0.0307 Epoch: 17 Global Step: 101370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:30,979-Speed 11685.07 samples/sec Loss 6.5535 LearningRate 0.0307 Epoch: 17 Global Step: 101380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:31,889-Speed 11264.95 samples/sec Loss 6.7211 LearningRate 0.0307 Epoch: 17 Global Step: 101390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:32,787-Speed 11406.43 samples/sec Loss 6.6632 LearningRate 0.0307 Epoch: 17 Global Step: 101400 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-04-11 02:50:33,664-Speed 11679.57 samples/sec Loss 6.6626 LearningRate 0.0307 Epoch: 17 Global Step: 101410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:34,552-Speed 11544.86 samples/sec Loss 6.6684 LearningRate 0.0307 Epoch: 17 Global Step: 101420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:35,445-Speed 11480.40 samples/sec Loss 6.5356 LearningRate 0.0307 Epoch: 17 Global Step: 101430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:36,356-Speed 11244.65 samples/sec Loss 6.5771 LearningRate 0.0307 Epoch: 17 Global Step: 101440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:37,237-Speed 11638.33 samples/sec Loss 6.6687 LearningRate 0.0307 Epoch: 17 Global Step: 101450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:50:38,115-Speed 11666.36 samples/sec Loss 6.6560 LearningRate 0.0307 Epoch: 17 Global Step: 101460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:39,022-Speed 11294.36 samples/sec Loss 6.5382 LearningRate 0.0307 Epoch: 17 Global Step: 101470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:39,963-Speed 10896.56 samples/sec Loss 6.7249 LearningRate 0.0307 Epoch: 17 Global Step: 101480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:40,890-Speed 11055.44 samples/sec Loss 6.7170 LearningRate 0.0307 Epoch: 17 Global Step: 101490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:41,786-Speed 11439.20 samples/sec Loss 6.6987 LearningRate 0.0307 Epoch: 17 Global Step: 101500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:42,712-Speed 11064.35 samples/sec Loss 6.7098 LearningRate 0.0307 Epoch: 17 Global Step: 101510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:43,594-Speed 11615.28 samples/sec Loss 6.6605 LearningRate 0.0307 Epoch: 17 Global Step: 101520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-11 02:50:44,488-Speed 11459.79 samples/sec Loss 6.7553 LearningRate 0.0306 Epoch: 17 Global Step: 101530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:45,386-Speed 11409.86 samples/sec Loss 6.6095 LearningRate 0.0306 Epoch: 17 Global Step: 101540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:46,296-Speed 11269.37 samples/sec Loss 6.6874 LearningRate 0.0306 Epoch: 17 Global Step: 101550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:47,190-Speed 11452.40 samples/sec Loss 6.7660 LearningRate 0.0306 Epoch: 17 Global Step: 101560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:50:48,058-Speed 11812.84 samples/sec Loss 6.4691 LearningRate 0.0306 Epoch: 17 Global Step: 101570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:48,985-Speed 11049.91 samples/sec Loss 6.6614 LearningRate 0.0306 Epoch: 17 Global Step: 101580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:49,877-Speed 11494.95 samples/sec Loss 6.5607 LearningRate 0.0306 Epoch: 17 Global Step: 101590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:50,796-Speed 11143.58 samples/sec Loss 6.5083 LearningRate 0.0306 Epoch: 17 Global Step: 101600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:51,699-Speed 11352.15 samples/sec Loss 6.5849 LearningRate 0.0306 Epoch: 17 Global Step: 101610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:52,584-Speed 11571.38 samples/sec Loss 6.6702 LearningRate 0.0306 Epoch: 17 Global Step: 101620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:53,471-Speed 11557.33 samples/sec Loss 6.6518 LearningRate 0.0306 Epoch: 17 Global Step: 101630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:54,356-Speed 11577.27 samples/sec Loss 6.6022 LearningRate 0.0306 Epoch: 17 Global Step: 101640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:55,274-Speed 
11168.31 samples/sec Loss 6.6088 LearningRate 0.0306 Epoch: 17 Global Step: 101650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:56,194-Speed 11131.07 samples/sec Loss 6.6764 LearningRate 0.0306 Epoch: 17 Global Step: 101660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:50:57,092-Speed 11416.81 samples/sec Loss 6.7479 LearningRate 0.0306 Epoch: 17 Global Step: 101670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:50:57,995-Speed 11337.22 samples/sec Loss 6.4380 LearningRate 0.0306 Epoch: 17 Global Step: 101680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:50:58,885-Speed 11517.61 samples/sec Loss 6.6697 LearningRate 0.0306 Epoch: 17 Global Step: 101690 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:50:59,825-Speed 10900.79 samples/sec Loss 6.6015 LearningRate 0.0306 Epoch: 17 Global Step: 101700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:51:00,744-Speed 11152.67 samples/sec Loss 6.5999 LearningRate 0.0306 Epoch: 17 Global Step: 101710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:51:01,691-Speed 10823.94 samples/sec Loss 6.7448 LearningRate 0.0306 Epoch: 17 Global Step: 101720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:51:02,632-Speed 10901.34 samples/sec Loss 6.6455 LearningRate 0.0305 Epoch: 17 Global Step: 101730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:51:03,590-Speed 10688.49 samples/sec Loss 6.5933 LearningRate 0.0305 Epoch: 17 Global Step: 101740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:51:04,484-Speed 11464.82 samples/sec Loss 6.7655 LearningRate 0.0305 Epoch: 17 Global Step: 101750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:51:05,413-Speed 11035.81 samples/sec Loss 6.4881 LearningRate 0.0305 Epoch: 17 Global Step: 101760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:51:06,345-Speed 10997.47 samples/sec 
Loss 6.6700 LearningRate 0.0305 Epoch: 17 Global Step: 101770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:51:07,308-Speed 10637.59 samples/sec Loss 6.7804 LearningRate 0.0305 Epoch: 17 Global Step: 101780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:51:08,245-Speed 10932.65 samples/sec Loss 6.7234 LearningRate 0.0305 Epoch: 17 Global Step: 101790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:51:09,127-Speed 11623.84 samples/sec Loss 6.7607 LearningRate 0.0305 Epoch: 17 Global Step: 101800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:51:10,064-Speed 10926.53 samples/sec Loss 6.5156 LearningRate 0.0305 Epoch: 17 Global Step: 101810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:51:10,946-Speed 11623.73 samples/sec Loss 6.6818 LearningRate 0.0305 Epoch: 17 Global Step: 101820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:51:11,848-Speed 11360.07 samples/sec Loss 6.7372 LearningRate 0.0305 Epoch: 17 Global Step: 101830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:51:12,750-Speed 11364.70 samples/sec Loss 6.5798 LearningRate 0.0305 Epoch: 17 Global Step: 101840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:51:13,621-Speed 11769.89 samples/sec Loss 6.6819 LearningRate 0.0305 Epoch: 17 Global Step: 101850 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:51:14,567-Speed 10828.83 samples/sec Loss 6.6107 LearningRate 0.0305 Epoch: 17 Global Step: 101860 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:51:15,452-Speed 11579.82 samples/sec Loss 6.6277 LearningRate 0.0305 Epoch: 17 Global Step: 101870 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:51:16,348-Speed 11429.97 samples/sec Loss 6.4034 LearningRate 0.0305 Epoch: 17 Global Step: 101880 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:51:17,264-Speed 11191.64 samples/sec Loss 6.6257 LearningRate 
0.0305 Epoch: 17 Global Step: 101890 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:51:18,146-Speed 11613.08 samples/sec Loss 6.7320 LearningRate 0.0305 Epoch: 17 Global Step: 101900 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:51:19,067-Speed 11138.30 samples/sec Loss 6.6515 LearningRate 0.0305 Epoch: 17 Global Step: 101910 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:51:19,945-Speed 11669.15 samples/sec Loss 6.7038 LearningRate 0.0305 Epoch: 17 Global Step: 101920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:51:20,819-Speed 11733.22 samples/sec Loss 6.6160 LearningRate 0.0305 Epoch: 17 Global Step: 101930 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:51:21,691-Speed 11750.06 samples/sec Loss 6.6177 LearningRate 0.0304 Epoch: 17 Global Step: 101940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:51:22,589-Speed 11417.43 samples/sec Loss 6.7207 LearningRate 0.0304 Epoch: 17 Global Step: 101950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:51:23,495-Speed 11302.20 samples/sec Loss 6.6844 LearningRate 0.0304 Epoch: 17 Global Step: 101960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:51:24,425-Speed 11016.64 samples/sec Loss 6.6901 LearningRate 0.0304 Epoch: 17 Global Step: 101970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:51:25,364-Speed 10915.78 samples/sec Loss 6.5921 LearningRate 0.0304 Epoch: 17 Global Step: 101980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:51:26,297-Speed 10987.13 samples/sec Loss 6.6538 LearningRate 0.0304 Epoch: 17 Global Step: 101990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:51:27,247-Speed 10784.03 samples/sec Loss 6.7119 LearningRate 0.0304 Epoch: 17 Global Step: 102000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:51:48,464-[lfw][102000]XNorm: 10.821524 Training: 2022-04-11 
02:51:48,464-[lfw][102000]Accuracy-Flip: 0.99517+-0.00320 Training: 2022-04-11 02:51:48,465-[lfw][102000]Accuracy-Highest: 0.99617 Training: 2022-04-11 02:52:13,064-[cfp_fp][102000]XNorm: 9.065311 Training: 2022-04-11 02:52:13,064-[cfp_fp][102000]Accuracy-Flip: 0.94600+-0.01190 Training: 2022-04-11 02:52:13,064-[cfp_fp][102000]Accuracy-Highest: 0.94600 Training: 2022-04-11 02:52:34,244-[agedb_30][102000]XNorm: 10.623224 Training: 2022-04-11 02:52:34,245-[agedb_30][102000]Accuracy-Flip: 0.96217+-0.01126 Training: 2022-04-11 02:52:34,245-[agedb_30][102000]Accuracy-Highest: 0.96483 Training: 2022-04-11 02:52:35,138-Speed 150.83 samples/sec Loss 6.4363 LearningRate 0.0304 Epoch: 17 Global Step: 102010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:52:36,044-Speed 11311.61 samples/sec Loss 6.5138 LearningRate 0.0304 Epoch: 17 Global Step: 102020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:52:36,943-Speed 11402.86 samples/sec Loss 6.6935 LearningRate 0.0304 Epoch: 17 Global Step: 102030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:52:37,851-Speed 11287.38 samples/sec Loss 6.7420 LearningRate 0.0304 Epoch: 17 Global Step: 102040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:52:38,743-Speed 11483.58 samples/sec Loss 6.4696 LearningRate 0.0304 Epoch: 17 Global Step: 102050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:52:39,633-Speed 11516.64 samples/sec Loss 6.7750 LearningRate 0.0304 Epoch: 17 Global Step: 102060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:52:40,521-Speed 11535.75 samples/sec Loss 6.5431 LearningRate 0.0304 Epoch: 17 Global Step: 102070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:52:41,438-Speed 11179.46 samples/sec Loss 6.6202 LearningRate 0.0304 Epoch: 17 Global Step: 102080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:52:42,324-Speed 11559.96 samples/sec Loss 6.7008 LearningRate 0.0304 Epoch: 17 
Global Step: 102090 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:52:43,196-Speed 11746.98 samples/sec Loss 6.6393 LearningRate 0.0304 Epoch: 17 Global Step: 102100 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:52:44,072-Speed 11707.31 samples/sec Loss 6.5238 LearningRate 0.0304 Epoch: 17 Global Step: 102110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:52:44,962-Speed 11509.10 samples/sec Loss 6.5509 LearningRate 0.0304 Epoch: 17 Global Step: 102120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:52:45,844-Speed 11613.87 samples/sec Loss 6.6294 LearningRate 0.0304 Epoch: 17 Global Step: 102130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:52:46,722-Speed 11679.30 samples/sec Loss 6.7545 LearningRate 0.0304 Epoch: 17 Global Step: 102140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:52:47,619-Speed 11414.29 samples/sec Loss 6.5141 LearningRate 0.0303 Epoch: 17 Global Step: 102150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:52:48,499-Speed 11656.97 samples/sec Loss 6.6867 LearningRate 0.0303 Epoch: 17 Global Step: 102160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:52:49,379-Speed 11644.98 samples/sec Loss 6.5872 LearningRate 0.0303 Epoch: 17 Global Step: 102170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:52:50,283-Speed 11329.80 samples/sec Loss 6.5132 LearningRate 0.0303 Epoch: 17 Global Step: 102180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:52:51,164-Speed 11625.96 samples/sec Loss 6.8338 LearningRate 0.0303 Epoch: 17 Global Step: 102190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:52:52,061-Speed 11432.90 samples/sec Loss 6.6200 LearningRate 0.0303 Epoch: 17 Global Step: 102200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:52:52,944-Speed 11596.95 samples/sec Loss 6.5430 LearningRate 0.0303 Epoch: 17 Global Step: 102210 Fp16 Grad 
Scale: 131072 Required: 4 hours Training: 2022-04-11 02:52:53,849-Speed 11328.13 samples/sec Loss 6.6847 LearningRate 0.0303 Epoch: 17 Global Step: 102220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:52:54,721-Speed 11746.21 samples/sec Loss 6.5575 LearningRate 0.0303 Epoch: 17 Global Step: 102230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:52:55,656-Speed 10964.57 samples/sec Loss 6.6713 LearningRate 0.0303 Epoch: 17 Global Step: 102240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:52:56,549-Speed 11492.71 samples/sec Loss 6.6635 LearningRate 0.0303 Epoch: 17 Global Step: 102250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:52:57,426-Speed 11688.47 samples/sec Loss 6.6161 LearningRate 0.0303 Epoch: 17 Global Step: 102260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:52:58,316-Speed 11506.25 samples/sec Loss 6.6231 LearningRate 0.0303 Epoch: 17 Global Step: 102270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:52:59,207-Speed 11511.90 samples/sec Loss 6.6250 LearningRate 0.0303 Epoch: 17 Global Step: 102280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:53:00,108-Speed 11371.54 samples/sec Loss 6.6321 LearningRate 0.0303 Epoch: 17 Global Step: 102290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:53:01,022-Speed 11210.84 samples/sec Loss 6.6074 LearningRate 0.0303 Epoch: 17 Global Step: 102300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:53:01,902-Speed 11647.92 samples/sec Loss 6.5482 LearningRate 0.0303 Epoch: 17 Global Step: 102310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:53:02,781-Speed 11654.28 samples/sec Loss 6.6302 LearningRate 0.0303 Epoch: 17 Global Step: 102320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:53:03,678-Speed 11421.00 samples/sec Loss 6.7323 LearningRate 0.0303 Epoch: 17 Global Step: 102330 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-04-11 02:53:04,675-Speed 10274.70 samples/sec Loss 6.7488 LearningRate 0.0303 Epoch: 17 Global Step: 102340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:53:16,068-Speed 898.91 samples/sec Loss 6.3792 LearningRate 0.0302 Epoch: 18 Global Step: 102350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:53:17,017-Speed 10819.45 samples/sec Loss 5.9384 LearningRate 0.0302 Epoch: 18 Global Step: 102360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:53:17,959-Speed 10879.82 samples/sec Loss 5.9541 LearningRate 0.0302 Epoch: 18 Global Step: 102370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:53:18,870-Speed 11237.04 samples/sec Loss 5.8608 LearningRate 0.0302 Epoch: 18 Global Step: 102380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:53:20,009-Speed 8999.68 samples/sec Loss 6.0139 LearningRate 0.0302 Epoch: 18 Global Step: 102390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:53:20,914-Speed 11326.22 samples/sec Loss 5.9209 LearningRate 0.0302 Epoch: 18 Global Step: 102400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:53:21,834-Speed 11129.44 samples/sec Loss 5.8473 LearningRate 0.0302 Epoch: 18 Global Step: 102410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:53:22,755-Speed 11133.19 samples/sec Loss 5.9181 LearningRate 0.0302 Epoch: 18 Global Step: 102420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:53:23,664-Speed 11268.58 samples/sec Loss 5.9691 LearningRate 0.0302 Epoch: 18 Global Step: 102430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:53:24,626-Speed 10658.30 samples/sec Loss 5.9218 LearningRate 0.0302 Epoch: 18 Global Step: 102440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:53:25,571-Speed 10841.30 samples/sec Loss 5.8927 LearningRate 0.0302 Epoch: 18 Global Step: 102450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
02:53:26,529-Speed 10692.94 samples/sec Loss 5.9978 LearningRate 0.0302 Epoch: 18 Global Step: 102460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:53:27,517-Speed 10372.21 samples/sec Loss 6.0624 LearningRate 0.0302 Epoch: 18 Global Step: 102470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:53:28,455-Speed 10931.06 samples/sec Loss 6.0171 LearningRate 0.0302 Epoch: 18 Global Step: 102480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:53:29,382-Speed 11053.14 samples/sec Loss 5.9051 LearningRate 0.0302 Epoch: 18 Global Step: 102490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:53:30,341-Speed 10685.13 samples/sec Loss 5.9639 LearningRate 0.0302 Epoch: 18 Global Step: 102500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:53:31,311-Speed 10562.12 samples/sec Loss 5.9669 LearningRate 0.0302 Epoch: 18 Global Step: 102510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:53:32,204-Speed 11475.34 samples/sec Loss 6.0223 LearningRate 0.0302 Epoch: 18 Global Step: 102520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:53:33,092-Speed 11542.93 samples/sec Loss 6.0605 LearningRate 0.0302 Epoch: 18 Global Step: 102530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:53:33,970-Speed 11666.45 samples/sec Loss 5.8595 LearningRate 0.0302 Epoch: 18 Global Step: 102540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:53:35,079-Speed 9242.16 samples/sec Loss 5.9913 LearningRate 0.0302 Epoch: 18 Global Step: 102550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:53:35,968-Speed 11522.39 samples/sec Loss 6.0623 LearningRate 0.0301 Epoch: 18 Global Step: 102560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:53:36,933-Speed 10618.95 samples/sec Loss 5.9262 LearningRate 0.0301 Epoch: 18 Global Step: 102570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:53:37,940-Speed 10182.18 
samples/sec Loss 5.9524 LearningRate 0.0301 Epoch: 18 Global Step: 102580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:53:38,840-Speed 11377.85 samples/sec Loss 6.0673 LearningRate 0.0301 Epoch: 18 Global Step: 102590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:53:39,747-Speed 11298.33 samples/sec Loss 5.8969 LearningRate 0.0301 Epoch: 18 Global Step: 102600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:53:40,695-Speed 10814.17 samples/sec Loss 5.9269 LearningRate 0.0301 Epoch: 18 Global Step: 102610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:53:41,608-Speed 11216.55 samples/sec Loss 6.1430 LearningRate 0.0301 Epoch: 18 Global Step: 102620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:53:42,516-Speed 11284.84 samples/sec Loss 6.0213 LearningRate 0.0301 Epoch: 18 Global Step: 102630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:53:43,442-Speed 11072.38 samples/sec Loss 5.9768 LearningRate 0.0301 Epoch: 18 Global Step: 102640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:53:44,322-Speed 11634.38 samples/sec Loss 5.9565 LearningRate 0.0301 Epoch: 18 Global Step: 102650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:53:45,203-Speed 11637.32 samples/sec Loss 5.9943 LearningRate 0.0301 Epoch: 18 Global Step: 102660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:53:46,090-Speed 11547.89 samples/sec Loss 6.0842 LearningRate 0.0301 Epoch: 18 Global Step: 102670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:53:46,988-Speed 11408.74 samples/sec Loss 6.1551 LearningRate 0.0301 Epoch: 18 Global Step: 102680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:53:47,945-Speed 10709.01 samples/sec Loss 5.8830 LearningRate 0.0301 Epoch: 18 Global Step: 102690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:53:48,866-Speed 11123.10 samples/sec Loss 5.9515 
LearningRate 0.0301 Epoch: 18 Global Step: 102700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:53:49,795-Speed 11036.86 samples/sec Loss 6.0350 LearningRate 0.0301 Epoch: 18 Global Step: 102710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:53:50,729-Speed 10971.11 samples/sec Loss 6.1232 LearningRate 0.0301 Epoch: 18 Global Step: 102720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:53:51,643-Speed 11205.82 samples/sec Loss 6.1764 LearningRate 0.0301 Epoch: 18 Global Step: 102730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:53:52,598-Speed 10734.55 samples/sec Loss 6.0511 LearningRate 0.0301 Epoch: 18 Global Step: 102740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:53:53,519-Speed 11123.46 samples/sec Loss 6.1046 LearningRate 0.0301 Epoch: 18 Global Step: 102750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:53:54,441-Speed 11112.39 samples/sec Loss 6.1705 LearningRate 0.0301 Epoch: 18 Global Step: 102760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:53:55,342-Speed 11380.26 samples/sec Loss 6.1048 LearningRate 0.0300 Epoch: 18 Global Step: 102770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:53:56,272-Speed 11017.10 samples/sec Loss 6.2004 LearningRate 0.0300 Epoch: 18 Global Step: 102780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:53:57,208-Speed 10952.03 samples/sec Loss 6.0649 LearningRate 0.0300 Epoch: 18 Global Step: 102790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:53:58,096-Speed 11528.48 samples/sec Loss 6.1810 LearningRate 0.0300 Epoch: 18 Global Step: 102800 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:53:58,987-Speed 11505.46 samples/sec Loss 6.2280 LearningRate 0.0300 Epoch: 18 Global Step: 102810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:53:59,860-Speed 11741.28 samples/sec Loss 6.1755 LearningRate 0.0300 Epoch: 
18 Global Step: 102820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:00,753-Speed 11477.97 samples/sec Loss 6.2756 LearningRate 0.0300 Epoch: 18 Global Step: 102830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:01,656-Speed 11346.97 samples/sec Loss 6.2152 LearningRate 0.0300 Epoch: 18 Global Step: 102840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:02,565-Speed 11274.98 samples/sec Loss 6.0050 LearningRate 0.0300 Epoch: 18 Global Step: 102850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:03,530-Speed 10617.05 samples/sec Loss 6.0444 LearningRate 0.0300 Epoch: 18 Global Step: 102860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:04,479-Speed 10796.07 samples/sec Loss 6.2640 LearningRate 0.0300 Epoch: 18 Global Step: 102870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:05,427-Speed 10800.07 samples/sec Loss 6.1470 LearningRate 0.0300 Epoch: 18 Global Step: 102880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:06,362-Speed 10962.20 samples/sec Loss 6.2256 LearningRate 0.0300 Epoch: 18 Global Step: 102890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:07,279-Speed 11175.57 samples/sec Loss 6.2100 LearningRate 0.0300 Epoch: 18 Global Step: 102900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:08,207-Speed 11047.51 samples/sec Loss 6.0641 LearningRate 0.0300 Epoch: 18 Global Step: 102910 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:54:09,087-Speed 11645.39 samples/sec Loss 6.1341 LearningRate 0.0300 Epoch: 18 Global Step: 102920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:54:09,968-Speed 11639.27 samples/sec Loss 6.1757 LearningRate 0.0300 Epoch: 18 Global Step: 102930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:10,855-Speed 11554.69 samples/sec Loss 6.0951 LearningRate 0.0300 Epoch: 18 Global Step: 102940 Fp16 
Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:11,778-Speed 11094.91 samples/sec Loss 6.1510 LearningRate 0.0300 Epoch: 18 Global Step: 102950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:12,704-Speed 11069.49 samples/sec Loss 6.1514 LearningRate 0.0300 Epoch: 18 Global Step: 102960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:13,653-Speed 10799.87 samples/sec Loss 6.1867 LearningRate 0.0299 Epoch: 18 Global Step: 102970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:14,604-Speed 10766.17 samples/sec Loss 6.2532 LearningRate 0.0299 Epoch: 18 Global Step: 102980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:15,526-Speed 11118.79 samples/sec Loss 6.3294 LearningRate 0.0299 Epoch: 18 Global Step: 102990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:16,474-Speed 10808.76 samples/sec Loss 6.1663 LearningRate 0.0299 Epoch: 18 Global Step: 103000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:17,425-Speed 10778.09 samples/sec Loss 6.0391 LearningRate 0.0299 Epoch: 18 Global Step: 103010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:18,337-Speed 11228.21 samples/sec Loss 6.2404 LearningRate 0.0299 Epoch: 18 Global Step: 103020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:19,269-Speed 11004.13 samples/sec Loss 6.1767 LearningRate 0.0299 Epoch: 18 Global Step: 103030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:54:20,185-Speed 11185.45 samples/sec Loss 6.3321 LearningRate 0.0299 Epoch: 18 Global Step: 103040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:21,097-Speed 11233.79 samples/sec Loss 6.1320 LearningRate 0.0299 Epoch: 18 Global Step: 103050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:21,991-Speed 11463.59 samples/sec Loss 6.2308 LearningRate 0.0299 Epoch: 18 Global Step: 103060 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-04-11 02:54:22,886-Speed 11448.03 samples/sec Loss 6.1481 LearningRate 0.0299 Epoch: 18 Global Step: 103070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:23,783-Speed 11418.52 samples/sec Loss 6.1926 LearningRate 0.0299 Epoch: 18 Global Step: 103080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:24,711-Speed 11046.94 samples/sec Loss 6.1132 LearningRate 0.0299 Epoch: 18 Global Step: 103090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:25,649-Speed 10925.25 samples/sec Loss 6.3252 LearningRate 0.0299 Epoch: 18 Global Step: 103100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:26,545-Speed 11431.09 samples/sec Loss 6.2217 LearningRate 0.0299 Epoch: 18 Global Step: 103110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:27,451-Speed 11318.34 samples/sec Loss 6.2515 LearningRate 0.0299 Epoch: 18 Global Step: 103120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:28,352-Speed 11368.60 samples/sec Loss 6.3825 LearningRate 0.0299 Epoch: 18 Global Step: 103130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:29,266-Speed 11206.42 samples/sec Loss 6.2726 LearningRate 0.0299 Epoch: 18 Global Step: 103140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:54:30,190-Speed 11094.68 samples/sec Loss 6.1536 LearningRate 0.0299 Epoch: 18 Global Step: 103150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:31,125-Speed 10955.95 samples/sec Loss 6.3234 LearningRate 0.0299 Epoch: 18 Global Step: 103160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:32,029-Speed 11343.18 samples/sec Loss 6.1906 LearningRate 0.0299 Epoch: 18 Global Step: 103170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:32,918-Speed 11526.69 samples/sec Loss 6.2272 LearningRate 0.0298 Epoch: 18 Global Step: 103180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
02:54:33,794-Speed 11693.38 samples/sec Loss 6.1800 LearningRate 0.0298 Epoch: 18 Global Step: 103190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:34,675-Speed 11629.30 samples/sec Loss 6.2290 LearningRate 0.0298 Epoch: 18 Global Step: 103200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:35,583-Speed 11283.84 samples/sec Loss 6.1977 LearningRate 0.0298 Epoch: 18 Global Step: 103210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:36,475-Speed 11491.47 samples/sec Loss 6.2480 LearningRate 0.0298 Epoch: 18 Global Step: 103220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:37,437-Speed 10653.20 samples/sec Loss 6.2859 LearningRate 0.0298 Epoch: 18 Global Step: 103230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:38,372-Speed 10956.00 samples/sec Loss 6.2691 LearningRate 0.0298 Epoch: 18 Global Step: 103240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:39,306-Speed 10973.53 samples/sec Loss 6.2547 LearningRate 0.0298 Epoch: 18 Global Step: 103250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:54:40,237-Speed 10998.31 samples/sec Loss 6.1790 LearningRate 0.0298 Epoch: 18 Global Step: 103260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:41,143-Speed 11312.97 samples/sec Loss 6.2538 LearningRate 0.0298 Epoch: 18 Global Step: 103270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:42,083-Speed 10906.28 samples/sec Loss 6.2790 LearningRate 0.0298 Epoch: 18 Global Step: 103280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:43,039-Speed 10718.13 samples/sec Loss 6.2281 LearningRate 0.0298 Epoch: 18 Global Step: 103290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:43,967-Speed 11045.86 samples/sec Loss 6.4358 LearningRate 0.0298 Epoch: 18 Global Step: 103300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:44,877-Speed 11257.85 
samples/sec Loss 6.2209 LearningRate 0.0298 Epoch: 18 Global Step: 103310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:45,775-Speed 11412.03 samples/sec Loss 6.2231 LearningRate 0.0298 Epoch: 18 Global Step: 103320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:46,668-Speed 11474.36 samples/sec Loss 6.2693 LearningRate 0.0298 Epoch: 18 Global Step: 103330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:47,558-Speed 11508.23 samples/sec Loss 6.3417 LearningRate 0.0298 Epoch: 18 Global Step: 103340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:48,449-Speed 11498.44 samples/sec Loss 6.3113 LearningRate 0.0298 Epoch: 18 Global Step: 103350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:49,378-Speed 11032.20 samples/sec Loss 6.3753 LearningRate 0.0298 Epoch: 18 Global Step: 103360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:54:50,301-Speed 11106.58 samples/sec Loss 6.2703 LearningRate 0.0298 Epoch: 18 Global Step: 103370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:54:51,193-Speed 11488.60 samples/sec Loss 6.4510 LearningRate 0.0298 Epoch: 18 Global Step: 103380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:54:52,134-Speed 10884.49 samples/sec Loss 6.3522 LearningRate 0.0297 Epoch: 18 Global Step: 103390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:54:53,051-Speed 11174.66 samples/sec Loss 6.4793 LearningRate 0.0297 Epoch: 18 Global Step: 103400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:53,975-Speed 11095.12 samples/sec Loss 6.3057 LearningRate 0.0297 Epoch: 18 Global Step: 103410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:54,886-Speed 11252.01 samples/sec Loss 6.1828 LearningRate 0.0297 Epoch: 18 Global Step: 103420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:55,774-Speed 11527.75 samples/sec Loss 6.2184 
LearningRate 0.0297 Epoch: 18 Global Step: 103430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:56,664-Speed 11515.12 samples/sec Loss 6.2095 LearningRate 0.0297 Epoch: 18 Global Step: 103440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:57,551-Speed 11559.98 samples/sec Loss 6.4218 LearningRate 0.0297 Epoch: 18 Global Step: 103450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:58,437-Speed 11559.21 samples/sec Loss 6.2807 LearningRate 0.0297 Epoch: 18 Global Step: 103460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:54:59,333-Speed 11433.75 samples/sec Loss 6.3478 LearningRate 0.0297 Epoch: 18 Global Step: 103470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:55:00,239-Speed 11312.29 samples/sec Loss 6.3197 LearningRate 0.0297 Epoch: 18 Global Step: 103480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:55:01,181-Speed 10882.27 samples/sec Loss 6.3172 LearningRate 0.0297 Epoch: 18 Global Step: 103490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:55:02,075-Speed 11458.88 samples/sec Loss 6.3082 LearningRate 0.0297 Epoch: 18 Global Step: 103500 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:55:03,039-Speed 10632.80 samples/sec Loss 6.4165 LearningRate 0.0297 Epoch: 18 Global Step: 103510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:55:03,983-Speed 10850.91 samples/sec Loss 6.3788 LearningRate 0.0297 Epoch: 18 Global Step: 103520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:55:04,884-Speed 11368.59 samples/sec Loss 6.3063 LearningRate 0.0297 Epoch: 18 Global Step: 103530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:55:05,760-Speed 11702.14 samples/sec Loss 6.4221 LearningRate 0.0297 Epoch: 18 Global Step: 103540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:55:06,642-Speed 11613.66 samples/sec Loss 6.3935 LearningRate 0.0297 Epoch: 18 
Global Step: 103550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:55:07,552-Speed 11261.94 samples/sec Loss 6.2732 LearningRate 0.0297 Epoch: 18 Global Step: 103560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:55:08,437-Speed 11579.52 samples/sec Loss 6.3437 LearningRate 0.0297 Epoch: 18 Global Step: 103570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:55:09,320-Speed 11612.87 samples/sec Loss 6.1845 LearningRate 0.0297 Epoch: 18 Global Step: 103580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:55:10,220-Speed 11379.62 samples/sec Loss 6.3939 LearningRate 0.0297 Epoch: 18 Global Step: 103590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:55:11,135-Speed 11203.26 samples/sec Loss 6.3662 LearningRate 0.0296 Epoch: 18 Global Step: 103600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:55:12,099-Speed 10623.71 samples/sec Loss 6.1926 LearningRate 0.0296 Epoch: 18 Global Step: 103610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:55:13,065-Speed 10609.74 samples/sec Loss 6.4142 LearningRate 0.0296 Epoch: 18 Global Step: 103620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:55:14,032-Speed 10602.59 samples/sec Loss 6.2116 LearningRate 0.0296 Epoch: 18 Global Step: 103630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:55:14,909-Speed 11676.18 samples/sec Loss 6.3914 LearningRate 0.0296 Epoch: 18 Global Step: 103640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:55:15,827-Speed 11167.18 samples/sec Loss 6.3829 LearningRate 0.0296 Epoch: 18 Global Step: 103650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:55:16,779-Speed 10757.15 samples/sec Loss 6.5952 LearningRate 0.0296 Epoch: 18 Global Step: 103660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:55:17,666-Speed 11559.57 samples/sec Loss 6.4157 LearningRate 0.0296 Epoch: 18 Global Step: 103670 Fp16 Grad 
Scale: 65536 Required: 4 hours Training: 2022-04-11 02:55:18,594-Speed 11045.33 samples/sec Loss 6.2171 LearningRate 0.0296 Epoch: 18 Global Step: 103680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:55:19,549-Speed 10725.76 samples/sec Loss 6.4550 LearningRate 0.0296 Epoch: 18 Global Step: 103690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:55:20,495-Speed 10831.73 samples/sec Loss 6.3633 LearningRate 0.0296 Epoch: 18 Global Step: 103700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:55:21,394-Speed 11395.91 samples/sec Loss 6.3627 LearningRate 0.0296 Epoch: 18 Global Step: 103710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:55:22,281-Speed 11544.17 samples/sec Loss 6.3387 LearningRate 0.0296 Epoch: 18 Global Step: 103720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:55:23,236-Speed 10738.53 samples/sec Loss 6.3369 LearningRate 0.0296 Epoch: 18 Global Step: 103730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:55:24,195-Speed 10683.19 samples/sec Loss 6.3184 LearningRate 0.0296 Epoch: 18 Global Step: 103740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:55:25,106-Speed 11251.09 samples/sec Loss 6.5323 LearningRate 0.0296 Epoch: 18 Global Step: 103750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:55:26,011-Speed 11317.68 samples/sec Loss 6.3926 LearningRate 0.0296 Epoch: 18 Global Step: 103760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:55:26,919-Speed 11289.45 samples/sec Loss 6.3555 LearningRate 0.0296 Epoch: 18 Global Step: 103770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:55:27,846-Speed 11050.82 samples/sec Loss 6.4272 LearningRate 0.0296 Epoch: 18 Global Step: 103780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:55:28,740-Speed 11467.30 samples/sec Loss 6.3908 LearningRate 0.0296 Epoch: 18 Global Step: 103790 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-04-11 02:55:29,671-Speed 11001.73 samples/sec Loss 6.4580 LearningRate 0.0296 Epoch: 18 Global Step: 103800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:55:30,593-Speed 11123.26 samples/sec Loss 6.2573 LearningRate 0.0295 Epoch: 18 Global Step: 103810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:55:31,583-Speed 10344.04 samples/sec Loss 6.4177 LearningRate 0.0295 Epoch: 18 Global Step: 103820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:55:32,511-Speed 11044.97 samples/sec Loss 6.3775 LearningRate 0.0295 Epoch: 18 Global Step: 103830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:55:33,418-Speed 11298.14 samples/sec Loss 6.4186 LearningRate 0.0295 Epoch: 18 Global Step: 103840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:55:34,307-Speed 11520.07 samples/sec Loss 6.3230 LearningRate 0.0295 Epoch: 18 Global Step: 103850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:55:35,177-Speed 11777.61 samples/sec Loss 6.4172 LearningRate 0.0295 Epoch: 18 Global Step: 103860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:55:36,090-Speed 11234.77 samples/sec Loss 6.2887 LearningRate 0.0295 Epoch: 18 Global Step: 103870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:55:37,012-Speed 11102.16 samples/sec Loss 6.3191 LearningRate 0.0295 Epoch: 18 Global Step: 103880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:55:37,914-Speed 11372.09 samples/sec Loss 6.4633 LearningRate 0.0295 Epoch: 18 Global Step: 103890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:55:38,807-Speed 11466.34 samples/sec Loss 6.5022 LearningRate 0.0295 Epoch: 18 Global Step: 103900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:55:39,722-Speed 11202.93 samples/sec Loss 6.3993 LearningRate 0.0295 Epoch: 18 Global Step: 103910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
02:55:40,616-Speed 11455.67 samples/sec Loss 6.3123 LearningRate 0.0295 Epoch: 18 Global Step: 103920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:55:41,512-Speed 11444.02 samples/sec Loss 6.4315 LearningRate 0.0295 Epoch: 18 Global Step: 103930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:55:42,432-Speed 11128.33 samples/sec Loss 6.2773 LearningRate 0.0295 Epoch: 18 Global Step: 103940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:55:43,360-Speed 11050.46 samples/sec Loss 6.3977 LearningRate 0.0295 Epoch: 18 Global Step: 103950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:55:44,271-Speed 11249.49 samples/sec Loss 6.4724 LearningRate 0.0295 Epoch: 18 Global Step: 103960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:55:45,160-Speed 11519.22 samples/sec Loss 6.3609 LearningRate 0.0295 Epoch: 18 Global Step: 103970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:55:46,090-Speed 11016.61 samples/sec Loss 6.4845 LearningRate 0.0295 Epoch: 18 Global Step: 103980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:55:46,983-Speed 11482.94 samples/sec Loss 6.4360 LearningRate 0.0295 Epoch: 18 Global Step: 103990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:55:47,899-Speed 11181.30 samples/sec Loss 6.3891 LearningRate 0.0295 Epoch: 18 Global Step: 104000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:56:09,180-[lfw][104000]XNorm: 10.889415 Training: 2022-04-11 02:56:09,181-[lfw][104000]Accuracy-Flip: 0.99567+-0.00343 Training: 2022-04-11 02:56:09,181-[lfw][104000]Accuracy-Highest: 0.99617 Training: 2022-04-11 02:56:33,827-[cfp_fp][104000]XNorm: 9.131989 Training: 2022-04-11 02:56:33,828-[cfp_fp][104000]Accuracy-Flip: 0.94757+-0.00988 Training: 2022-04-11 02:56:33,829-[cfp_fp][104000]Accuracy-Highest: 0.94757 Training: 2022-04-11 02:56:55,380-[agedb_30][104000]XNorm: 10.650479 Training: 2022-04-11 
02:56:55,380-[agedb_30][104000]Accuracy-Flip: 0.96317+-0.00967 Training: 2022-04-11 02:56:55,381-[agedb_30][104000]Accuracy-Highest: 0.96483 Training: 2022-04-11 02:56:56,289-Speed 149.73 samples/sec Loss 6.2683 LearningRate 0.0295 Epoch: 18 Global Step: 104010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:56:57,177-Speed 11541.35 samples/sec Loss 6.2690 LearningRate 0.0294 Epoch: 18 Global Step: 104020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:56:58,077-Speed 11394.54 samples/sec Loss 6.1717 LearningRate 0.0294 Epoch: 18 Global Step: 104030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:56:59,004-Speed 11053.13 samples/sec Loss 6.3661 LearningRate 0.0294 Epoch: 18 Global Step: 104040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:56:59,897-Speed 11471.12 samples/sec Loss 6.4062 LearningRate 0.0294 Epoch: 18 Global Step: 104050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:00,836-Speed 10909.50 samples/sec Loss 6.4637 LearningRate 0.0294 Epoch: 18 Global Step: 104060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:01,772-Speed 10950.43 samples/sec Loss 6.3916 LearningRate 0.0294 Epoch: 18 Global Step: 104070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:02,688-Speed 11185.91 samples/sec Loss 6.3945 LearningRate 0.0294 Epoch: 18 Global Step: 104080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:03,598-Speed 11260.69 samples/sec Loss 6.3192 LearningRate 0.0294 Epoch: 18 Global Step: 104090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:04,548-Speed 10792.08 samples/sec Loss 6.3216 LearningRate 0.0294 Epoch: 18 Global Step: 104100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:05,530-Speed 10425.95 samples/sec Loss 6.3566 LearningRate 0.0294 Epoch: 18 Global Step: 104110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:06,441-Speed 11247.91 
samples/sec Loss 6.4226 LearningRate 0.0294 Epoch: 18 Global Step: 104120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:07,403-Speed 10651.92 samples/sec Loss 6.3498 LearningRate 0.0294 Epoch: 18 Global Step: 104130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:57:08,294-Speed 11504.09 samples/sec Loss 6.3722 LearningRate 0.0294 Epoch: 18 Global Step: 104140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:09,193-Speed 11399.89 samples/sec Loss 6.4411 LearningRate 0.0294 Epoch: 18 Global Step: 104150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:10,078-Speed 11579.19 samples/sec Loss 6.3540 LearningRate 0.0294 Epoch: 18 Global Step: 104160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:10,979-Speed 11376.19 samples/sec Loss 6.2886 LearningRate 0.0294 Epoch: 18 Global Step: 104170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:11,871-Speed 11491.81 samples/sec Loss 6.4578 LearningRate 0.0294 Epoch: 18 Global Step: 104180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:12,755-Speed 11595.21 samples/sec Loss 6.3975 LearningRate 0.0294 Epoch: 18 Global Step: 104190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:13,643-Speed 11534.10 samples/sec Loss 6.4922 LearningRate 0.0294 Epoch: 18 Global Step: 104200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:14,527-Speed 11598.62 samples/sec Loss 6.3389 LearningRate 0.0294 Epoch: 18 Global Step: 104210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:15,473-Speed 10824.04 samples/sec Loss 6.3862 LearningRate 0.0294 Epoch: 18 Global Step: 104220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:16,394-Speed 11124.85 samples/sec Loss 6.4147 LearningRate 0.0293 Epoch: 18 Global Step: 104230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:17,302-Speed 11286.72 samples/sec Loss 6.4429 
LearningRate 0.0293 Epoch: 18 Global Step: 104240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:57:18,194-Speed 11496.41 samples/sec Loss 6.4714 LearningRate 0.0293 Epoch: 18 Global Step: 104250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:57:19,112-Speed 11152.65 samples/sec Loss 6.3820 LearningRate 0.0293 Epoch: 18 Global Step: 104260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:57:20,023-Speed 11252.19 samples/sec Loss 6.4350 LearningRate 0.0293 Epoch: 18 Global Step: 104270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:20,910-Speed 11550.58 samples/sec Loss 6.4858 LearningRate 0.0293 Epoch: 18 Global Step: 104280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:21,799-Speed 11520.63 samples/sec Loss 6.5782 LearningRate 0.0293 Epoch: 18 Global Step: 104290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:22,680-Speed 11636.28 samples/sec Loss 6.4213 LearningRate 0.0293 Epoch: 18 Global Step: 104300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:23,620-Speed 10898.75 samples/sec Loss 6.5834 LearningRate 0.0293 Epoch: 18 Global Step: 104310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:24,525-Speed 11330.27 samples/sec Loss 6.3753 LearningRate 0.0293 Epoch: 18 Global Step: 104320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:25,464-Speed 10907.23 samples/sec Loss 6.3306 LearningRate 0.0293 Epoch: 18 Global Step: 104330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:26,358-Speed 11460.62 samples/sec Loss 6.3980 LearningRate 0.0293 Epoch: 18 Global Step: 104340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:27,257-Speed 11402.54 samples/sec Loss 6.4345 LearningRate 0.0293 Epoch: 18 Global Step: 104350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:28,167-Speed 11254.58 samples/sec Loss 6.3201 LearningRate 0.0293 Epoch: 
18 Global Step: 104360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:29,078-Speed 11250.67 samples/sec Loss 6.3828 LearningRate 0.0293 Epoch: 18 Global Step: 104370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:57:30,007-Speed 11030.03 samples/sec Loss 6.4369 LearningRate 0.0293 Epoch: 18 Global Step: 104380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:57:30,898-Speed 11509.91 samples/sec Loss 6.3451 LearningRate 0.0293 Epoch: 18 Global Step: 104390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:57:31,819-Speed 11121.19 samples/sec Loss 6.3895 LearningRate 0.0293 Epoch: 18 Global Step: 104400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:32,753-Speed 10974.86 samples/sec Loss 6.3187 LearningRate 0.0293 Epoch: 18 Global Step: 104410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:33,649-Speed 11427.69 samples/sec Loss 6.4143 LearningRate 0.0293 Epoch: 18 Global Step: 104420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:34,533-Speed 11588.43 samples/sec Loss 6.4316 LearningRate 0.0293 Epoch: 18 Global Step: 104430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:35,405-Speed 11758.01 samples/sec Loss 6.4162 LearningRate 0.0292 Epoch: 18 Global Step: 104440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:36,322-Speed 11168.75 samples/sec Loss 6.4546 LearningRate 0.0292 Epoch: 18 Global Step: 104450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:37,251-Speed 11039.67 samples/sec Loss 6.4160 LearningRate 0.0292 Epoch: 18 Global Step: 104460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:38,158-Speed 11291.33 samples/sec Loss 6.4330 LearningRate 0.0292 Epoch: 18 Global Step: 104470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:39,050-Speed 11486.11 samples/sec Loss 6.4336 LearningRate 0.0292 Epoch: 18 Global Step: 104480 Fp16 
Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:39,959-Speed 11281.88 samples/sec Loss 6.5003 LearningRate 0.0292 Epoch: 18 Global Step: 104490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:40,845-Speed 11556.39 samples/sec Loss 6.3251 LearningRate 0.0292 Epoch: 18 Global Step: 104500 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:57:41,740-Speed 11453.91 samples/sec Loss 6.5731 LearningRate 0.0292 Epoch: 18 Global Step: 104510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:42,625-Speed 11575.76 samples/sec Loss 6.4250 LearningRate 0.0292 Epoch: 18 Global Step: 104520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:43,546-Speed 11131.12 samples/sec Loss 6.4661 LearningRate 0.0292 Epoch: 18 Global Step: 104530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:44,474-Speed 11037.88 samples/sec Loss 6.5668 LearningRate 0.0292 Epoch: 18 Global Step: 104540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:45,369-Speed 11460.02 samples/sec Loss 6.5266 LearningRate 0.0292 Epoch: 18 Global Step: 104550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:46,267-Speed 11401.05 samples/sec Loss 6.4023 LearningRate 0.0292 Epoch: 18 Global Step: 104560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:47,159-Speed 11493.39 samples/sec Loss 6.5844 LearningRate 0.0292 Epoch: 18 Global Step: 104570 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 02:57:48,128-Speed 10576.36 samples/sec Loss 6.4639 LearningRate 0.0292 Epoch: 18 Global Step: 104580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 02:57:49,035-Speed 11298.03 samples/sec Loss 6.2306 LearningRate 0.0292 Epoch: 18 Global Step: 104590 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 02:57:49,957-Speed 11109.60 samples/sec Loss 6.2887 LearningRate 0.0292 Epoch: 18 Global Step: 104600 Fp16 Grad Scale: 32768 Required: 4 
hours Training: 2022-04-11 02:57:50,902-Speed 10842.45 samples/sec Loss 6.5942 LearningRate 0.0292 Epoch: 18 Global Step: 104610 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 02:57:51,851-Speed 10798.35 samples/sec Loss 6.5210 LearningRate 0.0292 Epoch: 18 Global Step: 104620 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 02:57:52,779-Speed 11044.35 samples/sec Loss 6.4155 LearningRate 0.0292 Epoch: 18 Global Step: 104630 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 02:57:53,742-Speed 10633.48 samples/sec Loss 6.5779 LearningRate 0.0292 Epoch: 18 Global Step: 104640 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 02:57:54,655-Speed 11226.77 samples/sec Loss 6.4226 LearningRate 0.0291 Epoch: 18 Global Step: 104650 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 02:57:55,597-Speed 10881.65 samples/sec Loss 6.4710 LearningRate 0.0291 Epoch: 18 Global Step: 104660 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 02:57:56,540-Speed 10864.15 samples/sec Loss 6.4883 LearningRate 0.0291 Epoch: 18 Global Step: 104670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:57,432-Speed 11487.24 samples/sec Loss 6.4585 LearningRate 0.0291 Epoch: 18 Global Step: 104680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:58,314-Speed 11608.55 samples/sec Loss 6.2860 LearningRate 0.0291 Epoch: 18 Global Step: 104690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:57:59,231-Speed 11185.29 samples/sec Loss 6.4972 LearningRate 0.0291 Epoch: 18 Global Step: 104700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:00,162-Speed 11005.77 samples/sec Loss 6.5616 LearningRate 0.0291 Epoch: 18 Global Step: 104710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:01,094-Speed 10993.27 samples/sec Loss 6.4213 LearningRate 0.0291 Epoch: 18 Global Step: 104720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
02:58:02,005-Speed 11256.92 samples/sec Loss 6.5821 LearningRate 0.0291 Epoch: 18 Global Step: 104730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:02,904-Speed 11398.22 samples/sec Loss 6.3291 LearningRate 0.0291 Epoch: 18 Global Step: 104740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:03,824-Speed 11134.60 samples/sec Loss 6.5032 LearningRate 0.0291 Epoch: 18 Global Step: 104750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:04,735-Speed 11252.86 samples/sec Loss 6.4089 LearningRate 0.0291 Epoch: 18 Global Step: 104760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:05,615-Speed 11647.92 samples/sec Loss 6.5348 LearningRate 0.0291 Epoch: 18 Global Step: 104770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:06,580-Speed 10614.89 samples/sec Loss 6.4598 LearningRate 0.0291 Epoch: 18 Global Step: 104780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:07,491-Speed 11245.77 samples/sec Loss 6.3267 LearningRate 0.0291 Epoch: 18 Global Step: 104790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:08,411-Speed 11138.78 samples/sec Loss 6.4241 LearningRate 0.0291 Epoch: 18 Global Step: 104800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:09,305-Speed 11464.08 samples/sec Loss 6.4772 LearningRate 0.0291 Epoch: 18 Global Step: 104810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:10,181-Speed 11700.74 samples/sec Loss 6.4029 LearningRate 0.0291 Epoch: 18 Global Step: 104820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:11,070-Speed 11522.47 samples/sec Loss 6.4817 LearningRate 0.0291 Epoch: 18 Global Step: 104830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:12,045-Speed 10510.08 samples/sec Loss 6.5983 LearningRate 0.0291 Epoch: 18 Global Step: 104840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:12,994-Speed 10798.12 
samples/sec Loss 6.6336 LearningRate 0.0291 Epoch: 18 Global Step: 104850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:13,900-Speed 11312.02 samples/sec Loss 6.5453 LearningRate 0.0290 Epoch: 18 Global Step: 104860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:14,814-Speed 11203.11 samples/sec Loss 6.4447 LearningRate 0.0290 Epoch: 18 Global Step: 104870 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:58:15,745-Speed 11016.63 samples/sec Loss 6.5247 LearningRate 0.0290 Epoch: 18 Global Step: 104880 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:58:16,687-Speed 10868.08 samples/sec Loss 6.4458 LearningRate 0.0290 Epoch: 18 Global Step: 104890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:17,582-Speed 11449.66 samples/sec Loss 6.4122 LearningRate 0.0290 Epoch: 18 Global Step: 104900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:18,484-Speed 11359.53 samples/sec Loss 6.3407 LearningRate 0.0290 Epoch: 18 Global Step: 104910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:19,427-Speed 10875.34 samples/sec Loss 6.3790 LearningRate 0.0290 Epoch: 18 Global Step: 104920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:20,347-Speed 11131.58 samples/sec Loss 6.3934 LearningRate 0.0290 Epoch: 18 Global Step: 104930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:21,246-Speed 11405.40 samples/sec Loss 6.4983 LearningRate 0.0290 Epoch: 18 Global Step: 104940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:22,142-Speed 11435.24 samples/sec Loss 6.5630 LearningRate 0.0290 Epoch: 18 Global Step: 104950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:23,033-Speed 11491.70 samples/sec Loss 6.3994 LearningRate 0.0290 Epoch: 18 Global Step: 104960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:23,936-Speed 11356.49 samples/sec Loss 6.4361 
LearningRate 0.0290 Epoch: 18 Global Step: 104970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:24,848-Speed 11241.45 samples/sec Loss 6.4273 LearningRate 0.0290 Epoch: 18 Global Step: 104980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:25,733-Speed 11573.28 samples/sec Loss 6.4926 LearningRate 0.0290 Epoch: 18 Global Step: 104990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:58:26,618-Speed 11584.97 samples/sec Loss 6.3834 LearningRate 0.0290 Epoch: 18 Global Step: 105000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:58:27,516-Speed 11406.09 samples/sec Loss 6.5029 LearningRate 0.0290 Epoch: 18 Global Step: 105010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:58:28,409-Speed 11474.91 samples/sec Loss 6.5413 LearningRate 0.0290 Epoch: 18 Global Step: 105020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:58:29,301-Speed 11481.42 samples/sec Loss 6.6304 LearningRate 0.0290 Epoch: 18 Global Step: 105030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:58:30,187-Speed 11574.53 samples/sec Loss 6.5195 LearningRate 0.0290 Epoch: 18 Global Step: 105040 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:58:31,097-Speed 11263.95 samples/sec Loss 6.3489 LearningRate 0.0290 Epoch: 18 Global Step: 105050 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:58:32,029-Speed 10987.09 samples/sec Loss 6.3597 LearningRate 0.0290 Epoch: 18 Global Step: 105060 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:58:32,929-Speed 11382.00 samples/sec Loss 6.4916 LearningRate 0.0289 Epoch: 18 Global Step: 105070 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:58:33,841-Speed 11243.40 samples/sec Loss 6.5736 LearningRate 0.0289 Epoch: 18 Global Step: 105080 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:58:34,720-Speed 11656.93 samples/sec Loss 6.4999 LearningRate 0.0289 
Epoch: 18 Global Step: 105090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:35,675-Speed 10732.91 samples/sec Loss 6.4857 LearningRate 0.0289 Epoch: 18 Global Step: 105100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:36,615-Speed 10890.47 samples/sec Loss 6.4281 LearningRate 0.0289 Epoch: 18 Global Step: 105110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:37,579-Speed 10634.48 samples/sec Loss 6.4866 LearningRate 0.0289 Epoch: 18 Global Step: 105120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:38,467-Speed 11541.47 samples/sec Loss 6.4689 LearningRate 0.0289 Epoch: 18 Global Step: 105130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:39,419-Speed 10754.54 samples/sec Loss 6.4021 LearningRate 0.0289 Epoch: 18 Global Step: 105140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:40,315-Speed 11436.76 samples/sec Loss 6.6887 LearningRate 0.0289 Epoch: 18 Global Step: 105150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:41,255-Speed 10904.02 samples/sec Loss 6.3532 LearningRate 0.0289 Epoch: 18 Global Step: 105160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:42,165-Speed 11256.75 samples/sec Loss 6.3939 LearningRate 0.0289 Epoch: 18 Global Step: 105170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:43,118-Speed 10755.38 samples/sec Loss 6.4245 LearningRate 0.0289 Epoch: 18 Global Step: 105180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:44,060-Speed 10886.06 samples/sec Loss 6.4524 LearningRate 0.0289 Epoch: 18 Global Step: 105190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:58:44,933-Speed 11730.95 samples/sec Loss 6.5275 LearningRate 0.0289 Epoch: 18 Global Step: 105200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:58:45,826-Speed 11468.78 samples/sec Loss 6.5487 LearningRate 0.0289 Epoch: 18 Global Step: 105210 
Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:58:46,710-Speed 11595.48 samples/sec Loss 6.5710 LearningRate 0.0289 Epoch: 18 Global Step: 105220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:58:47,607-Speed 11427.92 samples/sec Loss 6.4449 LearningRate 0.0289 Epoch: 18 Global Step: 105230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:58:48,525-Speed 11168.94 samples/sec Loss 6.4262 LearningRate 0.0289 Epoch: 18 Global Step: 105240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:58:49,430-Speed 11315.58 samples/sec Loss 6.4987 LearningRate 0.0289 Epoch: 18 Global Step: 105250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:50,310-Speed 11652.00 samples/sec Loss 6.4266 LearningRate 0.0289 Epoch: 18 Global Step: 105260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:51,189-Speed 11654.12 samples/sec Loss 6.5777 LearningRate 0.0289 Epoch: 18 Global Step: 105270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:52,114-Speed 11074.58 samples/sec Loss 6.5812 LearningRate 0.0288 Epoch: 18 Global Step: 105280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:53,070-Speed 10720.86 samples/sec Loss 6.5353 LearningRate 0.0288 Epoch: 18 Global Step: 105290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:53,980-Speed 11267.65 samples/sec Loss 6.4319 LearningRate 0.0288 Epoch: 18 Global Step: 105300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:54,904-Speed 11091.19 samples/sec Loss 6.2455 LearningRate 0.0288 Epoch: 18 Global Step: 105310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:55,795-Speed 11497.93 samples/sec Loss 6.5050 LearningRate 0.0288 Epoch: 18 Global Step: 105320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:56,679-Speed 11584.29 samples/sec Loss 6.5120 LearningRate 0.0288 Epoch: 18 Global Step: 105330 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-04-11 02:58:57,559-Speed 11654.92 samples/sec Loss 6.3245 LearningRate 0.0288 Epoch: 18 Global Step: 105340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:58:58,462-Speed 11346.80 samples/sec Loss 6.4984 LearningRate 0.0288 Epoch: 18 Global Step: 105350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:58:59,353-Speed 11495.82 samples/sec Loss 6.5217 LearningRate 0.0288 Epoch: 18 Global Step: 105360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:59:00,280-Speed 11060.02 samples/sec Loss 6.5236 LearningRate 0.0288 Epoch: 18 Global Step: 105370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:59:01,243-Speed 10641.94 samples/sec Loss 6.4597 LearningRate 0.0288 Epoch: 18 Global Step: 105380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:59:02,184-Speed 10878.31 samples/sec Loss 6.2959 LearningRate 0.0288 Epoch: 18 Global Step: 105390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:59:03,167-Speed 10422.86 samples/sec Loss 6.5096 LearningRate 0.0288 Epoch: 18 Global Step: 105400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:59:04,088-Speed 11137.16 samples/sec Loss 6.3198 LearningRate 0.0288 Epoch: 18 Global Step: 105410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:59:05,022-Speed 10963.09 samples/sec Loss 6.3286 LearningRate 0.0288 Epoch: 18 Global Step: 105420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:59:05,964-Speed 10883.65 samples/sec Loss 6.4336 LearningRate 0.0288 Epoch: 18 Global Step: 105430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:59:06,916-Speed 10757.46 samples/sec Loss 6.4243 LearningRate 0.0288 Epoch: 18 Global Step: 105440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:59:07,814-Speed 11406.96 samples/sec Loss 6.4086 LearningRate 0.0288 Epoch: 18 Global Step: 105450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-11 02:59:08,707-Speed 11486.37 samples/sec Loss 6.5146 LearningRate 0.0288 Epoch: 18 Global Step: 105460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:59:09,610-Speed 11349.73 samples/sec Loss 6.5437 LearningRate 0.0288 Epoch: 18 Global Step: 105470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:59:10,495-Speed 11576.74 samples/sec Loss 6.5612 LearningRate 0.0288 Epoch: 18 Global Step: 105480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:59:11,373-Speed 11670.39 samples/sec Loss 6.5789 LearningRate 0.0287 Epoch: 18 Global Step: 105490 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:59:12,258-Speed 11583.61 samples/sec Loss 6.3102 LearningRate 0.0287 Epoch: 18 Global Step: 105500 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:59:13,134-Speed 11696.86 samples/sec Loss 6.3875 LearningRate 0.0287 Epoch: 18 Global Step: 105510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:59:14,036-Speed 11361.75 samples/sec Loss 6.3942 LearningRate 0.0287 Epoch: 18 Global Step: 105520 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:59:14,946-Speed 11257.45 samples/sec Loss 6.6114 LearningRate 0.0287 Epoch: 18 Global Step: 105530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:59:15,880-Speed 10967.55 samples/sec Loss 6.5526 LearningRate 0.0287 Epoch: 18 Global Step: 105540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:59:16,794-Speed 11207.64 samples/sec Loss 6.5602 LearningRate 0.0287 Epoch: 18 Global Step: 105550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:59:17,702-Speed 11285.99 samples/sec Loss 6.4387 LearningRate 0.0287 Epoch: 18 Global Step: 105560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:59:18,608-Speed 11319.74 samples/sec Loss 6.3933 LearningRate 0.0287 Epoch: 18 Global Step: 105570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
02:59:19,503-Speed 11440.59 samples/sec Loss 6.3277 LearningRate 0.0287 Epoch: 18 Global Step: 105580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:59:20,384-Speed 11637.01 samples/sec Loss 6.5449 LearningRate 0.0287 Epoch: 18 Global Step: 105590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:59:21,267-Speed 11607.98 samples/sec Loss 6.5133 LearningRate 0.0287 Epoch: 18 Global Step: 105600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:59:22,162-Speed 11443.65 samples/sec Loss 6.4032 LearningRate 0.0287 Epoch: 18 Global Step: 105610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:59:23,060-Speed 11405.43 samples/sec Loss 6.4766 LearningRate 0.0287 Epoch: 18 Global Step: 105620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:59:23,964-Speed 11348.04 samples/sec Loss 6.5475 LearningRate 0.0287 Epoch: 18 Global Step: 105630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:59:24,899-Speed 10966.22 samples/sec Loss 6.4528 LearningRate 0.0287 Epoch: 18 Global Step: 105640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:59:25,834-Speed 10956.19 samples/sec Loss 6.4236 LearningRate 0.0287 Epoch: 18 Global Step: 105650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:59:26,769-Speed 10954.74 samples/sec Loss 6.3709 LearningRate 0.0287 Epoch: 18 Global Step: 105660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:59:27,698-Speed 11034.15 samples/sec Loss 6.5692 LearningRate 0.0287 Epoch: 18 Global Step: 105670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:59:28,634-Speed 10948.40 samples/sec Loss 6.4984 LearningRate 0.0287 Epoch: 18 Global Step: 105680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:59:29,560-Speed 11069.39 samples/sec Loss 6.5544 LearningRate 0.0287 Epoch: 18 Global Step: 105690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:59:30,491-Speed 10998.95 
samples/sec Loss 6.4106 LearningRate 0.0287 Epoch: 18 Global Step: 105700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:59:31,417-Speed 11075.48 samples/sec Loss 6.4509 LearningRate 0.0286 Epoch: 18 Global Step: 105710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:59:32,311-Speed 11463.33 samples/sec Loss 6.3210 LearningRate 0.0286 Epoch: 18 Global Step: 105720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:59:33,186-Speed 11716.83 samples/sec Loss 6.4629 LearningRate 0.0286 Epoch: 18 Global Step: 105730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:59:34,074-Speed 11529.45 samples/sec Loss 6.3912 LearningRate 0.0286 Epoch: 18 Global Step: 105740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:59:34,946-Speed 11750.88 samples/sec Loss 6.4880 LearningRate 0.0286 Epoch: 18 Global Step: 105750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:59:35,874-Speed 11042.02 samples/sec Loss 6.5224 LearningRate 0.0286 Epoch: 18 Global Step: 105760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:59:36,811-Speed 10944.37 samples/sec Loss 6.4982 LearningRate 0.0286 Epoch: 18 Global Step: 105770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:59:37,718-Speed 11289.10 samples/sec Loss 6.4405 LearningRate 0.0286 Epoch: 18 Global Step: 105780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:59:38,627-Speed 11280.33 samples/sec Loss 6.4865 LearningRate 0.0286 Epoch: 18 Global Step: 105790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:59:39,539-Speed 11239.26 samples/sec Loss 6.6115 LearningRate 0.0286 Epoch: 18 Global Step: 105800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:59:40,439-Speed 11383.26 samples/sec Loss 6.4332 LearningRate 0.0286 Epoch: 18 Global Step: 105810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:59:41,352-Speed 11217.10 samples/sec Loss 6.5773 
LearningRate 0.0286 Epoch: 18 Global Step: 105820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:59:42,277-Speed 11081.60 samples/sec Loss 6.5264 LearningRate 0.0286 Epoch: 18 Global Step: 105830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:59:43,148-Speed 11758.38 samples/sec Loss 6.6235 LearningRate 0.0286 Epoch: 18 Global Step: 105840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:59:44,072-Speed 11089.84 samples/sec Loss 6.4744 LearningRate 0.0286 Epoch: 18 Global Step: 105850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:59:44,984-Speed 11240.37 samples/sec Loss 6.5619 LearningRate 0.0286 Epoch: 18 Global Step: 105860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 02:59:45,906-Speed 11111.59 samples/sec Loss 6.4177 LearningRate 0.0286 Epoch: 18 Global Step: 105870 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:59:46,797-Speed 11507.93 samples/sec Loss 6.4249 LearningRate 0.0286 Epoch: 18 Global Step: 105880 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:59:47,686-Speed 11527.41 samples/sec Loss 6.3228 LearningRate 0.0286 Epoch: 18 Global Step: 105890 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:59:48,617-Speed 10995.89 samples/sec Loss 6.5476 LearningRate 0.0286 Epoch: 18 Global Step: 105900 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:59:49,520-Speed 11350.39 samples/sec Loss 6.4131 LearningRate 0.0286 Epoch: 18 Global Step: 105910 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:59:50,448-Speed 11047.56 samples/sec Loss 6.4051 LearningRate 0.0285 Epoch: 18 Global Step: 105920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:59:51,344-Speed 11425.84 samples/sec Loss 6.4440 LearningRate 0.0285 Epoch: 18 Global Step: 105930 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:59:52,301-Speed 10712.56 samples/sec Loss 6.5048 LearningRate 0.0285 
Epoch: 18 Global Step: 105940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:59:53,251-Speed 10792.00 samples/sec Loss 6.6212 LearningRate 0.0285 Epoch: 18 Global Step: 105950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:59:54,186-Speed 10957.62 samples/sec Loss 6.4981 LearningRate 0.0285 Epoch: 18 Global Step: 105960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:59:55,103-Speed 11176.69 samples/sec Loss 6.5796 LearningRate 0.0285 Epoch: 18 Global Step: 105970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:59:56,005-Speed 11355.13 samples/sec Loss 6.5497 LearningRate 0.0285 Epoch: 18 Global Step: 105980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:59:56,884-Speed 11659.19 samples/sec Loss 6.5818 LearningRate 0.0285 Epoch: 18 Global Step: 105990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 02:59:57,763-Speed 11664.20 samples/sec Loss 6.6347 LearningRate 0.0285 Epoch: 18 Global Step: 106000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:00:19,013-[lfw][106000]XNorm: 10.735831 Training: 2022-04-11 03:00:19,014-[lfw][106000]Accuracy-Flip: 0.99533+-0.00296 Training: 2022-04-11 03:00:19,014-[lfw][106000]Accuracy-Highest: 0.99617 Training: 2022-04-11 03:00:43,568-[cfp_fp][106000]XNorm: 9.033715 Training: 2022-04-11 03:00:43,569-[cfp_fp][106000]Accuracy-Flip: 0.94514+-0.00737 Training: 2022-04-11 03:00:43,569-[cfp_fp][106000]Accuracy-Highest: 0.94757 Training: 2022-04-11 03:01:04,675-[agedb_30][106000]XNorm: 10.527813 Training: 2022-04-11 03:01:04,676-[agedb_30][106000]Accuracy-Flip: 0.96367+-0.00846 Training: 2022-04-11 03:01:04,676-[agedb_30][106000]Accuracy-Highest: 0.96483 Training: 2022-04-11 03:01:05,631-Speed 150.88 samples/sec Loss 6.4770 LearningRate 0.0285 Epoch: 18 Global Step: 106010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:01:06,594-Speed 10635.83 samples/sec Loss 6.4521 LearningRate 0.0285 Epoch: 18 
Global Step: 106020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:01:07,530-Speed 10947.31 samples/sec Loss 6.5464 LearningRate 0.0285 Epoch: 18 Global Step: 106030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:01:08,478-Speed 10812.83 samples/sec Loss 6.4653 LearningRate 0.0285 Epoch: 18 Global Step: 106040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:01:09,386-Speed 11288.38 samples/sec Loss 6.3852 LearningRate 0.0285 Epoch: 18 Global Step: 106050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:01:10,265-Speed 11650.29 samples/sec Loss 6.5655 LearningRate 0.0285 Epoch: 18 Global Step: 106060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:01:11,158-Speed 11480.29 samples/sec Loss 6.5522 LearningRate 0.0285 Epoch: 18 Global Step: 106070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:01:12,094-Speed 10941.66 samples/sec Loss 6.3464 LearningRate 0.0285 Epoch: 18 Global Step: 106080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:01:13,016-Speed 11112.71 samples/sec Loss 6.5422 LearningRate 0.0285 Epoch: 18 Global Step: 106090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:01:13,927-Speed 11256.74 samples/sec Loss 6.5499 LearningRate 0.0285 Epoch: 18 Global Step: 106100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:01:14,834-Speed 11290.29 samples/sec Loss 6.4855 LearningRate 0.0285 Epoch: 18 Global Step: 106110 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:01:15,743-Speed 11273.47 samples/sec Loss 6.3524 LearningRate 0.0285 Epoch: 18 Global Step: 106120 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:01:16,678-Speed 10967.58 samples/sec Loss 6.5136 LearningRate 0.0284 Epoch: 18 Global Step: 106130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:01:17,604-Speed 11057.90 samples/sec Loss 6.5548 LearningRate 0.0284 Epoch: 18 Global Step: 106140 Fp16 
Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:01:18,484-Speed 11651.86 samples/sec Loss 6.4653 LearningRate 0.0284 Epoch: 18 Global Step: 106150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:01:19,448-Speed 10621.82 samples/sec Loss 6.4260 LearningRate 0.0284 Epoch: 18 Global Step: 106160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:01:20,355-Speed 11298.43 samples/sec Loss 6.3929 LearningRate 0.0284 Epoch: 18 Global Step: 106170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:01:21,287-Speed 10992.99 samples/sec Loss 6.3741 LearningRate 0.0284 Epoch: 18 Global Step: 106180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:01:22,207-Speed 11143.67 samples/sec Loss 6.5701 LearningRate 0.0284 Epoch: 18 Global Step: 106190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:01:23,085-Speed 11667.51 samples/sec Loss 6.4950 LearningRate 0.0284 Epoch: 18 Global Step: 106200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:01:23,993-Speed 11289.80 samples/sec Loss 6.3917 LearningRate 0.0284 Epoch: 18 Global Step: 106210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:01:24,889-Speed 11432.08 samples/sec Loss 6.4769 LearningRate 0.0284 Epoch: 18 Global Step: 106220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:01:25,812-Speed 11108.99 samples/sec Loss 6.6226 LearningRate 0.0284 Epoch: 18 Global Step: 106230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:01:26,733-Speed 11121.76 samples/sec Loss 6.6322 LearningRate 0.0284 Epoch: 18 Global Step: 106240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:01:27,621-Speed 11536.03 samples/sec Loss 6.3599 LearningRate 0.0284 Epoch: 18 Global Step: 106250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:01:28,591-Speed 10576.75 samples/sec Loss 6.6183 LearningRate 0.0284 Epoch: 18 Global Step: 106260 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-04-11 03:01:29,514-Speed 11096.17 samples/sec Loss 6.6697 LearningRate 0.0284 Epoch: 18 Global Step: 106270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:01:30,449-Speed 10962.44 samples/sec Loss 6.5590 LearningRate 0.0284 Epoch: 18 Global Step: 106280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:01:31,383-Speed 10973.50 samples/sec Loss 6.5226 LearningRate 0.0284 Epoch: 18 Global Step: 106290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:01:32,285-Speed 11355.50 samples/sec Loss 6.4702 LearningRate 0.0284 Epoch: 18 Global Step: 106300 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:01:33,174-Speed 11532.98 samples/sec Loss 6.3779 LearningRate 0.0284 Epoch: 18 Global Step: 106310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:01:34,078-Speed 11330.63 samples/sec Loss 6.4893 LearningRate 0.0284 Epoch: 18 Global Step: 106320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:01:34,982-Speed 11339.78 samples/sec Loss 6.3995 LearningRate 0.0284 Epoch: 18 Global Step: 106330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:01:35,874-Speed 11483.35 samples/sec Loss 6.5205 LearningRate 0.0284 Epoch: 18 Global Step: 106340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:01:36,765-Speed 11513.20 samples/sec Loss 6.4262 LearningRate 0.0283 Epoch: 18 Global Step: 106350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:01:37,655-Speed 11503.62 samples/sec Loss 6.5657 LearningRate 0.0283 Epoch: 18 Global Step: 106360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:01:38,560-Speed 11320.50 samples/sec Loss 6.3589 LearningRate 0.0283 Epoch: 18 Global Step: 106370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:01:39,493-Speed 10988.33 samples/sec Loss 6.5315 LearningRate 0.0283 Epoch: 18 Global Step: 106380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-11 03:01:40,397-Speed 11337.35 samples/sec Loss 6.4685 LearningRate 0.0283 Epoch: 18 Global Step: 106390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:01:41,314-Speed 11166.67 samples/sec Loss 6.4146 LearningRate 0.0283 Epoch: 18 Global Step: 106400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:01:42,245-Speed 11008.56 samples/sec Loss 6.5248 LearningRate 0.0283 Epoch: 18 Global Step: 106410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:01:43,149-Speed 11340.64 samples/sec Loss 6.5537 LearningRate 0.0283 Epoch: 18 Global Step: 106420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:01:44,077-Speed 11037.51 samples/sec Loss 6.5602 LearningRate 0.0283 Epoch: 18 Global Step: 106430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:01:44,954-Speed 11686.70 samples/sec Loss 6.4199 LearningRate 0.0283 Epoch: 18 Global Step: 106440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:01:45,843-Speed 11548.47 samples/sec Loss 6.5899 LearningRate 0.0283 Epoch: 18 Global Step: 106450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:01:46,742-Speed 11387.47 samples/sec Loss 6.4934 LearningRate 0.0283 Epoch: 18 Global Step: 106460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:01:47,647-Speed 11323.60 samples/sec Loss 6.4966 LearningRate 0.0283 Epoch: 18 Global Step: 106470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:01:48,552-Speed 11324.69 samples/sec Loss 6.4876 LearningRate 0.0283 Epoch: 18 Global Step: 106480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:01:49,451-Speed 11405.26 samples/sec Loss 6.5105 LearningRate 0.0283 Epoch: 18 Global Step: 106490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:01:50,369-Speed 11151.92 samples/sec Loss 6.4237 LearningRate 0.0283 Epoch: 18 Global Step: 106500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
03:01:51,302-Speed 10980.81 samples/sec Loss 6.5460 LearningRate 0.0283 Epoch: 18 Global Step: 106510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:01:52,240-Speed 10927.13 samples/sec Loss 6.6066 LearningRate 0.0283 Epoch: 18 Global Step: 106520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:01:53,142-Speed 11359.25 samples/sec Loss 6.5201 LearningRate 0.0283 Epoch: 18 Global Step: 106530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:01:54,097-Speed 10735.12 samples/sec Loss 6.5211 LearningRate 0.0283 Epoch: 18 Global Step: 106540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:01:55,016-Speed 11152.68 samples/sec Loss 6.3883 LearningRate 0.0283 Epoch: 18 Global Step: 106550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:01:55,910-Speed 11461.87 samples/sec Loss 6.4044 LearningRate 0.0282 Epoch: 18 Global Step: 106560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:01:56,809-Speed 11406.00 samples/sec Loss 6.4532 LearningRate 0.0282 Epoch: 18 Global Step: 106570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:01:57,690-Speed 11622.28 samples/sec Loss 6.3325 LearningRate 0.0282 Epoch: 18 Global Step: 106580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:01:58,598-Speed 11283.06 samples/sec Loss 6.4568 LearningRate 0.0282 Epoch: 18 Global Step: 106590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:01:59,507-Speed 11281.80 samples/sec Loss 6.5043 LearningRate 0.0282 Epoch: 18 Global Step: 106600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:00,395-Speed 11534.53 samples/sec Loss 6.4606 LearningRate 0.0282 Epoch: 18 Global Step: 106610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:01,308-Speed 11221.34 samples/sec Loss 6.5327 LearningRate 0.0282 Epoch: 18 Global Step: 106620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:02,198-Speed 11523.58 
samples/sec Loss 6.6047 LearningRate 0.0282 Epoch: 18 Global Step: 106630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:03,101-Speed 11340.75 samples/sec Loss 6.5619 LearningRate 0.0282 Epoch: 18 Global Step: 106640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:03,991-Speed 11513.52 samples/sec Loss 6.4481 LearningRate 0.0282 Epoch: 18 Global Step: 106650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:04,901-Speed 11267.15 samples/sec Loss 6.4688 LearningRate 0.0282 Epoch: 18 Global Step: 106660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:05,832-Speed 11003.06 samples/sec Loss 6.5759 LearningRate 0.0282 Epoch: 18 Global Step: 106670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:02:06,731-Speed 11398.09 samples/sec Loss 6.4824 LearningRate 0.0282 Epoch: 18 Global Step: 106680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:02:07,653-Speed 11107.41 samples/sec Loss 6.4665 LearningRate 0.0282 Epoch: 18 Global Step: 106690 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:02:08,577-Speed 11090.46 samples/sec Loss 6.4455 LearningRate 0.0282 Epoch: 18 Global Step: 106700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:02:09,455-Speed 11678.24 samples/sec Loss 6.5411 LearningRate 0.0282 Epoch: 18 Global Step: 106710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:02:10,342-Speed 11548.43 samples/sec Loss 6.3769 LearningRate 0.0282 Epoch: 18 Global Step: 106720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:11,271-Speed 11029.71 samples/sec Loss 6.5320 LearningRate 0.0282 Epoch: 18 Global Step: 106730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:12,168-Speed 11422.44 samples/sec Loss 6.4333 LearningRate 0.0282 Epoch: 18 Global Step: 106740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:13,101-Speed 10992.61 samples/sec Loss 6.4651 
LearningRate 0.0282 Epoch: 18 Global Step: 106750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:14,039-Speed 10916.48 samples/sec Loss 6.5691 LearningRate 0.0282 Epoch: 18 Global Step: 106760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:14,950-Speed 11247.16 samples/sec Loss 6.5278 LearningRate 0.0281 Epoch: 18 Global Step: 106770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:15,865-Speed 11203.70 samples/sec Loss 6.4680 LearningRate 0.0281 Epoch: 18 Global Step: 106780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:16,778-Speed 11222.27 samples/sec Loss 6.4464 LearningRate 0.0281 Epoch: 18 Global Step: 106790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:17,714-Speed 10945.96 samples/sec Loss 6.4105 LearningRate 0.0281 Epoch: 18 Global Step: 106800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:18,633-Speed 11154.32 samples/sec Loss 6.4450 LearningRate 0.0281 Epoch: 18 Global Step: 106810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:19,538-Speed 11325.19 samples/sec Loss 6.4533 LearningRate 0.0281 Epoch: 18 Global Step: 106820 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:02:20,463-Speed 11076.39 samples/sec Loss 6.5704 LearningRate 0.0281 Epoch: 18 Global Step: 106830 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:02:21,346-Speed 11607.30 samples/sec Loss 6.5476 LearningRate 0.0281 Epoch: 18 Global Step: 106840 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:02:22,241-Speed 11443.72 samples/sec Loss 6.4655 LearningRate 0.0281 Epoch: 18 Global Step: 106850 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:02:23,154-Speed 11230.41 samples/sec Loss 6.3981 LearningRate 0.0281 Epoch: 18 Global Step: 106860 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:02:24,035-Speed 11627.46 samples/sec Loss 6.4705 LearningRate 0.0281 Epoch: 
18 Global Step: 106870 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:02:24,911-Speed 11694.56 samples/sec Loss 6.4912 LearningRate 0.0281 Epoch: 18 Global Step: 106880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:25,811-Speed 11388.40 samples/sec Loss 6.4229 LearningRate 0.0281 Epoch: 18 Global Step: 106890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:26,712-Speed 11366.73 samples/sec Loss 6.3991 LearningRate 0.0281 Epoch: 18 Global Step: 106900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:27,621-Speed 11272.29 samples/sec Loss 6.3874 LearningRate 0.0281 Epoch: 18 Global Step: 106910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:28,501-Speed 11653.71 samples/sec Loss 6.5689 LearningRate 0.0281 Epoch: 18 Global Step: 106920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:29,392-Speed 11496.71 samples/sec Loss 6.5450 LearningRate 0.0281 Epoch: 18 Global Step: 106930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:30,296-Speed 11338.14 samples/sec Loss 6.5478 LearningRate 0.0281 Epoch: 18 Global Step: 106940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:31,213-Speed 11174.15 samples/sec Loss 6.6457 LearningRate 0.0281 Epoch: 18 Global Step: 106950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:32,128-Speed 11200.35 samples/sec Loss 6.4277 LearningRate 0.0281 Epoch: 18 Global Step: 106960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:33,023-Speed 11441.18 samples/sec Loss 6.5441 LearningRate 0.0281 Epoch: 18 Global Step: 106970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:33,901-Speed 11668.86 samples/sec Loss 6.3183 LearningRate 0.0281 Epoch: 18 Global Step: 106980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:02:34,809-Speed 11293.97 samples/sec Loss 6.6611 LearningRate 0.0280 Epoch: 18 Global Step: 106990 Fp16 
Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:02:35,698-Speed 11516.94 samples/sec Loss 6.4419 LearningRate 0.0280 Epoch: 18 Global Step: 107000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:02:36,645-Speed 10827.04 samples/sec Loss 6.5632 LearningRate 0.0280 Epoch: 18 Global Step: 107010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:02:37,559-Speed 11210.77 samples/sec Loss 6.6491 LearningRate 0.0280 Epoch: 18 Global Step: 107020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:38,452-Speed 11470.35 samples/sec Loss 6.5470 LearningRate 0.0280 Epoch: 18 Global Step: 107030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:39,369-Speed 11175.92 samples/sec Loss 6.5703 LearningRate 0.0280 Epoch: 18 Global Step: 107040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:40,299-Speed 11026.82 samples/sec Loss 6.4576 LearningRate 0.0280 Epoch: 18 Global Step: 107050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:41,230-Speed 11005.82 samples/sec Loss 6.6036 LearningRate 0.0280 Epoch: 18 Global Step: 107060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:42,129-Speed 11389.07 samples/sec Loss 6.4621 LearningRate 0.0280 Epoch: 18 Global Step: 107070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:43,076-Speed 10829.19 samples/sec Loss 6.5934 LearningRate 0.0280 Epoch: 18 Global Step: 107080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:44,010-Speed 10971.91 samples/sec Loss 6.5946 LearningRate 0.0280 Epoch: 18 Global Step: 107090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:44,925-Speed 11201.83 samples/sec Loss 6.5403 LearningRate 0.0280 Epoch: 18 Global Step: 107100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:45,841-Speed 11175.11 samples/sec Loss 6.4492 LearningRate 0.0280 Epoch: 18 Global Step: 107110 Fp16 Grad Scale: 65536 Required: 
4 hours Training: 2022-04-11 03:02:46,743-Speed 11368.24 samples/sec Loss 6.5603 LearningRate 0.0280 Epoch: 18 Global Step: 107120 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:02:47,665-Speed 11107.55 samples/sec Loss 6.6056 LearningRate 0.0280 Epoch: 18 Global Step: 107130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:02:48,609-Speed 10853.12 samples/sec Loss 6.4973 LearningRate 0.0280 Epoch: 18 Global Step: 107140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:02:49,513-Speed 11339.30 samples/sec Loss 6.5814 LearningRate 0.0280 Epoch: 18 Global Step: 107150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:02:50,411-Speed 11417.84 samples/sec Loss 6.5779 LearningRate 0.0280 Epoch: 18 Global Step: 107160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:02:51,299-Speed 11532.24 samples/sec Loss 6.5310 LearningRate 0.0280 Epoch: 18 Global Step: 107170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:52,210-Speed 11250.44 samples/sec Loss 6.5087 LearningRate 0.0280 Epoch: 18 Global Step: 107180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:53,121-Speed 11247.65 samples/sec Loss 6.4137 LearningRate 0.0280 Epoch: 18 Global Step: 107190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:54,020-Speed 11401.13 samples/sec Loss 6.4860 LearningRate 0.0279 Epoch: 18 Global Step: 107200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:54,922-Speed 11361.94 samples/sec Loss 6.5965 LearningRate 0.0279 Epoch: 18 Global Step: 107210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:55,834-Speed 11235.22 samples/sec Loss 6.3838 LearningRate 0.0279 Epoch: 18 Global Step: 107220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:56,756-Speed 11117.29 samples/sec Loss 6.4699 LearningRate 0.0279 Epoch: 18 Global Step: 107230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-11 03:02:57,652-Speed 11433.28 samples/sec Loss 6.4160 LearningRate 0.0279 Epoch: 18 Global Step: 107240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:58,551-Speed 11397.71 samples/sec Loss 6.5670 LearningRate 0.0279 Epoch: 18 Global Step: 107250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:02:59,467-Speed 11183.52 samples/sec Loss 6.3379 LearningRate 0.0279 Epoch: 18 Global Step: 107260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:03:00,375-Speed 11280.72 samples/sec Loss 6.4110 LearningRate 0.0279 Epoch: 18 Global Step: 107270 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:03:01,307-Speed 11003.06 samples/sec Loss 6.4171 LearningRate 0.0279 Epoch: 18 Global Step: 107280 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:03:02,267-Speed 10674.58 samples/sec Loss 6.4984 LearningRate 0.0279 Epoch: 18 Global Step: 107290 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:03:03,238-Speed 10558.09 samples/sec Loss 6.4838 LearningRate 0.0279 Epoch: 18 Global Step: 107300 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:03:04,200-Speed 10649.81 samples/sec Loss 6.4414 LearningRate 0.0279 Epoch: 18 Global Step: 107310 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:03:05,145-Speed 10845.33 samples/sec Loss 6.4548 LearningRate 0.0279 Epoch: 18 Global Step: 107320 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:03:06,065-Speed 11131.68 samples/sec Loss 6.4635 LearningRate 0.0279 Epoch: 18 Global Step: 107330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:03:06,969-Speed 11342.47 samples/sec Loss 6.5492 LearningRate 0.0279 Epoch: 18 Global Step: 107340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:03:07,936-Speed 10596.67 samples/sec Loss 6.5332 LearningRate 0.0279 Epoch: 18 Global Step: 107350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
03:03:08,837-Speed 11373.06 samples/sec Loss 6.4599 LearningRate 0.0279 Epoch: 18 Global Step: 107360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:03:09,721-Speed 11591.44 samples/sec Loss 6.3463 LearningRate 0.0279 Epoch: 18 Global Step: 107370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:03:10,594-Speed 11738.29 samples/sec Loss 6.5729 LearningRate 0.0279 Epoch: 18 Global Step: 107380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:03:11,489-Speed 11443.23 samples/sec Loss 6.3808 LearningRate 0.0279 Epoch: 18 Global Step: 107390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:03:12,370-Speed 11636.54 samples/sec Loss 6.6460 LearningRate 0.0279 Epoch: 18 Global Step: 107400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:03:13,286-Speed 11182.68 samples/sec Loss 6.4960 LearningRate 0.0279 Epoch: 18 Global Step: 107410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:03:14,259-Speed 10527.85 samples/sec Loss 6.6182 LearningRate 0.0278 Epoch: 18 Global Step: 107420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:03:15,193-Speed 10976.03 samples/sec Loss 6.5226 LearningRate 0.0278 Epoch: 18 Global Step: 107430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:03:16,090-Speed 11415.26 samples/sec Loss 6.4265 LearningRate 0.0278 Epoch: 18 Global Step: 107440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:03:16,984-Speed 11469.20 samples/sec Loss 6.5253 LearningRate 0.0278 Epoch: 18 Global Step: 107450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:03:17,933-Speed 10791.22 samples/sec Loss 6.5617 LearningRate 0.0278 Epoch: 18 Global Step: 107460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:03:18,874-Speed 10889.73 samples/sec Loss 6.5813 LearningRate 0.0278 Epoch: 18 Global Step: 107470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:03:19,778-Speed 
11340.44 samples/sec Loss 6.4353 LearningRate 0.0278 Epoch: 18 Global Step: 107480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:03:20,683-Speed 11321.18 samples/sec Loss 6.5505 LearningRate 0.0278 Epoch: 18 Global Step: 107490 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:03:21,565-Speed 11622.27 samples/sec Loss 6.5257 LearningRate 0.0278 Epoch: 18 Global Step: 107500 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:03:22,444-Speed 11654.42 samples/sec Loss 6.5870 LearningRate 0.0278 Epoch: 18 Global Step: 107510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:03:23,337-Speed 11484.78 samples/sec Loss 6.6014 LearningRate 0.0278 Epoch: 18 Global Step: 107520 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:03:24,223-Speed 11564.07 samples/sec Loss 6.4451 LearningRate 0.0278 Epoch: 18 Global Step: 107530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:03:25,108-Speed 11581.92 samples/sec Loss 6.5014 LearningRate 0.0278 Epoch: 18 Global Step: 107540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:03:26,075-Speed 10600.12 samples/sec Loss 6.4939 LearningRate 0.0278 Epoch: 18 Global Step: 107550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:03:26,974-Speed 11400.44 samples/sec Loss 6.6268 LearningRate 0.0278 Epoch: 18 Global Step: 107560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:03:27,903-Speed 11028.32 samples/sec Loss 6.5125 LearningRate 0.0278 Epoch: 18 Global Step: 107570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:03:28,856-Speed 10754.05 samples/sec Loss 6.5341 LearningRate 0.0278 Epoch: 18 Global Step: 107580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:03:29,789-Speed 10985.34 samples/sec Loss 6.4190 LearningRate 0.0278 Epoch: 18 Global Step: 107590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:03:30,740-Speed 10780.10 samples/sec 
Loss 6.5087 LearningRate 0.0278 Epoch: 18 Global Step: 107600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:03:31,685-Speed 10840.95 samples/sec Loss 6.3990 LearningRate 0.0278 Epoch: 18 Global Step: 107610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:03:32,587-Speed 11366.27 samples/sec Loss 6.5886 LearningRate 0.0278 Epoch: 18 Global Step: 107620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:03:33,479-Speed 11488.03 samples/sec Loss 6.4422 LearningRate 0.0277 Epoch: 18 Global Step: 107630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:03:34,357-Speed 11670.00 samples/sec Loss 6.5537 LearningRate 0.0277 Epoch: 18 Global Step: 107640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:03:35,245-Speed 11538.47 samples/sec Loss 6.5710 LearningRate 0.0277 Epoch: 18 Global Step: 107650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:03:36,150-Speed 11318.39 samples/sec Loss 6.5860 LearningRate 0.0277 Epoch: 18 Global Step: 107660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:03:37,049-Speed 11394.53 samples/sec Loss 6.4802 LearningRate 0.0277 Epoch: 18 Global Step: 107670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:03:37,969-Speed 11143.79 samples/sec Loss 6.5523 LearningRate 0.0277 Epoch: 18 Global Step: 107680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:03:38,887-Speed 11161.62 samples/sec Loss 6.5885 LearningRate 0.0277 Epoch: 18 Global Step: 107690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:03:39,770-Speed 11603.30 samples/sec Loss 6.3743 LearningRate 0.0277 Epoch: 18 Global Step: 107700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:03:40,671-Speed 11377.57 samples/sec Loss 6.5010 LearningRate 0.0277 Epoch: 18 Global Step: 107710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:03:41,551-Speed 11644.09 samples/sec Loss 6.5439 LearningRate 
0.0277 Epoch: 18 Global Step: 107720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:03:42,457-Speed 11301.23 samples/sec Loss 6.4179 LearningRate 0.0277 Epoch: 18 Global Step: 107730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:03:43,393-Speed 10948.56 samples/sec Loss 6.4765 LearningRate 0.0277 Epoch: 18 Global Step: 107740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:03:44,341-Speed 10806.52 samples/sec Loss 6.4313 LearningRate 0.0277 Epoch: 18 Global Step: 107750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:03:45,243-Speed 11365.42 samples/sec Loss 6.4254 LearningRate 0.0277 Epoch: 18 Global Step: 107760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:03:46,131-Speed 11537.94 samples/sec Loss 6.4579 LearningRate 0.0277 Epoch: 18 Global Step: 107770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:03:47,046-Speed 11205.73 samples/sec Loss 6.5917 LearningRate 0.0277 Epoch: 18 Global Step: 107780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:03:47,987-Speed 10888.44 samples/sec Loss 6.6061 LearningRate 0.0277 Epoch: 18 Global Step: 107790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:03:48,888-Speed 11370.23 samples/sec Loss 6.4369 LearningRate 0.0277 Epoch: 18 Global Step: 107800 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:03:49,811-Speed 11109.49 samples/sec Loss 6.4573 LearningRate 0.0277 Epoch: 18 Global Step: 107810 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:03:50,713-Speed 11357.73 samples/sec Loss 6.5651 LearningRate 0.0277 Epoch: 18 Global Step: 107820 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:03:51,645-Speed 10995.59 samples/sec Loss 6.5834 LearningRate 0.0277 Epoch: 18 Global Step: 107830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:03:52,597-Speed 10758.21 samples/sec Loss 6.4035 LearningRate 0.0277 Epoch: 18 Global 
Step: 107840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:03:53,498-Speed 11375.45 samples/sec Loss 6.5589 LearningRate 0.0276 Epoch: 18 Global Step: 107850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:03:54,402-Speed 11342.62 samples/sec Loss 6.4769 LearningRate 0.0276 Epoch: 18 Global Step: 107860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:03:55,341-Speed 10906.80 samples/sec Loss 6.5776 LearningRate 0.0276 Epoch: 18 Global Step: 107870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:03:56,244-Speed 11346.83 samples/sec Loss 6.5363 LearningRate 0.0276 Epoch: 18 Global Step: 107880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:03:57,142-Speed 11410.40 samples/sec Loss 6.4265 LearningRate 0.0276 Epoch: 18 Global Step: 107890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:03:58,041-Speed 11396.42 samples/sec Loss 6.4635 LearningRate 0.0276 Epoch: 18 Global Step: 107900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:03:58,943-Speed 11365.26 samples/sec Loss 6.4562 LearningRate 0.0276 Epoch: 18 Global Step: 107910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:03:59,820-Speed 11684.78 samples/sec Loss 6.6306 LearningRate 0.0276 Epoch: 18 Global Step: 107920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:04:00,723-Speed 11354.01 samples/sec Loss 6.5047 LearningRate 0.0276 Epoch: 18 Global Step: 107930 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:04:01,608-Speed 11579.39 samples/sec Loss 6.4129 LearningRate 0.0276 Epoch: 18 Global Step: 107940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:04:02,540-Speed 10990.97 samples/sec Loss 6.4706 LearningRate 0.0276 Epoch: 18 Global Step: 107950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:04:03,467-Speed 11045.42 samples/sec Loss 6.6308 LearningRate 0.0276 Epoch: 18 Global Step: 107960 Fp16 Grad 
Scale: 131072 Required: 4 hours Training: 2022-04-11 03:04:04,356-Speed 11525.74 samples/sec Loss 6.5429 LearningRate 0.0276 Epoch: 18 Global Step: 107970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:04:05,284-Speed 11048.72 samples/sec Loss 6.7508 LearningRate 0.0276 Epoch: 18 Global Step: 107980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:04:06,169-Speed 11587.29 samples/sec Loss 6.5172 LearningRate 0.0276 Epoch: 18 Global Step: 107990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:04:07,057-Speed 11542.01 samples/sec Loss 6.4974 LearningRate 0.0276 Epoch: 18 Global Step: 108000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:04:28,402-[lfw][108000]XNorm: 10.729645 Training: 2022-04-11 03:04:28,402-[lfw][108000]Accuracy-Flip: 0.99533+-0.00356 Training: 2022-04-11 03:04:28,402-[lfw][108000]Accuracy-Highest: 0.99617 Training: 2022-04-11 03:04:53,093-[cfp_fp][108000]XNorm: 8.978059 Training: 2022-04-11 03:04:53,093-[cfp_fp][108000]Accuracy-Flip: 0.94471+-0.01275 Training: 2022-04-11 03:04:53,094-[cfp_fp][108000]Accuracy-Highest: 0.94757 Training: 2022-04-11 03:05:14,302-[agedb_30][108000]XNorm: 10.432188 Training: 2022-04-11 03:05:14,303-[agedb_30][108000]Accuracy-Flip: 0.96350+-0.00998 Training: 2022-04-11 03:05:14,303-[agedb_30][108000]Accuracy-Highest: 0.96483 Training: 2022-04-11 03:05:15,233-Speed 150.20 samples/sec Loss 6.4336 LearningRate 0.0276 Epoch: 18 Global Step: 108010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:05:16,116-Speed 11598.13 samples/sec Loss 6.3696 LearningRate 0.0276 Epoch: 18 Global Step: 108020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:05:17,059-Speed 10868.64 samples/sec Loss 6.4951 LearningRate 0.0276 Epoch: 18 Global Step: 108030 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-04-11 03:05:28,274-Speed 913.17 samples/sec Loss 6.0685 LearningRate 0.0276 Epoch: 19 Global Step: 108040 Fp16 Grad Scale: 
131072 Required: 4 hours Training: 2022-04-11 03:05:29,266-Speed 10336.42 samples/sec Loss 5.6513 LearningRate 0.0276 Epoch: 19 Global Step: 108050 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:05:30,277-Speed 10135.19 samples/sec Loss 5.8263 LearningRate 0.0276 Epoch: 19 Global Step: 108060 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:05:31,417-Speed 8981.86 samples/sec Loss 5.7018 LearningRate 0.0275 Epoch: 19 Global Step: 108070 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:05:32,399-Speed 10436.55 samples/sec Loss 5.7450 LearningRate 0.0275 Epoch: 19 Global Step: 108080 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:05:33,362-Speed 10643.54 samples/sec Loss 5.6198 LearningRate 0.0275 Epoch: 19 Global Step: 108090 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:05:34,321-Speed 10686.65 samples/sec Loss 5.8185 LearningRate 0.0275 Epoch: 19 Global Step: 108100 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:05:35,227-Speed 11322.12 samples/sec Loss 5.8716 LearningRate 0.0275 Epoch: 19 Global Step: 108110 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:05:36,147-Speed 11140.59 samples/sec Loss 5.8106 LearningRate 0.0275 Epoch: 19 Global Step: 108120 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:05:37,184-Speed 9884.64 samples/sec Loss 5.9277 LearningRate 0.0275 Epoch: 19 Global Step: 108130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:05:38,104-Speed 11142.16 samples/sec Loss 5.7524 LearningRate 0.0275 Epoch: 19 Global Step: 108140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:05:39,066-Speed 10662.36 samples/sec Loss 5.9240 LearningRate 0.0275 Epoch: 19 Global Step: 108150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:05:39,986-Speed 11135.32 samples/sec Loss 5.9747 LearningRate 0.0275 Epoch: 19 Global Step: 108160 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-04-11 03:05:40,932-Speed 10839.04 samples/sec Loss 5.8179 LearningRate 0.0275 Epoch: 19 Global Step: 108170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:05:41,872-Speed 10896.15 samples/sec Loss 5.7861 LearningRate 0.0275 Epoch: 19 Global Step: 108180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:05:42,766-Speed 11466.98 samples/sec Loss 5.7390 LearningRate 0.0275 Epoch: 19 Global Step: 108190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:05:43,683-Speed 11174.37 samples/sec Loss 5.9532 LearningRate 0.0275 Epoch: 19 Global Step: 108200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:05:44,591-Speed 11277.33 samples/sec Loss 5.9416 LearningRate 0.0275 Epoch: 19 Global Step: 108210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:05:45,476-Speed 11584.42 samples/sec Loss 5.8181 LearningRate 0.0275 Epoch: 19 Global Step: 108220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:05:46,391-Speed 11199.95 samples/sec Loss 5.7969 LearningRate 0.0275 Epoch: 19 Global Step: 108230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:05:47,290-Speed 11393.79 samples/sec Loss 5.8355 LearningRate 0.0275 Epoch: 19 Global Step: 108240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:05:48,207-Speed 11173.22 samples/sec Loss 5.9775 LearningRate 0.0275 Epoch: 19 Global Step: 108250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:05:49,128-Speed 11128.05 samples/sec Loss 5.9012 LearningRate 0.0275 Epoch: 19 Global Step: 108260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:05:50,071-Speed 10873.74 samples/sec Loss 5.9108 LearningRate 0.0275 Epoch: 19 Global Step: 108270 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:05:51,027-Speed 10714.73 samples/sec Loss 5.8918 LearningRate 0.0274 Epoch: 19 Global Step: 108280 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 
03:05:51,961-Speed 10973.26 samples/sec Loss 5.9550 LearningRate 0.0274 Epoch: 19 Global Step: 108290 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:05:52,929-Speed 10581.77 samples/sec Loss 5.9276 LearningRate 0.0274 Epoch: 19 Global Step: 108300 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:05:53,902-Speed 10541.26 samples/sec Loss 5.8565 LearningRate 0.0274 Epoch: 19 Global Step: 108310 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:05:54,826-Speed 11089.11 samples/sec Loss 6.0356 LearningRate 0.0274 Epoch: 19 Global Step: 108320 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:05:55,773-Speed 10813.57 samples/sec Loss 5.9849 LearningRate 0.0274 Epoch: 19 Global Step: 108330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:05:56,669-Speed 11434.97 samples/sec Loss 5.7441 LearningRate 0.0274 Epoch: 19 Global Step: 108340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:05:57,560-Speed 11508.99 samples/sec Loss 5.8728 LearningRate 0.0274 Epoch: 19 Global Step: 108350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:05:58,447-Speed 11552.29 samples/sec Loss 5.9416 LearningRate 0.0274 Epoch: 19 Global Step: 108360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:05:59,320-Speed 11729.95 samples/sec Loss 5.8359 LearningRate 0.0274 Epoch: 19 Global Step: 108370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:00,238-Speed 11170.80 samples/sec Loss 5.9085 LearningRate 0.0274 Epoch: 19 Global Step: 108380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:01,182-Speed 10849.83 samples/sec Loss 5.8870 LearningRate 0.0274 Epoch: 19 Global Step: 108390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:02,110-Speed 11039.98 samples/sec Loss 6.0875 LearningRate 0.0274 Epoch: 19 Global Step: 108400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:03,048-Speed 10926.73 
samples/sec Loss 5.9302 LearningRate 0.0274 Epoch: 19 Global Step: 108410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:04,013-Speed 10620.48 samples/sec Loss 6.0500 LearningRate 0.0274 Epoch: 19 Global Step: 108420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:05,003-Speed 10350.54 samples/sec Loss 5.9097 LearningRate 0.0274 Epoch: 19 Global Step: 108430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:06:05,913-Speed 11263.78 samples/sec Loss 5.9103 LearningRate 0.0274 Epoch: 19 Global Step: 108440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:06:06,825-Speed 11237.71 samples/sec Loss 5.9017 LearningRate 0.0274 Epoch: 19 Global Step: 108450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:06:07,746-Speed 11120.39 samples/sec Loss 5.7966 LearningRate 0.0274 Epoch: 19 Global Step: 108460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:06:08,687-Speed 10889.84 samples/sec Loss 5.9502 LearningRate 0.0274 Epoch: 19 Global Step: 108470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:06:09,583-Speed 11437.42 samples/sec Loss 5.9994 LearningRate 0.0274 Epoch: 19 Global Step: 108480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:06:10,489-Speed 11315.49 samples/sec Loss 6.0454 LearningRate 0.0274 Epoch: 19 Global Step: 108490 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:06:11,371-Speed 11611.97 samples/sec Loss 6.0450 LearningRate 0.0273 Epoch: 19 Global Step: 108500 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:06:12,289-Speed 11167.93 samples/sec Loss 6.0448 LearningRate 0.0273 Epoch: 19 Global Step: 108510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:13,251-Speed 10650.78 samples/sec Loss 6.0241 LearningRate 0.0273 Epoch: 19 Global Step: 108520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:14,171-Speed 11138.53 samples/sec Loss 6.0063 
LearningRate 0.0273 Epoch: 19 Global Step: 108530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:15,122-Speed 10769.45 samples/sec Loss 5.9833 LearningRate 0.0273 Epoch: 19 Global Step: 108540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:16,083-Speed 10665.49 samples/sec Loss 5.8809 LearningRate 0.0273 Epoch: 19 Global Step: 108550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:17,034-Speed 10780.42 samples/sec Loss 5.9443 LearningRate 0.0273 Epoch: 19 Global Step: 108560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:18,001-Speed 10593.58 samples/sec Loss 5.9108 LearningRate 0.0273 Epoch: 19 Global Step: 108570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:18,936-Speed 10958.18 samples/sec Loss 5.8964 LearningRate 0.0273 Epoch: 19 Global Step: 108580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:19,906-Speed 10567.01 samples/sec Loss 5.9727 LearningRate 0.0273 Epoch: 19 Global Step: 108590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:20,820-Speed 11225.33 samples/sec Loss 5.9807 LearningRate 0.0273 Epoch: 19 Global Step: 108600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:21,709-Speed 11533.36 samples/sec Loss 5.9506 LearningRate 0.0273 Epoch: 19 Global Step: 108610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:06:22,611-Speed 11366.95 samples/sec Loss 5.9439 LearningRate 0.0273 Epoch: 19 Global Step: 108620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:06:23,490-Speed 11655.52 samples/sec Loss 6.0762 LearningRate 0.0273 Epoch: 19 Global Step: 108630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:06:24,381-Speed 11505.49 samples/sec Loss 6.0914 LearningRate 0.0273 Epoch: 19 Global Step: 108640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:25,279-Speed 11415.44 samples/sec Loss 5.9667 LearningRate 0.0273 Epoch: 
19 Global Step: 108650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:26,176-Speed 11416.94 samples/sec Loss 6.1668 LearningRate 0.0273 Epoch: 19 Global Step: 108660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:27,124-Speed 10813.04 samples/sec Loss 6.1309 LearningRate 0.0273 Epoch: 19 Global Step: 108670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:28,068-Speed 10859.19 samples/sec Loss 6.1514 LearningRate 0.0273 Epoch: 19 Global Step: 108680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:28,992-Speed 11083.01 samples/sec Loss 6.0415 LearningRate 0.0273 Epoch: 19 Global Step: 108690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:29,981-Speed 10359.30 samples/sec Loss 6.2022 LearningRate 0.0273 Epoch: 19 Global Step: 108700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:30,892-Speed 11253.68 samples/sec Loss 6.0354 LearningRate 0.0273 Epoch: 19 Global Step: 108710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:31,831-Speed 10912.72 samples/sec Loss 6.0479 LearningRate 0.0272 Epoch: 19 Global Step: 108720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:32,737-Speed 11306.87 samples/sec Loss 5.9242 LearningRate 0.0272 Epoch: 19 Global Step: 108730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:33,718-Speed 10452.59 samples/sec Loss 6.0526 LearningRate 0.0272 Epoch: 19 Global Step: 108740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:06:34,614-Speed 11434.44 samples/sec Loss 6.1595 LearningRate 0.0272 Epoch: 19 Global Step: 108750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:06:35,501-Speed 11558.43 samples/sec Loss 6.0532 LearningRate 0.0272 Epoch: 19 Global Step: 108760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:36,421-Speed 11135.66 samples/sec Loss 5.9655 LearningRate 0.0272 Epoch: 19 Global Step: 108770 Fp16 
Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:37,346-Speed 11075.63 samples/sec Loss 6.1612 LearningRate 0.0272 Epoch: 19 Global Step: 108780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:38,293-Speed 10819.36 samples/sec Loss 6.2413 LearningRate 0.0272 Epoch: 19 Global Step: 108790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:39,258-Speed 10617.64 samples/sec Loss 5.8580 LearningRate 0.0272 Epoch: 19 Global Step: 108800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:40,148-Speed 11512.34 samples/sec Loss 6.0724 LearningRate 0.0272 Epoch: 19 Global Step: 108810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:41,100-Speed 10772.85 samples/sec Loss 6.0597 LearningRate 0.0272 Epoch: 19 Global Step: 108820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:42,006-Speed 11305.03 samples/sec Loss 6.0989 LearningRate 0.0272 Epoch: 19 Global Step: 108830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:42,904-Speed 11407.29 samples/sec Loss 6.1053 LearningRate 0.0272 Epoch: 19 Global Step: 108840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:43,819-Speed 11206.38 samples/sec Loss 6.0506 LearningRate 0.0272 Epoch: 19 Global Step: 108850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:44,721-Speed 11349.64 samples/sec Loss 5.9881 LearningRate 0.0272 Epoch: 19 Global Step: 108860 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:06:45,609-Speed 11548.13 samples/sec Loss 5.9966 LearningRate 0.0272 Epoch: 19 Global Step: 108870 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:06:46,499-Speed 11505.36 samples/sec Loss 6.1110 LearningRate 0.0272 Epoch: 19 Global Step: 108880 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:06:47,387-Speed 11546.73 samples/sec Loss 6.1441 LearningRate 0.0272 Epoch: 19 Global Step: 108890 Fp16 Grad Scale: 65536 Required: 
4 hours Training: 2022-04-11 03:06:48,323-Speed 10941.54 samples/sec Loss 6.1626 LearningRate 0.0272 Epoch: 19 Global Step: 108900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:49,279-Speed 10717.25 samples/sec Loss 6.2304 LearningRate 0.0272 Epoch: 19 Global Step: 108910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:50,197-Speed 11167.98 samples/sec Loss 6.0965 LearningRate 0.0272 Epoch: 19 Global Step: 108920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:51,138-Speed 10887.81 samples/sec Loss 6.1317 LearningRate 0.0272 Epoch: 19 Global Step: 108930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:52,077-Speed 10914.28 samples/sec Loss 6.1649 LearningRate 0.0271 Epoch: 19 Global Step: 108940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:52,978-Speed 11373.68 samples/sec Loss 6.1937 LearningRate 0.0271 Epoch: 19 Global Step: 108950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:53,863-Speed 11587.16 samples/sec Loss 6.2439 LearningRate 0.0271 Epoch: 19 Global Step: 108960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:54,771-Speed 11285.85 samples/sec Loss 6.1455 LearningRate 0.0271 Epoch: 19 Global Step: 108970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:55,710-Speed 10903.26 samples/sec Loss 6.1780 LearningRate 0.0271 Epoch: 19 Global Step: 108980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:56,656-Speed 10832.29 samples/sec Loss 6.0782 LearningRate 0.0271 Epoch: 19 Global Step: 108990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:06:57,550-Speed 11463.41 samples/sec Loss 6.1315 LearningRate 0.0271 Epoch: 19 Global Step: 109000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:06:58,475-Speed 11074.87 samples/sec Loss 6.0502 LearningRate 0.0271 Epoch: 19 Global Step: 109010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
03:06:59,386-Speed 11254.00 samples/sec Loss 6.2371 LearningRate 0.0271 Epoch: 19 Global Step: 109020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:07:00,306-Speed 11130.71 samples/sec Loss 6.2096 LearningRate 0.0271 Epoch: 19 Global Step: 109030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:07:01,251-Speed 10846.18 samples/sec Loss 6.0238 LearningRate 0.0271 Epoch: 19 Global Step: 109040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:07:02,134-Speed 11613.12 samples/sec Loss 6.1003 LearningRate 0.0271 Epoch: 19 Global Step: 109050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:07:03,054-Speed 11134.27 samples/sec Loss 6.0205 LearningRate 0.0271 Epoch: 19 Global Step: 109060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:07:03,970-Speed 11189.45 samples/sec Loss 6.1741 LearningRate 0.0271 Epoch: 19 Global Step: 109070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:07:04,909-Speed 10912.07 samples/sec Loss 6.2419 LearningRate 0.0271 Epoch: 19 Global Step: 109080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:07:05,829-Speed 11136.66 samples/sec Loss 6.0857 LearningRate 0.0271 Epoch: 19 Global Step: 109090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:07:06,771-Speed 10874.04 samples/sec Loss 6.2398 LearningRate 0.0271 Epoch: 19 Global Step: 109100 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:07:07,711-Speed 10909.44 samples/sec Loss 6.1247 LearningRate 0.0271 Epoch: 19 Global Step: 109110 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:07:08,618-Speed 11296.78 samples/sec Loss 6.3423 LearningRate 0.0271 Epoch: 19 Global Step: 109120 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:07:09,517-Speed 11398.64 samples/sec Loss 6.2400 LearningRate 0.0271 Epoch: 19 Global Step: 109130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:07:10,391-Speed 11718.10 
samples/sec Loss 6.2351 LearningRate 0.0271 Epoch: 19 Global Step: 109140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:07:11,303-Speed 11233.17 samples/sec Loss 6.1654 LearningRate 0.0270 Epoch: 19 Global Step: 109150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:07:12,242-Speed 10934.09 samples/sec Loss 6.1672 LearningRate 0.0270 Epoch: 19 Global Step: 109160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:07:13,147-Speed 11324.50 samples/sec Loss 6.1769 LearningRate 0.0270 Epoch: 19 Global Step: 109170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:07:14,038-Speed 11504.33 samples/sec Loss 6.2177 LearningRate 0.0270 Epoch: 19 Global Step: 109180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:07:14,931-Speed 11471.28 samples/sec Loss 6.1647 LearningRate 0.0270 Epoch: 19 Global Step: 109190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:07:15,840-Speed 11275.77 samples/sec Loss 6.0368 LearningRate 0.0270 Epoch: 19 Global Step: 109200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:07:16,771-Speed 11007.14 samples/sec Loss 6.1128 LearningRate 0.0270 Epoch: 19 Global Step: 109210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:07:17,719-Speed 10812.80 samples/sec Loss 6.0756 LearningRate 0.0270 Epoch: 19 Global Step: 109220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:07:18,639-Speed 11139.37 samples/sec Loss 6.1404 LearningRate 0.0270 Epoch: 19 Global Step: 109230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:07:19,542-Speed 11350.09 samples/sec Loss 6.0927 LearningRate 0.0270 Epoch: 19 Global Step: 109240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:07:20,480-Speed 10923.20 samples/sec Loss 6.2574 LearningRate 0.0270 Epoch: 19 Global Step: 109250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:07:21,379-Speed 11392.73 samples/sec Loss 6.2135 
LearningRate 0.0270 Epoch: 19 Global Step: 109260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:07:22,311-Speed 10993.19 samples/sec Loss 6.1699 LearningRate 0.0270 Epoch: 19 Global Step: 109270 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:07:23,221-Speed 11270.47 samples/sec Loss 6.2899 LearningRate 0.0270 Epoch: 19 Global Step: 109280 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:07:24,173-Speed 10764.29 samples/sec Loss 6.1592 LearningRate 0.0270 Epoch: 19 Global Step: 109290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:07:25,081-Speed 11283.10 samples/sec Loss 6.1164 LearningRate 0.0270 Epoch: 19 Global Step: 109300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:07:26,009-Speed 11041.54 samples/sec Loss 6.1265 LearningRate 0.0270 Epoch: 19 Global Step: 109310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:07:26,944-Speed 10957.80 samples/sec Loss 6.2520 LearningRate 0.0270 Epoch: 19 Global Step: 109320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:07:27,869-Speed 11087.55 samples/sec Loss 6.0812 LearningRate 0.0270 Epoch: 19 Global Step: 109330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:07:28,823-Speed 10738.37 samples/sec Loss 6.2773 LearningRate 0.0270 Epoch: 19 Global Step: 109340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:07:29,757-Speed 10970.80 samples/sec Loss 6.3584 LearningRate 0.0270 Epoch: 19 Global Step: 109350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:07:30,667-Speed 11261.74 samples/sec Loss 6.1819 LearningRate 0.0270 Epoch: 19 Global Step: 109360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:07:31,638-Speed 10551.43 samples/sec Loss 6.3248 LearningRate 0.0269 Epoch: 19 Global Step: 109370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:07:32,553-Speed 11200.57 samples/sec Loss 6.2436 LearningRate 0.0269 Epoch: 
19 Global Step: 109380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:07:33,447-Speed 11463.57 samples/sec Loss 6.1646 LearningRate 0.0269 Epoch: 19 Global Step: 109390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:07:34,337-Speed 11511.43 samples/sec Loss 6.2322 LearningRate 0.0269 Epoch: 19 Global Step: 109400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:07:35,220-Speed 11606.40 samples/sec Loss 6.2562 LearningRate 0.0269 Epoch: 19 Global Step: 109410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:07:36,116-Speed 11435.88 samples/sec Loss 6.3200 LearningRate 0.0269 Epoch: 19 Global Step: 109420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:07:37,024-Speed 11279.22 samples/sec Loss 6.1556 LearningRate 0.0269 Epoch: 19 Global Step: 109430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:07:37,964-Speed 10904.49 samples/sec Loss 6.2941 LearningRate 0.0269 Epoch: 19 Global Step: 109440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:07:38,935-Speed 10552.22 samples/sec Loss 6.3011 LearningRate 0.0269 Epoch: 19 Global Step: 109450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:07:39,881-Speed 10832.68 samples/sec Loss 6.1635 LearningRate 0.0269 Epoch: 19 Global Step: 109460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:07:40,763-Speed 11621.55 samples/sec Loss 6.0858 LearningRate 0.0269 Epoch: 19 Global Step: 109470 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:07:41,689-Speed 11063.50 samples/sec Loss 6.2828 LearningRate 0.0269 Epoch: 19 Global Step: 109480 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:07:42,646-Speed 10702.56 samples/sec Loss 6.0366 LearningRate 0.0269 Epoch: 19 Global Step: 109490 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:07:43,578-Speed 11000.58 samples/sec Loss 6.1572 LearningRate 0.0269 Epoch: 19 Global Step: 109500 
Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:07:44,506-Speed 11048.65 samples/sec Loss 6.1791 LearningRate 0.0269 Epoch: 19 Global Step: 109510 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:07:45,387-Speed 11627.16 samples/sec Loss 6.1493 LearningRate 0.0269 Epoch: 19 Global Step: 109520 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:07:46,273-Speed 11568.44 samples/sec Loss 6.2122 LearningRate 0.0269 Epoch: 19 Global Step: 109530 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:07:47,178-Speed 11315.59 samples/sec Loss 6.2100 LearningRate 0.0269 Epoch: 19 Global Step: 109540 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:07:48,097-Speed 11155.93 samples/sec Loss 6.1449 LearningRate 0.0269 Epoch: 19 Global Step: 109550 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:07:49,018-Speed 11124.41 samples/sec Loss 6.1906 LearningRate 0.0269 Epoch: 19 Global Step: 109560 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:07:49,942-Speed 11084.68 samples/sec Loss 6.2579 LearningRate 0.0269 Epoch: 19 Global Step: 109570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:07:50,862-Speed 11145.01 samples/sec Loss 6.3843 LearningRate 0.0269 Epoch: 19 Global Step: 109580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:07:51,777-Speed 11195.72 samples/sec Loss 6.3124 LearningRate 0.0268 Epoch: 19 Global Step: 109590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:07:52,725-Speed 10812.48 samples/sec Loss 6.2186 LearningRate 0.0268 Epoch: 19 Global Step: 109600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:07:53,638-Speed 11212.00 samples/sec Loss 6.1285 LearningRate 0.0268 Epoch: 19 Global Step: 109610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:07:54,597-Speed 10700.69 samples/sec Loss 6.2218 LearningRate 0.0268 Epoch: 19 Global Step: 109620 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-04-11 03:07:55,486-Speed 11524.72 samples/sec Loss 6.1702 LearningRate 0.0268 Epoch: 19 Global Step: 109630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:07:56,395-Speed 11277.35 samples/sec Loss 6.3005 LearningRate 0.0268 Epoch: 19 Global Step: 109640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:07:57,275-Speed 11649.58 samples/sec Loss 6.2579 LearningRate 0.0268 Epoch: 19 Global Step: 109650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:07:58,169-Speed 11457.48 samples/sec Loss 6.2634 LearningRate 0.0268 Epoch: 19 Global Step: 109660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:07:59,063-Speed 11461.96 samples/sec Loss 6.1290 LearningRate 0.0268 Epoch: 19 Global Step: 109670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:07:59,961-Speed 11413.40 samples/sec Loss 6.3434 LearningRate 0.0268 Epoch: 19 Global Step: 109680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:08:00,855-Speed 11463.17 samples/sec Loss 6.2232 LearningRate 0.0268 Epoch: 19 Global Step: 109690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:08:01,769-Speed 11209.43 samples/sec Loss 6.1923 LearningRate 0.0268 Epoch: 19 Global Step: 109700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:08:02,702-Speed 10991.53 samples/sec Loss 6.2497 LearningRate 0.0268 Epoch: 19 Global Step: 109710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:08:03,717-Speed 10087.65 samples/sec Loss 6.1715 LearningRate 0.0268 Epoch: 19 Global Step: 109720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:08:04,663-Speed 10833.24 samples/sec Loss 6.3373 LearningRate 0.0268 Epoch: 19 Global Step: 109730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:08:05,596-Speed 10979.40 samples/sec Loss 6.3298 LearningRate 0.0268 Epoch: 19 Global Step: 109740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-11 03:08:06,524-Speed 11050.34 samples/sec Loss 6.2230 LearningRate 0.0268 Epoch: 19 Global Step: 109750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:08:07,458-Speed 10970.69 samples/sec Loss 6.2329 LearningRate 0.0268 Epoch: 19 Global Step: 109760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:08:08,380-Speed 11126.66 samples/sec Loss 6.2821 LearningRate 0.0268 Epoch: 19 Global Step: 109770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:08:09,269-Speed 11515.89 samples/sec Loss 6.2559 LearningRate 0.0268 Epoch: 19 Global Step: 109780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:08:10,155-Speed 11571.47 samples/sec Loss 6.1779 LearningRate 0.0268 Epoch: 19 Global Step: 109790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:08:11,043-Speed 11531.98 samples/sec Loss 6.2810 LearningRate 0.0268 Epoch: 19 Global Step: 109800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:08:11,982-Speed 10916.76 samples/sec Loss 6.2060 LearningRate 0.0267 Epoch: 19 Global Step: 109810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:08:12,923-Speed 10884.33 samples/sec Loss 6.1861 LearningRate 0.0267 Epoch: 19 Global Step: 109820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:08:13,840-Speed 11184.06 samples/sec Loss 6.2477 LearningRate 0.0267 Epoch: 19 Global Step: 109830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:08:14,772-Speed 10988.60 samples/sec Loss 6.3024 LearningRate 0.0267 Epoch: 19 Global Step: 109840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:08:15,739-Speed 10599.79 samples/sec Loss 6.3137 LearningRate 0.0267 Epoch: 19 Global Step: 109850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:08:16,712-Speed 10533.82 samples/sec Loss 6.2490 LearningRate 0.0267 Epoch: 19 Global Step: 109860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:08:17,658-Speed 
10826.38 samples/sec Loss 6.1483 LearningRate 0.0267 Epoch: 19 Global Step: 109870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:08:18,623-Speed 10615.24 samples/sec Loss 6.2419 LearningRate 0.0267 Epoch: 19 Global Step: 109880 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:08:19,528-Speed 11322.16 samples/sec Loss 6.2203 LearningRate 0.0267 Epoch: 19 Global Step: 109890 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:08:20,416-Speed 11552.48 samples/sec Loss 6.3924 LearningRate 0.0267 Epoch: 19 Global Step: 109900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:08:21,315-Speed 11399.96 samples/sec Loss 6.2433 LearningRate 0.0267 Epoch: 19 Global Step: 109910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:08:22,211-Speed 11438.53 samples/sec Loss 6.2799 LearningRate 0.0267 Epoch: 19 Global Step: 109920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:08:23,090-Speed 11645.91 samples/sec Loss 6.3450 LearningRate 0.0267 Epoch: 19 Global Step: 109930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:08:23,985-Speed 11448.69 samples/sec Loss 6.2436 LearningRate 0.0267 Epoch: 19 Global Step: 109940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:08:24,895-Speed 11261.28 samples/sec Loss 6.3109 LearningRate 0.0267 Epoch: 19 Global Step: 109950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:08:25,797-Speed 11365.31 samples/sec Loss 6.3138 LearningRate 0.0267 Epoch: 19 Global Step: 109960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:08:26,723-Speed 11069.43 samples/sec Loss 6.2848 LearningRate 0.0267 Epoch: 19 Global Step: 109970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:08:27,625-Speed 11353.61 samples/sec Loss 6.2860 LearningRate 0.0267 Epoch: 19 Global Step: 109980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:08:28,549-Speed 11088.37 samples/sec Loss 
6.1927 LearningRate 0.0267 Epoch: 19 Global Step: 109990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:08:29,494-Speed 10852.27 samples/sec Loss 6.3503 LearningRate 0.0267 Epoch: 19 Global Step: 110000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:08:50,765-[lfw][110000]XNorm: 10.585018 Training: 2022-04-11 03:08:50,766-[lfw][110000]Accuracy-Flip: 0.99583+-0.00261 Training: 2022-04-11 03:08:50,766-[lfw][110000]Accuracy-Highest: 0.99617 Training: 2022-04-11 03:09:15,578-[cfp_fp][110000]XNorm: 8.887376 Training: 2022-04-11 03:09:15,578-[cfp_fp][110000]Accuracy-Flip: 0.94286+-0.00826 Training: 2022-04-11 03:09:15,578-[cfp_fp][110000]Accuracy-Highest: 0.94757 Training: 2022-04-11 03:09:36,765-[agedb_30][110000]XNorm: 10.348750 Training: 2022-04-11 03:09:36,765-[agedb_30][110000]Accuracy-Flip: 0.96117+-0.01062 Training: 2022-04-11 03:09:36,766-[agedb_30][110000]Accuracy-Highest: 0.96483 Training: 2022-04-11 03:09:37,668-Speed 150.20 samples/sec Loss 6.2713 LearningRate 0.0267 Epoch: 19 Global Step: 110010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:09:38,621-Speed 10763.12 samples/sec Loss 6.0369 LearningRate 0.0267 Epoch: 19 Global Step: 110020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:09:39,530-Speed 11269.53 samples/sec Loss 6.2619 LearningRate 0.0266 Epoch: 19 Global Step: 110030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:09:40,454-Speed 11092.31 samples/sec Loss 6.2700 LearningRate 0.0266 Epoch: 19 Global Step: 110040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:09:41,371-Speed 11166.89 samples/sec Loss 6.3432 LearningRate 0.0266 Epoch: 19 Global Step: 110050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:09:42,280-Speed 11271.93 samples/sec Loss 6.3041 LearningRate 0.0266 Epoch: 19 Global Step: 110060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:09:43,195-Speed 11210.28 samples/sec Loss 6.1773 
LearningRate 0.0266 Epoch: 19 Global Step: 110070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:09:44,104-Speed 11265.36 samples/sec Loss 6.3335 LearningRate 0.0266 Epoch: 19 Global Step: 110080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:09:45,020-Speed 11183.58 samples/sec Loss 6.3768 LearningRate 0.0266 Epoch: 19 Global Step: 110090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:09:45,911-Speed 11502.97 samples/sec Loss 6.3427 LearningRate 0.0266 Epoch: 19 Global Step: 110100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:09:46,815-Speed 11337.32 samples/sec Loss 6.3190 LearningRate 0.0266 Epoch: 19 Global Step: 110110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:09:47,709-Speed 11462.05 samples/sec Loss 6.3379 LearningRate 0.0266 Epoch: 19 Global Step: 110120 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:09:48,621-Speed 11233.52 samples/sec Loss 6.3338 LearningRate 0.0266 Epoch: 19 Global Step: 110130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:09:49,503-Speed 11618.96 samples/sec Loss 6.3266 LearningRate 0.0266 Epoch: 19 Global Step: 110140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:09:50,411-Speed 11283.87 samples/sec Loss 6.2233 LearningRate 0.0266 Epoch: 19 Global Step: 110150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:09:51,303-Speed 11489.77 samples/sec Loss 6.3762 LearningRate 0.0266 Epoch: 19 Global Step: 110160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:09:52,203-Speed 11388.24 samples/sec Loss 6.2996 LearningRate 0.0266 Epoch: 19 Global Step: 110170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:09:53,171-Speed 10578.50 samples/sec Loss 6.3620 LearningRate 0.0266 Epoch: 19 Global Step: 110180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:09:54,091-Speed 11139.45 samples/sec Loss 6.1340 LearningRate 0.0266 Epoch: 19 
Global Step: 110190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:09:55,011-Speed 11142.72 samples/sec Loss 6.2190 LearningRate 0.0266 Epoch: 19 Global Step: 110200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:09:55,996-Speed 10402.41 samples/sec Loss 6.4840 LearningRate 0.0266 Epoch: 19 Global Step: 110210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:09:56,920-Speed 11083.43 samples/sec Loss 6.3043 LearningRate 0.0266 Epoch: 19 Global Step: 110220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:09:57,815-Speed 11451.61 samples/sec Loss 6.2757 LearningRate 0.0266 Epoch: 19 Global Step: 110230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:09:58,712-Speed 11425.62 samples/sec Loss 6.2445 LearningRate 0.0266 Epoch: 19 Global Step: 110240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:09:59,608-Speed 11435.60 samples/sec Loss 6.3647 LearningRate 0.0265 Epoch: 19 Global Step: 110250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:10:00,490-Speed 11623.21 samples/sec Loss 6.3602 LearningRate 0.0265 Epoch: 19 Global Step: 110260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:10:01,402-Speed 11241.38 samples/sec Loss 6.1628 LearningRate 0.0265 Epoch: 19 Global Step: 110270 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:10:02,314-Speed 11246.74 samples/sec Loss 6.2341 LearningRate 0.0265 Epoch: 19 Global Step: 110280 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:10:03,246-Speed 10991.23 samples/sec Loss 6.2202 LearningRate 0.0265 Epoch: 19 Global Step: 110290 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:10:04,138-Speed 11492.14 samples/sec Loss 6.4725 LearningRate 0.0265 Epoch: 19 Global Step: 110300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:05,089-Speed 10772.75 samples/sec Loss 6.2171 LearningRate 0.0265 Epoch: 19 Global Step: 110310 Fp16 
Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:05,997-Speed 11283.58 samples/sec Loss 6.1656 LearningRate 0.0265 Epoch: 19 Global Step: 110320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:06,926-Speed 11031.88 samples/sec Loss 6.2634 LearningRate 0.0265 Epoch: 19 Global Step: 110330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:07,812-Speed 11559.19 samples/sec Loss 6.3348 LearningRate 0.0265 Epoch: 19 Global Step: 110340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:08,723-Speed 11252.34 samples/sec Loss 6.4672 LearningRate 0.0265 Epoch: 19 Global Step: 110350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:09,644-Speed 11131.49 samples/sec Loss 6.3221 LearningRate 0.0265 Epoch: 19 Global Step: 110360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:10,526-Speed 11614.42 samples/sec Loss 6.4325 LearningRate 0.0265 Epoch: 19 Global Step: 110370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:11,437-Speed 11241.64 samples/sec Loss 6.3006 LearningRate 0.0265 Epoch: 19 Global Step: 110380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:12,360-Speed 11103.86 samples/sec Loss 6.4211 LearningRate 0.0265 Epoch: 19 Global Step: 110390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:13,267-Speed 11295.85 samples/sec Loss 6.2045 LearningRate 0.0265 Epoch: 19 Global Step: 110400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:10:14,244-Speed 10487.28 samples/sec Loss 6.3637 LearningRate 0.0265 Epoch: 19 Global Step: 110410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:10:15,156-Speed 11243.63 samples/sec Loss 6.2190 LearningRate 0.0265 Epoch: 19 Global Step: 110420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:16,084-Speed 11041.65 samples/sec Loss 6.3387 LearningRate 0.0265 Epoch: 19 Global Step: 110430 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-04-11 03:10:17,005-Speed 11121.88 samples/sec Loss 6.4685 LearningRate 0.0265 Epoch: 19 Global Step: 110440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:17,958-Speed 10755.83 samples/sec Loss 6.2429 LearningRate 0.0265 Epoch: 19 Global Step: 110450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:18,899-Speed 10888.67 samples/sec Loss 6.2783 LearningRate 0.0265 Epoch: 19 Global Step: 110460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:19,870-Speed 10550.24 samples/sec Loss 6.2385 LearningRate 0.0264 Epoch: 19 Global Step: 110470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:20,777-Speed 11302.55 samples/sec Loss 6.1412 LearningRate 0.0264 Epoch: 19 Global Step: 110480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:21,684-Speed 11302.43 samples/sec Loss 6.3153 LearningRate 0.0264 Epoch: 19 Global Step: 110490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:22,575-Speed 11495.49 samples/sec Loss 6.1010 LearningRate 0.0264 Epoch: 19 Global Step: 110500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:23,465-Speed 11519.60 samples/sec Loss 6.3873 LearningRate 0.0264 Epoch: 19 Global Step: 110510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:24,359-Speed 11470.78 samples/sec Loss 6.2847 LearningRate 0.0264 Epoch: 19 Global Step: 110520 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:10:25,310-Speed 10773.54 samples/sec Loss 6.4786 LearningRate 0.0264 Epoch: 19 Global Step: 110530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:10:26,218-Speed 11283.70 samples/sec Loss 6.3202 LearningRate 0.0264 Epoch: 19 Global Step: 110540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:10:27,173-Speed 10727.95 samples/sec Loss 6.4287 LearningRate 0.0264 Epoch: 19 Global Step: 110550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 
03:10:28,113-Speed 10909.87 samples/sec Loss 6.2031 LearningRate 0.0264 Epoch: 19 Global Step: 110560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:29,015-Speed 11370.90 samples/sec Loss 6.3964 LearningRate 0.0264 Epoch: 19 Global Step: 110570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:29,916-Speed 11369.93 samples/sec Loss 6.4446 LearningRate 0.0264 Epoch: 19 Global Step: 110580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:30,802-Speed 11565.76 samples/sec Loss 6.3832 LearningRate 0.0264 Epoch: 19 Global Step: 110590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:31,724-Speed 11107.39 samples/sec Loss 6.3496 LearningRate 0.0264 Epoch: 19 Global Step: 110600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:32,652-Speed 11043.95 samples/sec Loss 6.3638 LearningRate 0.0264 Epoch: 19 Global Step: 110610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:33,543-Speed 11503.25 samples/sec Loss 6.4743 LearningRate 0.0264 Epoch: 19 Global Step: 110620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:34,435-Speed 11490.57 samples/sec Loss 6.4038 LearningRate 0.0264 Epoch: 19 Global Step: 110630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:35,339-Speed 11325.29 samples/sec Loss 6.2391 LearningRate 0.0264 Epoch: 19 Global Step: 110640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:36,219-Speed 11650.65 samples/sec Loss 6.4131 LearningRate 0.0264 Epoch: 19 Global Step: 110650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:37,127-Speed 11291.96 samples/sec Loss 6.4009 LearningRate 0.0264 Epoch: 19 Global Step: 110660 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:10:38,017-Speed 11510.21 samples/sec Loss 6.2832 LearningRate 0.0264 Epoch: 19 Global Step: 110670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:38,896-Speed 11659.60 
samples/sec Loss 6.1983 LearningRate 0.0264 Epoch: 19 Global Step: 110680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:39,834-Speed 10919.43 samples/sec Loss 6.2728 LearningRate 0.0263 Epoch: 19 Global Step: 110690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:40,749-Speed 11194.07 samples/sec Loss 6.3870 LearningRate 0.0263 Epoch: 19 Global Step: 110700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:41,674-Speed 11079.52 samples/sec Loss 6.2244 LearningRate 0.0263 Epoch: 19 Global Step: 110710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:42,585-Speed 11252.59 samples/sec Loss 6.4942 LearningRate 0.0263 Epoch: 19 Global Step: 110720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:43,526-Speed 10896.39 samples/sec Loss 6.2061 LearningRate 0.0263 Epoch: 19 Global Step: 110730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:44,455-Speed 11025.32 samples/sec Loss 6.3899 LearningRate 0.0263 Epoch: 19 Global Step: 110740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:45,332-Speed 11680.71 samples/sec Loss 6.3267 LearningRate 0.0263 Epoch: 19 Global Step: 110750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:46,229-Speed 11426.14 samples/sec Loss 6.3771 LearningRate 0.0263 Epoch: 19 Global Step: 110760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:47,125-Speed 11435.64 samples/sec Loss 6.3081 LearningRate 0.0263 Epoch: 19 Global Step: 110770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:10:48,043-Speed 11165.46 samples/sec Loss 6.3917 LearningRate 0.0263 Epoch: 19 Global Step: 110780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:10:48,976-Speed 10986.46 samples/sec Loss 6.3460 LearningRate 0.0263 Epoch: 19 Global Step: 110790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:49,906-Speed 11012.91 samples/sec Loss 6.3056 
LearningRate 0.0263 Epoch: 19 Global Step: 110800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:50,818-Speed 11241.36 samples/sec Loss 6.3810 LearningRate 0.0263 Epoch: 19 Global Step: 110810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:51,762-Speed 10850.31 samples/sec Loss 6.3050 LearningRate 0.0263 Epoch: 19 Global Step: 110820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:52,719-Speed 10707.40 samples/sec Loss 6.3156 LearningRate 0.0263 Epoch: 19 Global Step: 110830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:53,668-Speed 10795.73 samples/sec Loss 6.4194 LearningRate 0.0263 Epoch: 19 Global Step: 110840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:54,605-Speed 10937.98 samples/sec Loss 6.3728 LearningRate 0.0263 Epoch: 19 Global Step: 110850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:55,552-Speed 10823.85 samples/sec Loss 6.4511 LearningRate 0.0263 Epoch: 19 Global Step: 110860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:56,444-Speed 11481.65 samples/sec Loss 6.3126 LearningRate 0.0263 Epoch: 19 Global Step: 110870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:57,354-Speed 11259.13 samples/sec Loss 6.2994 LearningRate 0.0263 Epoch: 19 Global Step: 110880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:10:58,255-Speed 11374.86 samples/sec Loss 6.4280 LearningRate 0.0263 Epoch: 19 Global Step: 110890 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:10:59,163-Speed 11280.20 samples/sec Loss 6.5314 LearningRate 0.0263 Epoch: 19 Global Step: 110900 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:11:00,106-Speed 10867.66 samples/sec Loss 6.3113 LearningRate 0.0263 Epoch: 19 Global Step: 110910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:11:01,006-Speed 11391.89 samples/sec Loss 6.4903 LearningRate 0.0262 Epoch: 19 
Global Step: 110920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:11:01,929-Speed 11098.75 samples/sec Loss 6.4045 LearningRate 0.0262 Epoch: 19 Global Step: 110930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:11:02,856-Speed 11053.65 samples/sec Loss 6.2847 LearningRate 0.0262 Epoch: 19 Global Step: 110940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:11:03,803-Speed 10818.37 samples/sec Loss 6.2706 LearningRate 0.0262 Epoch: 19 Global Step: 110950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:11:04,713-Speed 11259.36 samples/sec Loss 6.1885 LearningRate 0.0262 Epoch: 19 Global Step: 110960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:11:05,654-Speed 10895.79 samples/sec Loss 6.3763 LearningRate 0.0262 Epoch: 19 Global Step: 110970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:11:06,548-Speed 11456.81 samples/sec Loss 6.3197 LearningRate 0.0262 Epoch: 19 Global Step: 110980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:11:07,435-Speed 11557.59 samples/sec Loss 6.3797 LearningRate 0.0262 Epoch: 19 Global Step: 110990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:11:08,334-Speed 11401.95 samples/sec Loss 6.4475 LearningRate 0.0262 Epoch: 19 Global Step: 111000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:11:09,227-Speed 11475.70 samples/sec Loss 6.3596 LearningRate 0.0262 Epoch: 19 Global Step: 111010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:11:10,118-Speed 11501.17 samples/sec Loss 6.2236 LearningRate 0.0262 Epoch: 19 Global Step: 111020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:11:11,000-Speed 11614.13 samples/sec Loss 6.2542 LearningRate 0.0262 Epoch: 19 Global Step: 111030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:11:11,903-Speed 11351.14 samples/sec Loss 6.3430 LearningRate 0.0262 Epoch: 19 Global Step: 111040 Fp16 
Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:11:12,857-Speed 10741.45 samples/sec Loss 6.3387 LearningRate 0.0262 Epoch: 19 Global Step: 111050 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:11:13,760-Speed 11342.81 samples/sec Loss 6.4402 LearningRate 0.0262 Epoch: 19 Global Step: 111060 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:11:14,678-Speed 11161.96 samples/sec Loss 6.3888 LearningRate 0.0262 Epoch: 19 Global Step: 111070 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:11:15,570-Speed 11487.23 samples/sec Loss 6.5009 LearningRate 0.0262 Epoch: 19 Global Step: 111080 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:11:16,491-Speed 11124.99 samples/sec Loss 6.3316 LearningRate 0.0262 Epoch: 19 Global Step: 111090 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:11:17,442-Speed 10774.97 samples/sec Loss 6.3122 LearningRate 0.0262 Epoch: 19 Global Step: 111100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:11:18,394-Speed 10766.43 samples/sec Loss 6.3398 LearningRate 0.0262 Epoch: 19 Global Step: 111110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:11:19,319-Speed 11086.66 samples/sec Loss 6.3053 LearningRate 0.0262 Epoch: 19 Global Step: 111120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:11:20,240-Speed 11122.91 samples/sec Loss 6.2561 LearningRate 0.0262 Epoch: 19 Global Step: 111130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:11:21,135-Speed 11456.11 samples/sec Loss 6.2526 LearningRate 0.0261 Epoch: 19 Global Step: 111140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:11:22,017-Speed 11615.42 samples/sec Loss 6.3170 LearningRate 0.0261 Epoch: 19 Global Step: 111150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:11:22,959-Speed 10868.80 samples/sec Loss 6.3543 LearningRate 0.0261 Epoch: 19 Global Step: 111160 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-04-11 03:11:23,897-Speed 10927.79 samples/sec Loss 6.3490 LearningRate 0.0261 Epoch: 19 Global Step: 111170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:11:24,881-Speed 10420.88 samples/sec Loss 6.3214 LearningRate 0.0261 Epoch: 19 Global Step: 111180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:11:25,803-Speed 11105.24 samples/sec Loss 6.3696 LearningRate 0.0261 Epoch: 19 Global Step: 111190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:11:26,724-Speed 11128.41 samples/sec Loss 6.2769 LearningRate 0.0261 Epoch: 19 Global Step: 111200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:11:27,678-Speed 10741.73 samples/sec Loss 6.3935 LearningRate 0.0261 Epoch: 19 Global Step: 111210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:11:28,679-Speed 10233.56 samples/sec Loss 6.3047 LearningRate 0.0261 Epoch: 19 Global Step: 111220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:11:29,645-Speed 10611.03 samples/sec Loss 6.3419 LearningRate 0.0261 Epoch: 19 Global Step: 111230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:11:30,591-Speed 10837.93 samples/sec Loss 6.4921 LearningRate 0.0261 Epoch: 19 Global Step: 111240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:11:31,516-Speed 11073.75 samples/sec Loss 6.2070 LearningRate 0.0261 Epoch: 19 Global Step: 111250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:11:32,414-Speed 11413.72 samples/sec Loss 6.2801 LearningRate 0.0261 Epoch: 19 Global Step: 111260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:11:33,291-Speed 11683.75 samples/sec Loss 6.2892 LearningRate 0.0261 Epoch: 19 Global Step: 111270 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:11:34,199-Speed 11285.55 samples/sec Loss 6.4224 LearningRate 0.0261 Epoch: 19 Global Step: 111280 Fp16 Grad Scale: 131072 Required: 4 hours 
Training: 2022-04-11 03:11:35,078-Speed 11658.47 samples/sec Loss 6.3448 LearningRate 0.0261 Epoch: 19 Global Step: 111290 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:11:36,021-Speed 10861.69 samples/sec Loss 6.2376 LearningRate 0.0261 Epoch: 19 Global Step: 111300 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:11:36,958-Speed 10935.36 samples/sec Loss 6.3571 LearningRate 0.0261 Epoch: 19 Global Step: 111310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:11:37,886-Speed 11046.98 samples/sec Loss 6.4489 LearningRate 0.0261 Epoch: 19 Global Step: 111320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:11:38,780-Speed 11461.77 samples/sec Loss 6.1600 LearningRate 0.0261 Epoch: 19 Global Step: 111330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:11:39,724-Speed 10850.29 samples/sec Loss 6.5315 LearningRate 0.0261 Epoch: 19 Global Step: 111340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:11:40,656-Speed 10989.90 samples/sec Loss 6.3186 LearningRate 0.0261 Epoch: 19 Global Step: 111350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:11:41,638-Speed 10438.68 samples/sec Loss 6.2832 LearningRate 0.0260 Epoch: 19 Global Step: 111360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:11:42,544-Speed 11306.03 samples/sec Loss 6.4081 LearningRate 0.0260 Epoch: 19 Global Step: 111370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:11:43,463-Speed 11158.18 samples/sec Loss 6.4215 LearningRate 0.0260 Epoch: 19 Global Step: 111380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:11:44,385-Speed 11115.44 samples/sec Loss 6.3221 LearningRate 0.0260 Epoch: 19 Global Step: 111390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:11:45,261-Speed 11702.43 samples/sec Loss 6.3796 LearningRate 0.0260 Epoch: 19 Global Step: 111400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
03:11:46,154-Speed 11496.19 samples/sec Loss 6.2309 LearningRate 0.0260 Epoch: 19 Global Step: 111410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:11:47,046-Speed 11492.36 samples/sec Loss 6.2895 LearningRate 0.0260 Epoch: 19 Global Step: 111420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:11:47,923-Speed 11686.28 samples/sec Loss 6.3801 LearningRate 0.0260 Epoch: 19 Global Step: 111430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:11:48,834-Speed 11244.24 samples/sec Loss 6.3779 LearningRate 0.0260 Epoch: 19 Global Step: 111440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:11:49,758-Speed 11090.64 samples/sec Loss 6.4105 LearningRate 0.0260 Epoch: 19 Global Step: 111450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:11:50,654-Speed 11433.77 samples/sec Loss 6.2528 LearningRate 0.0260 Epoch: 19 Global Step: 111460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:11:51,627-Speed 10535.73 samples/sec Loss 6.4332 LearningRate 0.0260 Epoch: 19 Global Step: 111470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:11:52,541-Speed 11202.77 samples/sec Loss 6.3653 LearningRate 0.0260 Epoch: 19 Global Step: 111480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:11:53,471-Speed 11022.66 samples/sec Loss 6.2277 LearningRate 0.0260 Epoch: 19 Global Step: 111490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:11:54,375-Speed 11334.23 samples/sec Loss 6.2085 LearningRate 0.0260 Epoch: 19 Global Step: 111500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:11:55,278-Speed 11356.25 samples/sec Loss 6.4330 LearningRate 0.0260 Epoch: 19 Global Step: 111510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:11:56,185-Speed 11306.80 samples/sec Loss 6.3359 LearningRate 0.0260 Epoch: 19 Global Step: 111520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:11:57,065-Speed 11638.46 
samples/sec Loss 6.3103 LearningRate 0.0260 Epoch: 19 Global Step: 111530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:11:57,966-Speed 11369.13 samples/sec Loss 6.2550 LearningRate 0.0260 Epoch: 19 Global Step: 111540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:11:58,869-Speed 11348.12 samples/sec Loss 6.3077 LearningRate 0.0260 Epoch: 19 Global Step: 111550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:11:59,789-Speed 11144.98 samples/sec Loss 6.4724 LearningRate 0.0260 Epoch: 19 Global Step: 111560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:12:00,665-Speed 11703.69 samples/sec Loss 6.3135 LearningRate 0.0260 Epoch: 19 Global Step: 111570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:12:01,557-Speed 11481.62 samples/sec Loss 6.1851 LearningRate 0.0259 Epoch: 19 Global Step: 111580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:12:02,463-Speed 11315.04 samples/sec Loss 6.4365 LearningRate 0.0259 Epoch: 19 Global Step: 111590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:12:03,390-Speed 11046.78 samples/sec Loss 6.4111 LearningRate 0.0259 Epoch: 19 Global Step: 111600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:12:04,281-Speed 11508.39 samples/sec Loss 6.2320 LearningRate 0.0259 Epoch: 19 Global Step: 111610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:12:05,216-Speed 10957.75 samples/sec Loss 6.2177 LearningRate 0.0259 Epoch: 19 Global Step: 111620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:12:06,149-Speed 10980.65 samples/sec Loss 6.3556 LearningRate 0.0259 Epoch: 19 Global Step: 111630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:12:07,064-Speed 11192.15 samples/sec Loss 6.1468 LearningRate 0.0259 Epoch: 19 Global Step: 111640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:12:07,981-Speed 11181.86 samples/sec Loss 6.4943 
LearningRate 0.0259 Epoch: 19 Global Step: 111650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:12:08,887-Speed 11315.67 samples/sec Loss 6.3614 LearningRate 0.0259 Epoch: 19 Global Step: 111660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:12:09,770-Speed 11604.15 samples/sec Loss 6.2747 LearningRate 0.0259 Epoch: 19 Global Step: 111670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:12:10,657-Speed 11567.28 samples/sec Loss 6.2760 LearningRate 0.0259 Epoch: 19 Global Step: 111680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:12:11,579-Speed 11107.99 samples/sec Loss 6.3619 LearningRate 0.0259 Epoch: 19 Global Step: 111690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:12:12,466-Speed 11549.64 samples/sec Loss 6.4831 LearningRate 0.0259 Epoch: 19 Global Step: 111700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:12:13,406-Speed 10901.59 samples/sec Loss 6.2307 LearningRate 0.0259 Epoch: 19 Global Step: 111710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:12:14,318-Speed 11239.69 samples/sec Loss 6.3794 LearningRate 0.0259 Epoch: 19 Global Step: 111720 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:12:15,248-Speed 11016.08 samples/sec Loss 6.2525 LearningRate 0.0259 Epoch: 19 Global Step: 111730 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:12:16,159-Speed 11243.50 samples/sec Loss 6.4603 LearningRate 0.0259 Epoch: 19 Global Step: 111740 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:12:17,050-Speed 11505.02 samples/sec Loss 6.3405 LearningRate 0.0259 Epoch: 19 Global Step: 111750 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:12:17,971-Speed 11129.70 samples/sec Loss 6.4404 LearningRate 0.0259 Epoch: 19 Global Step: 111760 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:12:18,917-Speed 10830.07 samples/sec Loss 6.3921 LearningRate 0.0259 Epoch: 19 
Global Step: 111770 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:12:19,854-Speed 10939.57 samples/sec Loss 6.2219 LearningRate 0.0259 Epoch: 19 Global Step: 111780 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:12:20,759-Speed 11321.38 samples/sec Loss 6.3382 LearningRate 0.0259 Epoch: 19 Global Step: 111790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:12:21,644-Speed 11581.59 samples/sec Loss 6.4800 LearningRate 0.0259 Epoch: 19 Global Step: 111800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:12:22,537-Speed 11478.38 samples/sec Loss 6.2409 LearningRate 0.0258 Epoch: 19 Global Step: 111810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:12:23,454-Speed 11178.59 samples/sec Loss 6.2874 LearningRate 0.0258 Epoch: 19 Global Step: 111820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:12:24,355-Speed 11376.08 samples/sec Loss 6.2400 LearningRate 0.0258 Epoch: 19 Global Step: 111830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:12:25,239-Speed 11584.43 samples/sec Loss 6.5619 LearningRate 0.0258 Epoch: 19 Global Step: 111840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:12:26,190-Speed 10780.31 samples/sec Loss 6.3054 LearningRate 0.0258 Epoch: 19 Global Step: 111850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:12:27,106-Speed 11179.67 samples/sec Loss 6.3082 LearningRate 0.0258 Epoch: 19 Global Step: 111860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:12:28,069-Speed 10645.07 samples/sec Loss 6.3036 LearningRate 0.0258 Epoch: 19 Global Step: 111870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:12:29,009-Speed 10896.66 samples/sec Loss 6.2962 LearningRate 0.0258 Epoch: 19 Global Step: 111880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:12:29,900-Speed 11508.22 samples/sec Loss 6.3158 LearningRate 0.0258 Epoch: 19 Global Step: 111890 Fp16 Grad 
Scale: 65536 Required: 4 hours Training: 2022-04-11 03:12:30,813-Speed 11217.21 samples/sec Loss 6.3666 LearningRate 0.0258 Epoch: 19 Global Step: 111900 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:12:31,709-Speed 11442.21 samples/sec Loss 6.3731 LearningRate 0.0258 Epoch: 19 Global Step: 111910 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:12:32,598-Speed 11529.29 samples/sec Loss 6.2644 LearningRate 0.0258 Epoch: 19 Global Step: 111920 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:12:33,508-Speed 11254.39 samples/sec Loss 6.3274 LearningRate 0.0258 Epoch: 19 Global Step: 111930 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:12:34,388-Speed 11648.17 samples/sec Loss 6.3026 LearningRate 0.0258 Epoch: 19 Global Step: 111940 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:12:35,295-Speed 11296.53 samples/sec Loss 6.4305 LearningRate 0.0258 Epoch: 19 Global Step: 111950 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:12:36,240-Speed 10836.70 samples/sec Loss 6.4649 LearningRate 0.0258 Epoch: 19 Global Step: 111960 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:12:37,175-Speed 10960.84 samples/sec Loss 6.3796 LearningRate 0.0258 Epoch: 19 Global Step: 111970 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:12:38,104-Speed 11039.38 samples/sec Loss 6.3943 LearningRate 0.0258 Epoch: 19 Global Step: 111980 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:12:39,072-Speed 10587.71 samples/sec Loss 6.3765 LearningRate 0.0258 Epoch: 19 Global Step: 111990 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:12:40,019-Speed 10818.92 samples/sec Loss 6.4005 LearningRate 0.0258 Epoch: 19 Global Step: 112000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:13:01,362-[lfw][112000]XNorm: 10.604202 Training: 2022-04-11 03:13:01,362-[lfw][112000]Accuracy-Flip: 0.99617+-0.00269 Training: 2022-04-11 
03:13:01,363-[lfw][112000]Accuracy-Highest: 0.99617 Training: 2022-04-11 03:13:25,984-[cfp_fp][112000]XNorm: 8.894303 Training: 2022-04-11 03:13:25,985-[cfp_fp][112000]Accuracy-Flip: 0.94271+-0.00951 Training: 2022-04-11 03:13:25,985-[cfp_fp][112000]Accuracy-Highest: 0.94757 Training: 2022-04-11 03:13:47,235-[agedb_30][112000]XNorm: 10.424025 Training: 2022-04-11 03:13:47,235-[agedb_30][112000]Accuracy-Flip: 0.96300+-0.00894 Training: 2022-04-11 03:13:47,236-[agedb_30][112000]Accuracy-Highest: 0.96483 Training: 2022-04-11 03:13:48,147-Speed 150.31 samples/sec Loss 6.2365 LearningRate 0.0258 Epoch: 19 Global Step: 112010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:13:49,082-Speed 10954.57 samples/sec Loss 6.3109 LearningRate 0.0258 Epoch: 19 Global Step: 112020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:13:50,032-Speed 10784.37 samples/sec Loss 6.3515 LearningRate 0.0257 Epoch: 19 Global Step: 112030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:13:50,941-Speed 11272.24 samples/sec Loss 6.3102 LearningRate 0.0257 Epoch: 19 Global Step: 112040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:13:51,867-Speed 11068.70 samples/sec Loss 6.4214 LearningRate 0.0257 Epoch: 19 Global Step: 112050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:13:52,835-Speed 10587.36 samples/sec Loss 6.3913 LearningRate 0.0257 Epoch: 19 Global Step: 112060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:13:53,787-Speed 10760.32 samples/sec Loss 6.2797 LearningRate 0.0257 Epoch: 19 Global Step: 112070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:13:54,707-Speed 11136.31 samples/sec Loss 6.3955 LearningRate 0.0257 Epoch: 19 Global Step: 112080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:13:55,641-Speed 10966.36 samples/sec Loss 6.4418 LearningRate 0.0257 Epoch: 19 Global Step: 112090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-11 03:13:56,533-Speed 11495.78 samples/sec Loss 6.3934 LearningRate 0.0257 Epoch: 19 Global Step: 112100 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:13:57,419-Speed 11566.72 samples/sec Loss 6.2529 LearningRate 0.0257 Epoch: 19 Global Step: 112110 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:13:58,300-Speed 11629.46 samples/sec Loss 6.2534 LearningRate 0.0257 Epoch: 19 Global Step: 112120 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:13:59,184-Speed 11592.44 samples/sec Loss 6.3900 LearningRate 0.0257 Epoch: 19 Global Step: 112130 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:14:00,101-Speed 11170.60 samples/sec Loss 6.4855 LearningRate 0.0257 Epoch: 19 Global Step: 112140 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:14:01,029-Speed 11043.22 samples/sec Loss 6.3844 LearningRate 0.0257 Epoch: 19 Global Step: 112150 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:14:01,961-Speed 10996.10 samples/sec Loss 6.3900 LearningRate 0.0257 Epoch: 19 Global Step: 112160 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:14:02,911-Speed 10781.56 samples/sec Loss 6.4531 LearningRate 0.0257 Epoch: 19 Global Step: 112170 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:14:03,826-Speed 11202.88 samples/sec Loss 6.4233 LearningRate 0.0257 Epoch: 19 Global Step: 112180 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:14:04,712-Speed 11559.59 samples/sec Loss 6.3107 LearningRate 0.0257 Epoch: 19 Global Step: 112190 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:14:05,631-Speed 11152.03 samples/sec Loss 6.2842 LearningRate 0.0257 Epoch: 19 Global Step: 112200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:14:06,551-Speed 11201.87 samples/sec Loss 6.4919 LearningRate 0.0257 Epoch: 19 Global Step: 112210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:14:07,432-Speed 
11633.80 samples/sec Loss 6.2246 LearningRate 0.0257 Epoch: 19 Global Step: 112220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:14:08,342-Speed 11257.37 samples/sec Loss 6.3529 LearningRate 0.0257 Epoch: 19 Global Step: 112230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:14:09,249-Speed 11296.18 samples/sec Loss 6.2641 LearningRate 0.0257 Epoch: 19 Global Step: 112240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:14:10,134-Speed 11578.14 samples/sec Loss 6.3907 LearningRate 0.0257 Epoch: 19 Global Step: 112250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:14:11,046-Speed 11239.88 samples/sec Loss 6.2064 LearningRate 0.0256 Epoch: 19 Global Step: 112260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:14:11,966-Speed 11137.49 samples/sec Loss 6.2535 LearningRate 0.0256 Epoch: 19 Global Step: 112270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:14:12,873-Speed 11301.58 samples/sec Loss 6.3998 LearningRate 0.0256 Epoch: 19 Global Step: 112280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:14:13,760-Speed 11548.75 samples/sec Loss 6.2529 LearningRate 0.0256 Epoch: 19 Global Step: 112290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:14:14,716-Speed 10714.93 samples/sec Loss 6.2839 LearningRate 0.0256 Epoch: 19 Global Step: 112300 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:14:15,629-Speed 11230.85 samples/sec Loss 6.2770 LearningRate 0.0256 Epoch: 19 Global Step: 112310 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:14:16,565-Speed 10942.82 samples/sec Loss 6.5000 LearningRate 0.0256 Epoch: 19 Global Step: 112320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:14:17,492-Speed 11055.06 samples/sec Loss 6.3574 LearningRate 0.0256 Epoch: 19 Global Step: 112330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:14:18,462-Speed 10558.21 samples/sec Loss 
6.3988 LearningRate 0.0256 Epoch: 19 Global Step: 112340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:14:19,416-Speed 10747.95 samples/sec Loss 6.3559 LearningRate 0.0256 Epoch: 19 Global Step: 112350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:14:20,333-Speed 11173.75 samples/sec Loss 6.4232 LearningRate 0.0256 Epoch: 19 Global Step: 112360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:14:21,229-Speed 11440.51 samples/sec Loss 6.4315 LearningRate 0.0256 Epoch: 19 Global Step: 112370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:14:22,123-Speed 11454.60 samples/sec Loss 6.4636 LearningRate 0.0256 Epoch: 19 Global Step: 112380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:14:23,002-Speed 11668.37 samples/sec Loss 6.3220 LearningRate 0.0256 Epoch: 19 Global Step: 112390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:14:23,931-Speed 11024.17 samples/sec Loss 6.3152 LearningRate 0.0256 Epoch: 19 Global Step: 112400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:14:24,832-Speed 11381.61 samples/sec Loss 6.2977 LearningRate 0.0256 Epoch: 19 Global Step: 112410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:14:25,752-Speed 11141.10 samples/sec Loss 6.3854 LearningRate 0.0256 Epoch: 19 Global Step: 112420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:14:26,655-Speed 11352.19 samples/sec Loss 6.2718 LearningRate 0.0256 Epoch: 19 Global Step: 112430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:14:27,537-Speed 11620.23 samples/sec Loss 6.3021 LearningRate 0.0256 Epoch: 19 Global Step: 112440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:14:28,463-Speed 11065.22 samples/sec Loss 6.4463 LearningRate 0.0256 Epoch: 19 Global Step: 112450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:14:29,376-Speed 11220.29 samples/sec Loss 6.2464 LearningRate 0.0256 
Epoch: 19 Global Step: 112460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:14:30,261-Speed 11588.24 samples/sec Loss 6.4130 LearningRate 0.0256 Epoch: 19 Global Step: 112470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:14:31,226-Speed 10611.92 samples/sec Loss 6.3432 LearningRate 0.0255 Epoch: 19 Global Step: 112480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:14:32,111-Speed 11580.32 samples/sec Loss 6.4481 LearningRate 0.0255 Epoch: 19 Global Step: 112490 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:14:33,020-Speed 11269.03 samples/sec Loss 6.3921 LearningRate 0.0255 Epoch: 19 Global Step: 112500 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:14:33,915-Speed 11456.42 samples/sec Loss 6.3975 LearningRate 0.0255 Epoch: 19 Global Step: 112510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:14:34,799-Speed 11590.77 samples/sec Loss 6.4295 LearningRate 0.0255 Epoch: 19 Global Step: 112520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:14:35,681-Speed 11620.89 samples/sec Loss 6.3501 LearningRate 0.0255 Epoch: 19 Global Step: 112530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:14:36,616-Speed 10961.84 samples/sec Loss 6.4770 LearningRate 0.0255 Epoch: 19 Global Step: 112540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:14:37,529-Speed 11214.20 samples/sec Loss 6.3168 LearningRate 0.0255 Epoch: 19 Global Step: 112550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:14:38,451-Speed 11112.94 samples/sec Loss 6.4217 LearningRate 0.0255 Epoch: 19 Global Step: 112560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:14:39,376-Speed 11080.88 samples/sec Loss 6.3197 LearningRate 0.0255 Epoch: 19 Global Step: 112570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:14:40,284-Speed 11288.68 samples/sec Loss 6.2406 LearningRate 0.0255 Epoch: 19 Global Step: 
112580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:14:41,194-Speed 11258.68 samples/sec Loss 6.4273 LearningRate 0.0255 Epoch: 19 Global Step: 112590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:14:42,106-Speed 11229.45 samples/sec Loss 6.3362 LearningRate 0.0255 Epoch: 19 Global Step: 112600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:14:43,018-Speed 11236.86 samples/sec Loss 6.3465 LearningRate 0.0255 Epoch: 19 Global Step: 112610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:14:43,984-Speed 10610.45 samples/sec Loss 6.2467 LearningRate 0.0255 Epoch: 19 Global Step: 112620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:14:44,889-Speed 11323.96 samples/sec Loss 6.3272 LearningRate 0.0255 Epoch: 19 Global Step: 112630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:14:45,770-Speed 11627.64 samples/sec Loss 6.4023 LearningRate 0.0255 Epoch: 19 Global Step: 112640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:14:46,676-Speed 11318.78 samples/sec Loss 6.4268 LearningRate 0.0255 Epoch: 19 Global Step: 112650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:14:47,582-Speed 11299.70 samples/sec Loss 6.4521 LearningRate 0.0255 Epoch: 19 Global Step: 112660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:14:48,511-Speed 11039.26 samples/sec Loss 6.1747 LearningRate 0.0255 Epoch: 19 Global Step: 112670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:14:49,420-Speed 11274.58 samples/sec Loss 6.4088 LearningRate 0.0255 Epoch: 19 Global Step: 112680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:14:50,348-Speed 11044.62 samples/sec Loss 6.3216 LearningRate 0.0255 Epoch: 19 Global Step: 112690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:14:51,243-Speed 11442.39 samples/sec Loss 6.4639 LearningRate 0.0255 Epoch: 19 Global Step: 112700 Fp16 Grad Scale: 
65536 Required: 4 hours Training: 2022-04-11 03:14:52,144-Speed 11368.99 samples/sec Loss 6.2202 LearningRate 0.0254 Epoch: 19 Global Step: 112710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:14:53,047-Speed 11355.49 samples/sec Loss 6.3945 LearningRate 0.0254 Epoch: 19 Global Step: 112720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:14:53,953-Speed 11305.71 samples/sec Loss 6.5083 LearningRate 0.0254 Epoch: 19 Global Step: 112730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:14:54,901-Speed 10806.09 samples/sec Loss 6.3486 LearningRate 0.0254 Epoch: 19 Global Step: 112740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:14:55,816-Speed 11207.28 samples/sec Loss 6.2043 LearningRate 0.0254 Epoch: 19 Global Step: 112750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:14:56,704-Speed 11538.55 samples/sec Loss 6.3106 LearningRate 0.0254 Epoch: 19 Global Step: 112760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:14:57,576-Speed 11760.65 samples/sec Loss 6.4005 LearningRate 0.0254 Epoch: 19 Global Step: 112770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:14:58,494-Speed 11156.96 samples/sec Loss 6.3283 LearningRate 0.0254 Epoch: 19 Global Step: 112780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:14:59,419-Speed 11080.19 samples/sec Loss 6.3099 LearningRate 0.0254 Epoch: 19 Global Step: 112790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:00,319-Speed 11387.75 samples/sec Loss 6.2912 LearningRate 0.0254 Epoch: 19 Global Step: 112800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:01,231-Speed 11238.94 samples/sec Loss 6.2972 LearningRate 0.0254 Epoch: 19 Global Step: 112810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:02,160-Speed 11025.57 samples/sec Loss 6.4217 LearningRate 0.0254 Epoch: 19 Global Step: 112820 Fp16 Grad Scale: 65536 Required: 4 hours 
Training: 2022-04-11 03:15:03,092-Speed 10988.58 samples/sec Loss 6.3482 LearningRate 0.0254 Epoch: 19 Global Step: 112830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:04,037-Speed 10852.93 samples/sec Loss 6.4147 LearningRate 0.0254 Epoch: 19 Global Step: 112840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:04,944-Speed 11294.91 samples/sec Loss 6.3752 LearningRate 0.0254 Epoch: 19 Global Step: 112850 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:15:05,881-Speed 10933.09 samples/sec Loss 6.4785 LearningRate 0.0254 Epoch: 19 Global Step: 112860 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:15:06,821-Speed 10906.21 samples/sec Loss 6.4232 LearningRate 0.0254 Epoch: 19 Global Step: 112870 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:15:07,716-Speed 11440.84 samples/sec Loss 6.4032 LearningRate 0.0254 Epoch: 19 Global Step: 112880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:08,635-Speed 11149.35 samples/sec Loss 6.3662 LearningRate 0.0254 Epoch: 19 Global Step: 112890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:09,537-Speed 11360.23 samples/sec Loss 6.2734 LearningRate 0.0254 Epoch: 19 Global Step: 112900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:10,442-Speed 11332.12 samples/sec Loss 6.3800 LearningRate 0.0254 Epoch: 19 Global Step: 112910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:11,314-Speed 11752.36 samples/sec Loss 6.2170 LearningRate 0.0254 Epoch: 19 Global Step: 112920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:12,240-Speed 11068.61 samples/sec Loss 6.3558 LearningRate 0.0253 Epoch: 19 Global Step: 112930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:13,126-Speed 11558.59 samples/sec Loss 6.5850 LearningRate 0.0253 Epoch: 19 Global Step: 112940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
03:15:14,036-Speed 11264.64 samples/sec Loss 6.3797 LearningRate 0.0253 Epoch: 19 Global Step: 112950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:14,986-Speed 10789.76 samples/sec Loss 6.2698 LearningRate 0.0253 Epoch: 19 Global Step: 112960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:15,879-Speed 11469.62 samples/sec Loss 6.4372 LearningRate 0.0253 Epoch: 19 Global Step: 112970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:16,769-Speed 11508.31 samples/sec Loss 6.3676 LearningRate 0.0253 Epoch: 19 Global Step: 112980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:15:17,677-Speed 11283.70 samples/sec Loss 6.3112 LearningRate 0.0253 Epoch: 19 Global Step: 112990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:15:18,553-Speed 11704.68 samples/sec Loss 6.3852 LearningRate 0.0253 Epoch: 19 Global Step: 113000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:19,470-Speed 11181.35 samples/sec Loss 6.3664 LearningRate 0.0253 Epoch: 19 Global Step: 113010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:20,388-Speed 11155.38 samples/sec Loss 6.3266 LearningRate 0.0253 Epoch: 19 Global Step: 113020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:21,288-Speed 11387.48 samples/sec Loss 6.4339 LearningRate 0.0253 Epoch: 19 Global Step: 113030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:22,178-Speed 11519.30 samples/sec Loss 6.3271 LearningRate 0.0253 Epoch: 19 Global Step: 113040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:23,103-Speed 11071.38 samples/sec Loss 6.3730 LearningRate 0.0253 Epoch: 19 Global Step: 113050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:24,007-Speed 11333.70 samples/sec Loss 6.3537 LearningRate 0.0253 Epoch: 19 Global Step: 113060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:24,907-Speed 11387.90 
samples/sec Loss 6.3475 LearningRate 0.0253 Epoch: 19 Global Step: 113070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:25,824-Speed 11171.92 samples/sec Loss 6.3888 LearningRate 0.0253 Epoch: 19 Global Step: 113080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:26,794-Speed 10561.13 samples/sec Loss 6.3746 LearningRate 0.0253 Epoch: 19 Global Step: 113090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:27,737-Speed 10869.37 samples/sec Loss 6.1939 LearningRate 0.0253 Epoch: 19 Global Step: 113100 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:15:28,711-Speed 10526.47 samples/sec Loss 6.3241 LearningRate 0.0253 Epoch: 19 Global Step: 113110 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:15:29,689-Speed 10467.34 samples/sec Loss 6.2928 LearningRate 0.0253 Epoch: 19 Global Step: 113120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:30,660-Speed 10562.18 samples/sec Loss 6.4394 LearningRate 0.0253 Epoch: 19 Global Step: 113130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:31,625-Speed 10618.85 samples/sec Loss 6.3725 LearningRate 0.0253 Epoch: 19 Global Step: 113140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:32,566-Speed 10888.81 samples/sec Loss 6.3559 LearningRate 0.0253 Epoch: 19 Global Step: 113150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:33,468-Speed 11362.85 samples/sec Loss 6.2916 LearningRate 0.0252 Epoch: 19 Global Step: 113160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:34,349-Speed 11631.67 samples/sec Loss 6.3355 LearningRate 0.0252 Epoch: 19 Global Step: 113170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:35,247-Speed 11408.36 samples/sec Loss 6.2919 LearningRate 0.0252 Epoch: 19 Global Step: 113180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:36,136-Speed 11522.46 samples/sec Loss 6.2490 
LearningRate 0.0252 Epoch: 19 Global Step: 113190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:37,052-Speed 11197.22 samples/sec Loss 6.2432 LearningRate 0.0252 Epoch: 19 Global Step: 113200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:37,961-Speed 11272.61 samples/sec Loss 6.4140 LearningRate 0.0252 Epoch: 19 Global Step: 113210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:38,886-Speed 11073.74 samples/sec Loss 6.2195 LearningRate 0.0252 Epoch: 19 Global Step: 113220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:39,783-Speed 11430.31 samples/sec Loss 6.3915 LearningRate 0.0252 Epoch: 19 Global Step: 113230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:40,698-Speed 11196.31 samples/sec Loss 6.3156 LearningRate 0.0252 Epoch: 19 Global Step: 113240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:41,634-Speed 10949.61 samples/sec Loss 6.5254 LearningRate 0.0252 Epoch: 19 Global Step: 113250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:42,539-Speed 11318.50 samples/sec Loss 6.4150 LearningRate 0.0252 Epoch: 19 Global Step: 113260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:43,460-Speed 11118.66 samples/sec Loss 6.3887 LearningRate 0.0252 Epoch: 19 Global Step: 113270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:44,376-Speed 11186.31 samples/sec Loss 6.3538 LearningRate 0.0252 Epoch: 19 Global Step: 113280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:45,276-Speed 11391.58 samples/sec Loss 6.4045 LearningRate 0.0252 Epoch: 19 Global Step: 113290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:46,151-Speed 11712.75 samples/sec Loss 6.1895 LearningRate 0.0252 Epoch: 19 Global Step: 113300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:47,073-Speed 11113.09 samples/sec Loss 6.3775 LearningRate 0.0252 Epoch: 19 
Global Step: 113310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:47,974-Speed 11372.68 samples/sec Loss 6.2512 LearningRate 0.0252 Epoch: 19 Global Step: 113320 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:15:48,906-Speed 10993.88 samples/sec Loss 6.4671 LearningRate 0.0252 Epoch: 19 Global Step: 113330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:15:49,843-Speed 10930.70 samples/sec Loss 6.3042 LearningRate 0.0252 Epoch: 19 Global Step: 113340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:15:50,813-Speed 10571.12 samples/sec Loss 6.4696 LearningRate 0.0252 Epoch: 19 Global Step: 113350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:51,714-Speed 11364.98 samples/sec Loss 6.4256 LearningRate 0.0252 Epoch: 19 Global Step: 113360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:52,616-Speed 11369.76 samples/sec Loss 6.2796 LearningRate 0.0252 Epoch: 19 Global Step: 113370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:53,510-Speed 11453.79 samples/sec Loss 6.2569 LearningRate 0.0251 Epoch: 19 Global Step: 113380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:54,460-Speed 10793.56 samples/sec Loss 6.2689 LearningRate 0.0251 Epoch: 19 Global Step: 113390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:55,421-Speed 10657.55 samples/sec Loss 6.4248 LearningRate 0.0251 Epoch: 19 Global Step: 113400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:56,307-Speed 11578.32 samples/sec Loss 6.2895 LearningRate 0.0251 Epoch: 19 Global Step: 113410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:57,180-Speed 11741.21 samples/sec Loss 6.3216 LearningRate 0.0251 Epoch: 19 Global Step: 113420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:58,068-Speed 11537.84 samples/sec Loss 6.4505 LearningRate 0.0251 Epoch: 19 Global Step: 113430 Fp16 
Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:58,951-Speed 11606.62 samples/sec Loss 6.2042 LearningRate 0.0251 Epoch: 19 Global Step: 113440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:15:59,848-Speed 11422.07 samples/sec Loss 6.3278 LearningRate 0.0251 Epoch: 19 Global Step: 113450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:16:00,731-Speed 11607.29 samples/sec Loss 6.3180 LearningRate 0.0251 Epoch: 19 Global Step: 113460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:16:01,645-Speed 11210.58 samples/sec Loss 6.4843 LearningRate 0.0251 Epoch: 19 Global Step: 113470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:16:02,581-Speed 10949.67 samples/sec Loss 6.4229 LearningRate 0.0251 Epoch: 19 Global Step: 113480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:16:03,510-Speed 11027.11 samples/sec Loss 6.2812 LearningRate 0.0251 Epoch: 19 Global Step: 113490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:16:04,418-Speed 11289.42 samples/sec Loss 6.4520 LearningRate 0.0251 Epoch: 19 Global Step: 113500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:16:05,330-Speed 11237.19 samples/sec Loss 6.3291 LearningRate 0.0251 Epoch: 19 Global Step: 113510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:16:06,230-Speed 11390.04 samples/sec Loss 6.3533 LearningRate 0.0251 Epoch: 19 Global Step: 113520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:16:07,181-Speed 10776.22 samples/sec Loss 6.4146 LearningRate 0.0251 Epoch: 19 Global Step: 113530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:16:08,105-Speed 11092.06 samples/sec Loss 6.4745 LearningRate 0.0251 Epoch: 19 Global Step: 113540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:16:09,008-Speed 11346.28 samples/sec Loss 6.3768 LearningRate 0.0251 Epoch: 19 Global Step: 113550 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-04-11 03:16:09,895-Speed 11556.36 samples/sec Loss 6.3518 LearningRate 0.0251 Epoch: 19 Global Step: 113560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:16:10,773-Speed 11664.93 samples/sec Loss 6.2219 LearningRate 0.0251 Epoch: 19 Global Step: 113570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:16:11,699-Speed 11071.23 samples/sec Loss 6.3979 LearningRate 0.0251 Epoch: 19 Global Step: 113580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:16:12,609-Speed 11253.29 samples/sec Loss 6.2973 LearningRate 0.0251 Epoch: 19 Global Step: 113590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:16:13,546-Speed 10937.79 samples/sec Loss 6.2347 LearningRate 0.0251 Epoch: 19 Global Step: 113600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:16:14,450-Speed 11334.12 samples/sec Loss 6.4172 LearningRate 0.0250 Epoch: 19 Global Step: 113610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:16:15,409-Speed 10684.32 samples/sec Loss 6.3505 LearningRate 0.0250 Epoch: 19 Global Step: 113620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:16:16,399-Speed 10345.81 samples/sec Loss 6.2853 LearningRate 0.0250 Epoch: 19 Global Step: 113630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:16:17,330-Speed 11010.97 samples/sec Loss 6.3490 LearningRate 0.0250 Epoch: 19 Global Step: 113640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:16:18,270-Speed 10905.69 samples/sec Loss 6.3881 LearningRate 0.0250 Epoch: 19 Global Step: 113650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:16:19,202-Speed 10992.96 samples/sec Loss 6.3550 LearningRate 0.0250 Epoch: 19 Global Step: 113660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:16:20,146-Speed 10853.32 samples/sec Loss 6.4112 LearningRate 0.0250 Epoch: 19 Global Step: 113670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
03:16:21,058-Speed 11232.61 samples/sec Loss 6.2369 LearningRate 0.0250 Epoch: 19 Global Step: 113680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:16:21,955-Speed 11418.44 samples/sec Loss 6.4029 LearningRate 0.0250 Epoch: 19 Global Step: 113690 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:16:22,828-Speed 11742.82 samples/sec Loss 6.4139 LearningRate 0.0250 Epoch: 19 Global Step: 113700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:16:23,822-Speed 10304.80 samples/sec Loss 6.4219 LearningRate 0.0250 Epoch: 19 Global Step: 113710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:16:24,697-Speed 11719.09 samples/sec Loss 6.2752 LearningRate 0.0250 Epoch: 19 Global Step: 113720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:16:35,403-Speed 956.53 samples/sec Loss 5.7192 LearningRate 0.0250 Epoch: 20 Global Step: 113730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:16:36,389-Speed 10402.94 samples/sec Loss 5.5696 LearningRate 0.0250 Epoch: 20 Global Step: 113740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:16:37,355-Speed 10609.51 samples/sec Loss 5.7308 LearningRate 0.0250 Epoch: 20 Global Step: 113750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:16:38,365-Speed 10145.84 samples/sec Loss 5.6288 LearningRate 0.0250 Epoch: 20 Global Step: 113760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:16:39,343-Speed 10468.16 samples/sec Loss 5.7345 LearningRate 0.0250 Epoch: 20 Global Step: 113770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:16:40,276-Speed 10982.10 samples/sec Loss 5.7606 LearningRate 0.0250 Epoch: 20 Global Step: 113780 Fp16 Grad Scale: 262144 Required: 4 hours Training: 2022-04-11 03:16:41,165-Speed 11530.18 samples/sec Loss 5.6378 LearningRate 0.0250 Epoch: 20 Global Step: 113790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:16:42,053-Speed 
11537.89 samples/sec Loss 5.5052 LearningRate 0.0250 Epoch: 20 Global Step: 113800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:16:42,976-Speed 11110.76 samples/sec Loss 5.6685 LearningRate 0.0250 Epoch: 20 Global Step: 113810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:16:43,915-Speed 10916.03 samples/sec Loss 5.6562 LearningRate 0.0250 Epoch: 20 Global Step: 113820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:16:44,825-Speed 11261.67 samples/sec Loss 5.5784 LearningRate 0.0250 Epoch: 20 Global Step: 113830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:16:45,713-Speed 11537.69 samples/sec Loss 5.6544 LearningRate 0.0249 Epoch: 20 Global Step: 113840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:16:46,592-Speed 11658.68 samples/sec Loss 5.7899 LearningRate 0.0249 Epoch: 20 Global Step: 113850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:16:47,501-Speed 11266.73 samples/sec Loss 5.6379 LearningRate 0.0249 Epoch: 20 Global Step: 113860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:16:48,420-Speed 11146.43 samples/sec Loss 5.7405 LearningRate 0.0249 Epoch: 20 Global Step: 113870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:16:49,346-Speed 11079.65 samples/sec Loss 5.8907 LearningRate 0.0249 Epoch: 20 Global Step: 113880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:16:50,272-Speed 11066.46 samples/sec Loss 5.6139 LearningRate 0.0249 Epoch: 20 Global Step: 113890 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:16:51,203-Speed 11009.99 samples/sec Loss 5.7260 LearningRate 0.0249 Epoch: 20 Global Step: 113900 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:16:52,077-Speed 11728.23 samples/sec Loss 5.8146 LearningRate 0.0249 Epoch: 20 Global Step: 113910 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:16:52,955-Speed 11671.22 samples/sec Loss 
5.7234 LearningRate 0.0249 Epoch: 20 Global Step: 113920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:16:53,848-Speed 11479.59 samples/sec Loss 5.7638 LearningRate 0.0249 Epoch: 20 Global Step: 113930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:16:54,766-Speed 11161.40 samples/sec Loss 5.7540 LearningRate 0.0249 Epoch: 20 Global Step: 113940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:16:55,655-Speed 11526.62 samples/sec Loss 5.6754 LearningRate 0.0249 Epoch: 20 Global Step: 113950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:16:56,550-Speed 11455.90 samples/sec Loss 5.8529 LearningRate 0.0249 Epoch: 20 Global Step: 113960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:16:57,429-Speed 11653.34 samples/sec Loss 5.6828 LearningRate 0.0249 Epoch: 20 Global Step: 113970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:16:58,326-Speed 11426.00 samples/sec Loss 5.6405 LearningRate 0.0249 Epoch: 20 Global Step: 113980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:16:59,259-Speed 10972.71 samples/sec Loss 5.8136 LearningRate 0.0249 Epoch: 20 Global Step: 113990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:17:00,149-Speed 11519.55 samples/sec Loss 5.7507 LearningRate 0.0249 Epoch: 20 Global Step: 114000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:17:21,322-[lfw][114000]XNorm: 10.579831 Training: 2022-04-11 03:17:21,322-[lfw][114000]Accuracy-Flip: 0.99517+-0.00337 Training: 2022-04-11 03:17:21,323-[lfw][114000]Accuracy-Highest: 0.99617 Training: 2022-04-11 03:17:46,082-[cfp_fp][114000]XNorm: 8.913355 Training: 2022-04-11 03:17:46,083-[cfp_fp][114000]Accuracy-Flip: 0.94500+-0.01192 Training: 2022-04-11 03:17:46,084-[cfp_fp][114000]Accuracy-Highest: 0.94757 Training: 2022-04-11 03:18:07,272-[agedb_30][114000]XNorm: 10.290661 Training: 2022-04-11 03:18:07,272-[agedb_30][114000]Accuracy-Flip: 
0.96217+-0.01030 Training: 2022-04-11 03:18:07,273-[agedb_30][114000]Accuracy-Highest: 0.96483 Training: 2022-04-11 03:18:08,186-Speed 150.52 samples/sec Loss 5.8689 LearningRate 0.0249 Epoch: 20 Global Step: 114010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:18:09,124-Speed 10925.40 samples/sec Loss 5.6385 LearningRate 0.0249 Epoch: 20 Global Step: 114020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:18:10,020-Speed 11430.84 samples/sec Loss 5.8385 LearningRate 0.0249 Epoch: 20 Global Step: 114030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:18:10,960-Speed 10904.27 samples/sec Loss 5.7299 LearningRate 0.0249 Epoch: 20 Global Step: 114040 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:18:11,939-Speed 10463.64 samples/sec Loss 5.8370 LearningRate 0.0249 Epoch: 20 Global Step: 114050 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:18:12,907-Speed 10592.04 samples/sec Loss 5.8279 LearningRate 0.0249 Epoch: 20 Global Step: 114060 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:18:13,800-Speed 11473.45 samples/sec Loss 5.8339 LearningRate 0.0248 Epoch: 20 Global Step: 114070 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:18:14,769-Speed 10568.68 samples/sec Loss 5.8805 LearningRate 0.0248 Epoch: 20 Global Step: 114080 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:18:15,694-Speed 11086.31 samples/sec Loss 5.7701 LearningRate 0.0248 Epoch: 20 Global Step: 114090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:18:16,604-Speed 11258.65 samples/sec Loss 5.8633 LearningRate 0.0248 Epoch: 20 Global Step: 114100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:18:17,622-Speed 10063.36 samples/sec Loss 5.7684 LearningRate 0.0248 Epoch: 20 Global Step: 114110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:18:18,542-Speed 11141.54 samples/sec Loss 5.9188 LearningRate 0.0248 
Epoch: 20 Global Step: 114120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:18:19,440-Speed 11401.96 samples/sec Loss 5.8981 LearningRate 0.0248 Epoch: 20 Global Step: 114130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:18:20,353-Speed 11234.24 samples/sec Loss 5.7744 LearningRate 0.0248 Epoch: 20 Global Step: 114140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:18:21,268-Speed 11196.81 samples/sec Loss 5.7605 LearningRate 0.0248 Epoch: 20 Global Step: 114150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:18:22,162-Speed 11452.44 samples/sec Loss 5.7850 LearningRate 0.0248 Epoch: 20 Global Step: 114160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:18:23,077-Speed 11213.63 samples/sec Loss 5.7316 LearningRate 0.0248 Epoch: 20 Global Step: 114170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:18:23,966-Speed 11526.39 samples/sec Loss 5.7438 LearningRate 0.0248 Epoch: 20 Global Step: 114180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:18:24,894-Speed 11035.75 samples/sec Loss 5.7866 LearningRate 0.0248 Epoch: 20 Global Step: 114190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:18:25,784-Speed 11516.16 samples/sec Loss 5.7577 LearningRate 0.0248 Epoch: 20 Global Step: 114200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:18:26,677-Speed 11481.92 samples/sec Loss 5.8688 LearningRate 0.0248 Epoch: 20 Global Step: 114210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:18:27,618-Speed 10890.78 samples/sec Loss 5.6887 LearningRate 0.0248 Epoch: 20 Global Step: 114220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:18:28,542-Speed 11093.76 samples/sec Loss 5.8276 LearningRate 0.0248 Epoch: 20 Global Step: 114230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:18:29,453-Speed 11242.19 samples/sec Loss 5.9687 LearningRate 0.0248 Epoch: 20 Global Step: 114240 
Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:18:30,334-Speed 11634.14 samples/sec Loss 5.7650 LearningRate 0.0248 Epoch: 20 Global Step: 114250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:18:31,275-Speed 10887.62 samples/sec Loss 5.9117 LearningRate 0.0248 Epoch: 20 Global Step: 114260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:18:32,177-Speed 11362.09 samples/sec Loss 5.8785 LearningRate 0.0248 Epoch: 20 Global Step: 114270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:18:33,081-Speed 11327.23 samples/sec Loss 5.7936 LearningRate 0.0248 Epoch: 20 Global Step: 114280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:18:33,985-Speed 11333.80 samples/sec Loss 5.8492 LearningRate 0.0248 Epoch: 20 Global Step: 114290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:18:34,863-Speed 11682.02 samples/sec Loss 5.8488 LearningRate 0.0247 Epoch: 20 Global Step: 114300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:18:35,786-Speed 11094.05 samples/sec Loss 5.8955 LearningRate 0.0247 Epoch: 20 Global Step: 114310 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:18:36,706-Speed 11139.06 samples/sec Loss 5.8988 LearningRate 0.0247 Epoch: 20 Global Step: 114320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:18:37,635-Speed 11039.37 samples/sec Loss 5.9168 LearningRate 0.0247 Epoch: 20 Global Step: 114330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:18:38,589-Speed 10736.71 samples/sec Loss 6.0590 LearningRate 0.0247 Epoch: 20 Global Step: 114340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:18:39,517-Speed 11045.47 samples/sec Loss 5.8111 LearningRate 0.0247 Epoch: 20 Global Step: 114350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:18:40,460-Speed 10865.10 samples/sec Loss 5.9548 LearningRate 0.0247 Epoch: 20 Global Step: 114360 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-04-11 03:18:41,463-Speed 10212.61 samples/sec Loss 5.9183 LearningRate 0.0247 Epoch: 20 Global Step: 114370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:18:42,440-Speed 10493.84 samples/sec Loss 5.9695 LearningRate 0.0247 Epoch: 20 Global Step: 114380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:18:43,359-Speed 11156.86 samples/sec Loss 5.9133 LearningRate 0.0247 Epoch: 20 Global Step: 114390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:18:44,327-Speed 10584.26 samples/sec Loss 5.9391 LearningRate 0.0247 Epoch: 20 Global Step: 114400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:18:45,208-Speed 11634.79 samples/sec Loss 5.9115 LearningRate 0.0247 Epoch: 20 Global Step: 114410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:18:46,091-Speed 11596.09 samples/sec Loss 5.8104 LearningRate 0.0247 Epoch: 20 Global Step: 114420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:18:46,989-Speed 11409.91 samples/sec Loss 5.9331 LearningRate 0.0247 Epoch: 20 Global Step: 114430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:18:47,876-Speed 11555.79 samples/sec Loss 5.9578 LearningRate 0.0247 Epoch: 20 Global Step: 114440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:18:48,799-Speed 11105.60 samples/sec Loss 6.0007 LearningRate 0.0247 Epoch: 20 Global Step: 114450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:18:49,775-Speed 10492.73 samples/sec Loss 5.9786 LearningRate 0.0247 Epoch: 20 Global Step: 114460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:18:50,671-Speed 11448.87 samples/sec Loss 5.7676 LearningRate 0.0247 Epoch: 20 Global Step: 114470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:18:51,619-Speed 10802.48 samples/sec Loss 5.9894 LearningRate 0.0247 Epoch: 20 Global Step: 114480 Fp16 Grad Scale: 131072 Required: 4 hours 
Training: 2022-04-11 03:18:52,540-Speed 11124.58 samples/sec Loss 5.9897 LearningRate 0.0247 Epoch: 20 Global Step: 114490 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:18:53,490-Speed 10788.84 samples/sec Loss 5.8587 LearningRate 0.0247 Epoch: 20 Global Step: 114500 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:18:54,411-Speed 11124.81 samples/sec Loss 5.8743 LearningRate 0.0247 Epoch: 20 Global Step: 114510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:18:55,326-Speed 11197.26 samples/sec Loss 5.9748 LearningRate 0.0246 Epoch: 20 Global Step: 114520 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:18:56,229-Speed 11353.25 samples/sec Loss 6.0369 LearningRate 0.0246 Epoch: 20 Global Step: 114530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:18:57,126-Speed 11419.74 samples/sec Loss 5.8001 LearningRate 0.0246 Epoch: 20 Global Step: 114540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:18:58,018-Speed 11482.56 samples/sec Loss 5.9835 LearningRate 0.0246 Epoch: 20 Global Step: 114550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:18:58,919-Speed 11378.48 samples/sec Loss 5.9484 LearningRate 0.0246 Epoch: 20 Global Step: 114560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:18:59,815-Speed 11433.02 samples/sec Loss 6.0406 LearningRate 0.0246 Epoch: 20 Global Step: 114570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:00,704-Speed 11526.80 samples/sec Loss 5.9403 LearningRate 0.0246 Epoch: 20 Global Step: 114580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:01,608-Speed 11334.16 samples/sec Loss 6.0230 LearningRate 0.0246 Epoch: 20 Global Step: 114590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:02,555-Speed 10826.89 samples/sec Loss 6.0089 LearningRate 0.0246 Epoch: 20 Global Step: 114600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
03:19:03,447-Speed 11478.58 samples/sec Loss 5.9460 LearningRate 0.0246 Epoch: 20 Global Step: 114610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:04,348-Speed 11381.38 samples/sec Loss 5.9782 LearningRate 0.0246 Epoch: 20 Global Step: 114620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:05,227-Speed 11656.51 samples/sec Loss 5.8874 LearningRate 0.0246 Epoch: 20 Global Step: 114630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:06,133-Speed 11306.25 samples/sec Loss 5.9838 LearningRate 0.0246 Epoch: 20 Global Step: 114640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:07,049-Speed 11187.60 samples/sec Loss 5.9450 LearningRate 0.0246 Epoch: 20 Global Step: 114650 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:19:07,941-Speed 11490.85 samples/sec Loss 5.9517 LearningRate 0.0246 Epoch: 20 Global Step: 114660 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:19:08,848-Speed 11300.84 samples/sec Loss 5.9026 LearningRate 0.0246 Epoch: 20 Global Step: 114670 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:19:09,755-Speed 11303.39 samples/sec Loss 5.9712 LearningRate 0.0246 Epoch: 20 Global Step: 114680 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:19:10,631-Speed 11688.38 samples/sec Loss 5.9833 LearningRate 0.0246 Epoch: 20 Global Step: 114690 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:19:11,518-Speed 11551.12 samples/sec Loss 5.9802 LearningRate 0.0246 Epoch: 20 Global Step: 114700 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:19:12,453-Speed 10960.13 samples/sec Loss 6.0723 LearningRate 0.0246 Epoch: 20 Global Step: 114710 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:19:13,350-Speed 11433.01 samples/sec Loss 5.9500 LearningRate 0.0246 Epoch: 20 Global Step: 114720 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:19:14,231-Speed 11624.27 
samples/sec Loss 5.8491 LearningRate 0.0246 Epoch: 20 Global Step: 114730 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:19:15,146-Speed 11203.14 samples/sec Loss 6.0229 LearningRate 0.0246 Epoch: 20 Global Step: 114740 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:19:16,074-Speed 11041.27 samples/sec Loss 5.8816 LearningRate 0.0245 Epoch: 20 Global Step: 114750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:16,984-Speed 11261.58 samples/sec Loss 5.8625 LearningRate 0.0245 Epoch: 20 Global Step: 114760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:17,936-Speed 10758.16 samples/sec Loss 6.0714 LearningRate 0.0245 Epoch: 20 Global Step: 114770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:18,846-Speed 11265.25 samples/sec Loss 5.9714 LearningRate 0.0245 Epoch: 20 Global Step: 114780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:19,768-Speed 11115.95 samples/sec Loss 6.0530 LearningRate 0.0245 Epoch: 20 Global Step: 114790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:20,689-Speed 11119.36 samples/sec Loss 6.1074 LearningRate 0.0245 Epoch: 20 Global Step: 114800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:21,580-Speed 11507.70 samples/sec Loss 6.1657 LearningRate 0.0245 Epoch: 20 Global Step: 114810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:22,476-Speed 11429.13 samples/sec Loss 5.9071 LearningRate 0.0245 Epoch: 20 Global Step: 114820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:23,410-Speed 10971.09 samples/sec Loss 6.0980 LearningRate 0.0245 Epoch: 20 Global Step: 114830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:24,356-Speed 10827.80 samples/sec Loss 5.9237 LearningRate 0.0245 Epoch: 20 Global Step: 114840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:25,264-Speed 11285.64 samples/sec Loss 6.0043 
LearningRate 0.0245 Epoch: 20 Global Step: 114850 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:19:26,220-Speed 10717.65 samples/sec Loss 6.0552 LearningRate 0.0245 Epoch: 20 Global Step: 114860 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:19:27,111-Speed 11511.24 samples/sec Loss 6.1529 LearningRate 0.0245 Epoch: 20 Global Step: 114870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:28,001-Speed 11511.23 samples/sec Loss 6.0943 LearningRate 0.0245 Epoch: 20 Global Step: 114880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:28,899-Speed 11411.48 samples/sec Loss 6.1086 LearningRate 0.0245 Epoch: 20 Global Step: 114890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:29,836-Speed 10941.58 samples/sec Loss 5.9637 LearningRate 0.0245 Epoch: 20 Global Step: 114900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:30,767-Speed 11003.10 samples/sec Loss 6.1524 LearningRate 0.0245 Epoch: 20 Global Step: 114910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:31,693-Speed 11064.75 samples/sec Loss 6.0677 LearningRate 0.0245 Epoch: 20 Global Step: 114920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:32,616-Speed 11107.64 samples/sec Loss 6.1842 LearningRate 0.0245 Epoch: 20 Global Step: 114930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:33,514-Speed 11408.51 samples/sec Loss 6.0582 LearningRate 0.0245 Epoch: 20 Global Step: 114940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:34,395-Speed 11631.36 samples/sec Loss 6.0943 LearningRate 0.0245 Epoch: 20 Global Step: 114950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:35,283-Speed 11529.80 samples/sec Loss 6.0482 LearningRate 0.0245 Epoch: 20 Global Step: 114960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:36,191-Speed 11287.96 samples/sec Loss 6.1206 LearningRate 0.0245 Epoch: 20 
Global Step: 114970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:19:37,107-Speed 11186.42 samples/sec Loss 5.9265 LearningRate 0.0244 Epoch: 20 Global Step: 114980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:19:38,022-Speed 11200.86 samples/sec Loss 6.0091 LearningRate 0.0244 Epoch: 20 Global Step: 114990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:19:38,908-Speed 11560.91 samples/sec Loss 6.0884 LearningRate 0.0244 Epoch: 20 Global Step: 115000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:39,815-Speed 11300.08 samples/sec Loss 6.1061 LearningRate 0.0244 Epoch: 20 Global Step: 115010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:40,731-Speed 11185.89 samples/sec Loss 6.1007 LearningRate 0.0244 Epoch: 20 Global Step: 115020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:41,679-Speed 10818.31 samples/sec Loss 6.0035 LearningRate 0.0244 Epoch: 20 Global Step: 115030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:42,606-Speed 11044.33 samples/sec Loss 6.0525 LearningRate 0.0244 Epoch: 20 Global Step: 115040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:43,498-Speed 11497.09 samples/sec Loss 5.8848 LearningRate 0.0244 Epoch: 20 Global Step: 115050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:44,383-Speed 11570.96 samples/sec Loss 6.0412 LearningRate 0.0244 Epoch: 20 Global Step: 115060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:45,281-Speed 11409.02 samples/sec Loss 5.9935 LearningRate 0.0244 Epoch: 20 Global Step: 115070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:46,204-Speed 11102.78 samples/sec Loss 6.0627 LearningRate 0.0244 Epoch: 20 Global Step: 115080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:47,108-Speed 11337.64 samples/sec Loss 6.1987 LearningRate 0.0244 Epoch: 20 Global Step: 115090 Fp16 
Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:48,022-Speed 11210.43 samples/sec Loss 6.1041 LearningRate 0.0244 Epoch: 20 Global Step: 115100 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:19:48,922-Speed 11384.40 samples/sec Loss 6.0353 LearningRate 0.0244 Epoch: 20 Global Step: 115110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:49,864-Speed 10881.49 samples/sec Loss 6.1494 LearningRate 0.0244 Epoch: 20 Global Step: 115120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:50,766-Speed 11358.86 samples/sec Loss 6.0199 LearningRate 0.0244 Epoch: 20 Global Step: 115130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:51,697-Speed 11007.85 samples/sec Loss 6.0416 LearningRate 0.0244 Epoch: 20 Global Step: 115140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:52,608-Speed 11253.41 samples/sec Loss 6.0548 LearningRate 0.0244 Epoch: 20 Global Step: 115150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:53,541-Speed 10978.43 samples/sec Loss 6.0181 LearningRate 0.0244 Epoch: 20 Global Step: 115160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:54,423-Speed 11619.77 samples/sec Loss 6.1042 LearningRate 0.0244 Epoch: 20 Global Step: 115170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:55,304-Speed 11633.89 samples/sec Loss 6.1291 LearningRate 0.0244 Epoch: 20 Global Step: 115180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:56,191-Speed 11549.33 samples/sec Loss 6.1670 LearningRate 0.0244 Epoch: 20 Global Step: 115190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:57,079-Speed 11545.52 samples/sec Loss 6.1825 LearningRate 0.0244 Epoch: 20 Global Step: 115200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:19:57,966-Speed 11547.26 samples/sec Loss 6.0939 LearningRate 0.0243 Epoch: 20 Global Step: 115210 Fp16 Grad Scale: 131072 Required: 4 
hours Training: 2022-04-11 03:19:58,855-Speed 11538.31 samples/sec Loss 6.0881 LearningRate 0.0243 Epoch: 20 Global Step: 115220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:19:59,738-Speed 11593.19 samples/sec Loss 6.0413 LearningRate 0.0243 Epoch: 20 Global Step: 115230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:00,636-Speed 11413.60 samples/sec Loss 6.0441 LearningRate 0.0243 Epoch: 20 Global Step: 115240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:01,583-Speed 10820.41 samples/sec Loss 6.0990 LearningRate 0.0243 Epoch: 20 Global Step: 115250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:02,495-Speed 11232.28 samples/sec Loss 6.0820 LearningRate 0.0243 Epoch: 20 Global Step: 115260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:03,397-Speed 11367.47 samples/sec Loss 6.1004 LearningRate 0.0243 Epoch: 20 Global Step: 115270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:04,353-Speed 10716.34 samples/sec Loss 6.1708 LearningRate 0.0243 Epoch: 20 Global Step: 115280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:05,317-Speed 10629.31 samples/sec Loss 6.0508 LearningRate 0.0243 Epoch: 20 Global Step: 115290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:06,270-Speed 10754.74 samples/sec Loss 6.0902 LearningRate 0.0243 Epoch: 20 Global Step: 115300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:07,192-Speed 11111.23 samples/sec Loss 6.0018 LearningRate 0.0243 Epoch: 20 Global Step: 115310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:08,133-Speed 10895.40 samples/sec Loss 6.1003 LearningRate 0.0243 Epoch: 20 Global Step: 115320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:09,041-Speed 11278.91 samples/sec Loss 6.1263 LearningRate 0.0243 Epoch: 20 Global Step: 115330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 
03:20:09,968-Speed 11059.57 samples/sec Loss 5.9151 LearningRate 0.0243 Epoch: 20 Global Step: 115340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:20:10,858-Speed 11510.66 samples/sec Loss 5.9112 LearningRate 0.0243 Epoch: 20 Global Step: 115350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:20:11,789-Speed 11009.54 samples/sec Loss 6.2420 LearningRate 0.0243 Epoch: 20 Global Step: 115360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:20:12,708-Speed 11147.63 samples/sec Loss 6.1026 LearningRate 0.0243 Epoch: 20 Global Step: 115370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:13,596-Speed 11542.21 samples/sec Loss 5.9464 LearningRate 0.0243 Epoch: 20 Global Step: 115380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:14,521-Speed 11081.05 samples/sec Loss 6.1578 LearningRate 0.0243 Epoch: 20 Global Step: 115390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:15,465-Speed 10856.26 samples/sec Loss 6.1075 LearningRate 0.0243 Epoch: 20 Global Step: 115400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:16,360-Speed 11449.07 samples/sec Loss 6.1509 LearningRate 0.0243 Epoch: 20 Global Step: 115410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:17,243-Speed 11597.58 samples/sec Loss 6.0823 LearningRate 0.0243 Epoch: 20 Global Step: 115420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:18,157-Speed 11214.27 samples/sec Loss 6.1334 LearningRate 0.0243 Epoch: 20 Global Step: 115430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:19,068-Speed 11243.93 samples/sec Loss 6.0964 LearningRate 0.0242 Epoch: 20 Global Step: 115440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:19,972-Speed 11339.28 samples/sec Loss 6.0705 LearningRate 0.0242 Epoch: 20 Global Step: 115450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:20,878-Speed 11317.57 
samples/sec Loss 6.1903 LearningRate 0.0242 Epoch: 20 Global Step: 115460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:21,793-Speed 11193.40 samples/sec Loss 6.1372 LearningRate 0.0242 Epoch: 20 Global Step: 115470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:20:22,668-Speed 11712.57 samples/sec Loss 6.0200 LearningRate 0.0242 Epoch: 20 Global Step: 115480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:20:23,561-Speed 11476.37 samples/sec Loss 6.1132 LearningRate 0.0242 Epoch: 20 Global Step: 115490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:24,460-Speed 11394.15 samples/sec Loss 6.1818 LearningRate 0.0242 Epoch: 20 Global Step: 115500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:25,390-Speed 11021.23 samples/sec Loss 6.2464 LearningRate 0.0242 Epoch: 20 Global Step: 115510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:26,303-Speed 11224.36 samples/sec Loss 6.1882 LearningRate 0.0242 Epoch: 20 Global Step: 115520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:27,264-Speed 10660.26 samples/sec Loss 6.1764 LearningRate 0.0242 Epoch: 20 Global Step: 115530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:28,190-Speed 11068.38 samples/sec Loss 6.1127 LearningRate 0.0242 Epoch: 20 Global Step: 115540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:29,114-Speed 11089.11 samples/sec Loss 6.0738 LearningRate 0.0242 Epoch: 20 Global Step: 115550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:29,993-Speed 11656.81 samples/sec Loss 6.1249 LearningRate 0.0242 Epoch: 20 Global Step: 115560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:30,896-Speed 11340.23 samples/sec Loss 6.0908 LearningRate 0.0242 Epoch: 20 Global Step: 115570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:31,798-Speed 11362.81 samples/sec Loss 6.1318 
LearningRate 0.0242 Epoch: 20 Global Step: 115580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:32,731-Speed 10986.86 samples/sec Loss 6.0624 LearningRate 0.0242 Epoch: 20 Global Step: 115590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:20:33,612-Speed 11632.67 samples/sec Loss 6.2038 LearningRate 0.0242 Epoch: 20 Global Step: 115600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:20:34,504-Speed 11491.85 samples/sec Loss 6.0288 LearningRate 0.0242 Epoch: 20 Global Step: 115610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:20:35,381-Speed 11677.14 samples/sec Loss 6.0729 LearningRate 0.0242 Epoch: 20 Global Step: 115620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:36,279-Speed 11407.45 samples/sec Loss 6.1479 LearningRate 0.0242 Epoch: 20 Global Step: 115630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:37,192-Speed 11228.51 samples/sec Loss 6.0045 LearningRate 0.0242 Epoch: 20 Global Step: 115640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:38,103-Speed 11243.83 samples/sec Loss 6.2177 LearningRate 0.0242 Epoch: 20 Global Step: 115650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:39,030-Speed 11054.56 samples/sec Loss 6.1492 LearningRate 0.0242 Epoch: 20 Global Step: 115660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:39,959-Speed 11037.19 samples/sec Loss 6.2606 LearningRate 0.0241 Epoch: 20 Global Step: 115670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:40,892-Speed 10977.30 samples/sec Loss 6.2434 LearningRate 0.0241 Epoch: 20 Global Step: 115680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:41,809-Speed 11176.11 samples/sec Loss 6.0586 LearningRate 0.0241 Epoch: 20 Global Step: 115690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:42,753-Speed 10859.49 samples/sec Loss 6.3346 LearningRate 0.0241 Epoch: 
20 Global Step: 115700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:43,680-Speed 11046.99 samples/sec Loss 6.1726 LearningRate 0.0241 Epoch: 20 Global Step: 115710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:44,567-Speed 11550.50 samples/sec Loss 6.1969 LearningRate 0.0241 Epoch: 20 Global Step: 115720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:20:45,443-Speed 11701.85 samples/sec Loss 6.1997 LearningRate 0.0241 Epoch: 20 Global Step: 115730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:20:46,325-Speed 11622.43 samples/sec Loss 6.0968 LearningRate 0.0241 Epoch: 20 Global Step: 115740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:20:47,222-Speed 11417.60 samples/sec Loss 6.1963 LearningRate 0.0241 Epoch: 20 Global Step: 115750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:20:48,173-Speed 10775.33 samples/sec Loss 6.1594 LearningRate 0.0241 Epoch: 20 Global Step: 115760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:20:49,074-Speed 11370.59 samples/sec Loss 6.2017 LearningRate 0.0241 Epoch: 20 Global Step: 115770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:20:50,011-Speed 10929.71 samples/sec Loss 6.2331 LearningRate 0.0241 Epoch: 20 Global Step: 115780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:20:50,943-Speed 11000.19 samples/sec Loss 6.0513 LearningRate 0.0241 Epoch: 20 Global Step: 115790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:51,853-Speed 11266.39 samples/sec Loss 6.0635 LearningRate 0.0241 Epoch: 20 Global Step: 115800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:52,782-Speed 11027.26 samples/sec Loss 6.1485 LearningRate 0.0241 Epoch: 20 Global Step: 115810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:53,697-Speed 11201.98 samples/sec Loss 6.1833 LearningRate 0.0241 Epoch: 20 Global Step: 115820 
Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:54,637-Speed 10895.48 samples/sec Loss 6.2710 LearningRate 0.0241 Epoch: 20 Global Step: 115830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:55,544-Speed 11301.02 samples/sec Loss 6.0462 LearningRate 0.0241 Epoch: 20 Global Step: 115840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:56,437-Speed 11474.15 samples/sec Loss 6.2860 LearningRate 0.0241 Epoch: 20 Global Step: 115850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:57,338-Speed 11383.17 samples/sec Loss 6.1833 LearningRate 0.0241 Epoch: 20 Global Step: 115860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:58,225-Speed 11551.23 samples/sec Loss 6.2986 LearningRate 0.0241 Epoch: 20 Global Step: 115870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:20:59,146-Speed 11114.83 samples/sec Loss 6.2281 LearningRate 0.0241 Epoch: 20 Global Step: 115880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:21:00,038-Speed 11498.82 samples/sec Loss 6.3189 LearningRate 0.0241 Epoch: 20 Global Step: 115890 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:21:00,921-Speed 11604.15 samples/sec Loss 6.1492 LearningRate 0.0241 Epoch: 20 Global Step: 115900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:21:01,827-Speed 11299.54 samples/sec Loss 6.2319 LearningRate 0.0240 Epoch: 20 Global Step: 115910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:21:02,807-Speed 10463.90 samples/sec Loss 6.1506 LearningRate 0.0240 Epoch: 20 Global Step: 115920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:21:03,727-Speed 11134.84 samples/sec Loss 6.1326 LearningRate 0.0240 Epoch: 20 Global Step: 115930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:21:04,693-Speed 10618.54 samples/sec Loss 6.0565 LearningRate 0.0240 Epoch: 20 Global Step: 115940 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-04-11 03:21:05,650-Speed 10705.89 samples/sec Loss 6.1531 LearningRate 0.0240 Epoch: 20 Global Step: 115950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:21:06,615-Speed 10621.73 samples/sec Loss 5.9739 LearningRate 0.0240 Epoch: 20 Global Step: 115960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:21:07,554-Speed 10908.78 samples/sec Loss 5.9438 LearningRate 0.0240 Epoch: 20 Global Step: 115970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:21:08,487-Speed 10983.66 samples/sec Loss 6.3115 LearningRate 0.0240 Epoch: 20 Global Step: 115980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:21:09,371-Speed 11586.08 samples/sec Loss 6.0811 LearningRate 0.0240 Epoch: 20 Global Step: 115990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:21:10,258-Speed 11555.97 samples/sec Loss 6.1619 LearningRate 0.0240 Epoch: 20 Global Step: 116000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:21:31,506-[lfw][116000]XNorm: 10.441562 Training: 2022-04-11 03:21:31,507-[lfw][116000]Accuracy-Flip: 0.99533+-0.00386 Training: 2022-04-11 03:21:31,508-[lfw][116000]Accuracy-Highest: 0.99617 Training: 2022-04-11 03:21:56,125-[cfp_fp][116000]XNorm: 8.750545 Training: 2022-04-11 03:21:56,125-[cfp_fp][116000]Accuracy-Flip: 0.94271+-0.01288 Training: 2022-04-11 03:21:56,126-[cfp_fp][116000]Accuracy-Highest: 0.94757 Training: 2022-04-11 03:22:17,318-[agedb_30][116000]XNorm: 10.209352 Training: 2022-04-11 03:22:17,318-[agedb_30][116000]Accuracy-Flip: 0.96350+-0.00858 Training: 2022-04-11 03:22:17,318-[agedb_30][116000]Accuracy-Highest: 0.96483 Training: 2022-04-11 03:22:18,238-Speed 150.63 samples/sec Loss 6.1249 LearningRate 0.0240 Epoch: 20 Global Step: 116010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:22:19,136-Speed 11404.71 samples/sec Loss 6.1347 LearningRate 0.0240 Epoch: 20 Global Step: 116020 Fp16 Grad Scale: 131072 Required: 4 
hours Training: 2022-04-11 03:22:20,042-Speed 11310.66 samples/sec Loss 6.2587 LearningRate 0.0240 Epoch: 20 Global Step: 116030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:22:20,954-Speed 11238.08 samples/sec Loss 6.1636 LearningRate 0.0240 Epoch: 20 Global Step: 116040 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:22:21,842-Speed 11538.85 samples/sec Loss 6.1791 LearningRate 0.0240 Epoch: 20 Global Step: 116050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:22:22,728-Speed 11557.31 samples/sec Loss 6.0202 LearningRate 0.0240 Epoch: 20 Global Step: 116060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:22:23,644-Speed 11188.38 samples/sec Loss 6.1541 LearningRate 0.0240 Epoch: 20 Global Step: 116070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:22:24,582-Speed 10928.14 samples/sec Loss 6.1505 LearningRate 0.0240 Epoch: 20 Global Step: 116080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:22:25,518-Speed 10951.11 samples/sec Loss 6.2342 LearningRate 0.0240 Epoch: 20 Global Step: 116090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:22:26,399-Speed 11630.71 samples/sec Loss 6.1984 LearningRate 0.0240 Epoch: 20 Global Step: 116100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:22:27,311-Speed 11240.37 samples/sec Loss 6.1197 LearningRate 0.0240 Epoch: 20 Global Step: 116110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:22:28,229-Speed 11155.38 samples/sec Loss 6.2131 LearningRate 0.0240 Epoch: 20 Global Step: 116120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:22:29,161-Speed 11000.99 samples/sec Loss 6.3427 LearningRate 0.0240 Epoch: 20 Global Step: 116130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:22:30,105-Speed 10853.58 samples/sec Loss 6.1200 LearningRate 0.0239 Epoch: 20 Global Step: 116140 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 
03:22:30,999-Speed 11458.76 samples/sec Loss 6.2167 LearningRate 0.0239 Epoch: 20 Global Step: 116150 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:22:31,909-Speed 11267.38 samples/sec Loss 6.1921 LearningRate 0.0239 Epoch: 20 Global Step: 116160 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:22:32,832-Speed 11098.46 samples/sec Loss 6.2339 LearningRate 0.0239 Epoch: 20 Global Step: 116170 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:22:33,726-Speed 11460.41 samples/sec Loss 6.2486 LearningRate 0.0239 Epoch: 20 Global Step: 116180 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:22:34,608-Speed 11615.07 samples/sec Loss 6.1066 LearningRate 0.0239 Epoch: 20 Global Step: 116190 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:22:35,489-Speed 11635.68 samples/sec Loss 6.2572 LearningRate 0.0239 Epoch: 20 Global Step: 116200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:22:36,390-Speed 11369.01 samples/sec Loss 6.0960 LearningRate 0.0239 Epoch: 20 Global Step: 116210 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:22:37,311-Speed 11136.40 samples/sec Loss 6.0959 LearningRate 0.0239 Epoch: 20 Global Step: 116220 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:22:38,222-Speed 11242.65 samples/sec Loss 6.1913 LearningRate 0.0239 Epoch: 20 Global Step: 116230 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:22:39,129-Speed 11299.75 samples/sec Loss 6.2333 LearningRate 0.0239 Epoch: 20 Global Step: 116240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:22:40,047-Speed 11159.30 samples/sec Loss 6.0863 LearningRate 0.0239 Epoch: 20 Global Step: 116250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:22:40,996-Speed 10801.94 samples/sec Loss 6.1200 LearningRate 0.0239 Epoch: 20 Global Step: 116260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:22:41,887-Speed 11495.89 
samples/sec Loss 6.1808 LearningRate 0.0239 Epoch: 20 Global Step: 116270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:22:42,820-Speed 10990.20 samples/sec Loss 6.2578 LearningRate 0.0239 Epoch: 20 Global Step: 116280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:22:43,733-Speed 11215.40 samples/sec Loss 6.3174 LearningRate 0.0239 Epoch: 20 Global Step: 116290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:22:44,629-Speed 11438.84 samples/sec Loss 6.2461 LearningRate 0.0239 Epoch: 20 Global Step: 116300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:22:45,513-Speed 11593.76 samples/sec Loss 6.1720 LearningRate 0.0239 Epoch: 20 Global Step: 116310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:22:46,404-Speed 11502.00 samples/sec Loss 6.0562 LearningRate 0.0239 Epoch: 20 Global Step: 116320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:22:47,276-Speed 11751.15 samples/sec Loss 6.1876 LearningRate 0.0239 Epoch: 20 Global Step: 116330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:22:48,169-Speed 11477.99 samples/sec Loss 6.1469 LearningRate 0.0239 Epoch: 20 Global Step: 116340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:22:49,093-Speed 11084.41 samples/sec Loss 6.1701 LearningRate 0.0239 Epoch: 20 Global Step: 116350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:22:49,998-Speed 11320.44 samples/sec Loss 6.0989 LearningRate 0.0239 Epoch: 20 Global Step: 116360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:22:50,913-Speed 11202.90 samples/sec Loss 6.1016 LearningRate 0.0238 Epoch: 20 Global Step: 116370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:22:51,815-Speed 11358.40 samples/sec Loss 6.3207 LearningRate 0.0238 Epoch: 20 Global Step: 116380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:22:52,724-Speed 11274.97 samples/sec Loss 6.1210 
LearningRate 0.0238 Epoch: 20 Global Step: 116390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:22:53,661-Speed 10934.55 samples/sec Loss 6.2112 LearningRate 0.0238 Epoch: 20 Global Step: 116400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:22:54,591-Speed 11018.66 samples/sec Loss 6.1684 LearningRate 0.0238 Epoch: 20 Global Step: 116410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:22:55,511-Speed 11139.95 samples/sec Loss 6.1956 LearningRate 0.0238 Epoch: 20 Global Step: 116420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:22:56,425-Speed 11209.80 samples/sec Loss 6.2857 LearningRate 0.0238 Epoch: 20 Global Step: 116430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:22:57,323-Speed 11416.21 samples/sec Loss 6.1436 LearningRate 0.0238 Epoch: 20 Global Step: 116440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:22:58,235-Speed 11235.87 samples/sec Loss 6.1237 LearningRate 0.0238 Epoch: 20 Global Step: 116450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:22:59,169-Speed 10975.23 samples/sec Loss 6.2390 LearningRate 0.0238 Epoch: 20 Global Step: 116460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:23:00,076-Speed 11287.87 samples/sec Loss 6.3097 LearningRate 0.0238 Epoch: 20 Global Step: 116470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:23:00,963-Speed 11559.49 samples/sec Loss 6.3727 LearningRate 0.0238 Epoch: 20 Global Step: 116480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:23:01,889-Speed 11063.43 samples/sec Loss 6.3503 LearningRate 0.0238 Epoch: 20 Global Step: 116490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:23:02,877-Speed 10372.63 samples/sec Loss 6.0767 LearningRate 0.0238 Epoch: 20 Global Step: 116500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:23:03,818-Speed 10887.19 samples/sec Loss 6.2083 LearningRate 0.0238 Epoch: 20 
Global Step: 116510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:23:04,737-Speed 11153.63 samples/sec Loss 6.1038 LearningRate 0.0238 Epoch: 20 Global Step: 116520 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:23:05,630-Speed 11481.21 samples/sec Loss 6.0629 LearningRate 0.0238 Epoch: 20 Global Step: 116530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:23:06,565-Speed 10955.34 samples/sec Loss 6.1328 LearningRate 0.0238 Epoch: 20 Global Step: 116540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:23:07,480-Speed 11204.92 samples/sec Loss 6.2071 LearningRate 0.0238 Epoch: 20 Global Step: 116550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:23:08,387-Speed 11287.29 samples/sec Loss 6.1875 LearningRate 0.0238 Epoch: 20 Global Step: 116560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:23:09,270-Speed 11615.77 samples/sec Loss 6.1186 LearningRate 0.0238 Epoch: 20 Global Step: 116570 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:23:10,157-Speed 11547.45 samples/sec Loss 6.1557 LearningRate 0.0238 Epoch: 20 Global Step: 116580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:23:11,040-Speed 11599.70 samples/sec Loss 6.1237 LearningRate 0.0238 Epoch: 20 Global Step: 116590 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:23:11,952-Speed 11238.62 samples/sec Loss 6.0979 LearningRate 0.0237 Epoch: 20 Global Step: 116600 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:23:12,857-Speed 11322.45 samples/sec Loss 6.3489 LearningRate 0.0237 Epoch: 20 Global Step: 116610 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:23:13,748-Speed 11506.49 samples/sec Loss 6.3505 LearningRate 0.0237 Epoch: 20 Global Step: 116620 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:23:14,636-Speed 11542.29 samples/sec Loss 6.2626 LearningRate 0.0237 Epoch: 20 Global Step: 116630 Fp16 
Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:23:15,550-Speed 11210.44 samples/sec Loss 6.0860 LearningRate 0.0237 Epoch: 20 Global Step: 116640 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:23:16,460-Speed 11252.29 samples/sec Loss 6.2050 LearningRate 0.0237 Epoch: 20 Global Step: 116650 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:23:17,358-Speed 11414.97 samples/sec Loss 6.3097 LearningRate 0.0237 Epoch: 20 Global Step: 116660 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:23:18,259-Speed 11368.34 samples/sec Loss 6.1576 LearningRate 0.0237 Epoch: 20 Global Step: 116670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:23:19,178-Speed 11151.11 samples/sec Loss 6.3119 LearningRate 0.0237 Epoch: 20 Global Step: 116680 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:23:20,079-Speed 11384.89 samples/sec Loss 6.1609 LearningRate 0.0237 Epoch: 20 Global Step: 116690 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:23:20,972-Speed 11472.70 samples/sec Loss 6.2580 LearningRate 0.0237 Epoch: 20 Global Step: 116700 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:23:21,862-Speed 11516.12 samples/sec Loss 6.2816 LearningRate 0.0237 Epoch: 20 Global Step: 116710 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:23:22,754-Speed 11479.54 samples/sec Loss 6.1475 LearningRate 0.0237 Epoch: 20 Global Step: 116720 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:23:23,669-Speed 11197.66 samples/sec Loss 6.2635 LearningRate 0.0237 Epoch: 20 Global Step: 116730 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:23:24,612-Speed 10874.66 samples/sec Loss 6.1276 LearningRate 0.0237 Epoch: 20 Global Step: 116740 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:23:25,489-Speed 11678.32 samples/sec Loss 6.0857 LearningRate 0.0237 Epoch: 20 Global Step: 116750 Fp16 Grad Scale: 32768 Required: 4 
hours Training: 2022-04-11 03:23:26,369-Speed 11646.86 samples/sec Loss 6.1408 LearningRate 0.0237 Epoch: 20 Global Step: 116760 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:23:27,265-Speed 11433.67 samples/sec Loss 6.1970 LearningRate 0.0237 Epoch: 20 Global Step: 116770 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:23:28,206-Speed 10890.09 samples/sec Loss 6.2406 LearningRate 0.0237 Epoch: 20 Global Step: 116780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:23:29,149-Speed 10869.87 samples/sec Loss 6.2422 LearningRate 0.0237 Epoch: 20 Global Step: 116790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:23:30,097-Speed 10809.47 samples/sec Loss 6.3348 LearningRate 0.0237 Epoch: 20 Global Step: 116800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:23:31,035-Speed 10921.78 samples/sec Loss 6.1290 LearningRate 0.0237 Epoch: 20 Global Step: 116810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:23:31,966-Speed 11003.20 samples/sec Loss 6.2128 LearningRate 0.0237 Epoch: 20 Global Step: 116820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:23:32,847-Speed 11630.34 samples/sec Loss 6.2024 LearningRate 0.0237 Epoch: 20 Global Step: 116830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:23:33,746-Speed 11396.09 samples/sec Loss 6.2236 LearningRate 0.0236 Epoch: 20 Global Step: 116840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:23:34,652-Speed 11308.68 samples/sec Loss 6.2137 LearningRate 0.0236 Epoch: 20 Global Step: 116850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:23:35,534-Speed 11628.15 samples/sec Loss 6.1066 LearningRate 0.0236 Epoch: 20 Global Step: 116860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:23:36,422-Speed 11531.70 samples/sec Loss 6.1862 LearningRate 0.0236 Epoch: 20 Global Step: 116870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
03:23:37,319-Speed 11430.64 samples/sec Loss 6.2597 LearningRate 0.0236 Epoch: 20 Global Step: 116880 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:23:38,243-Speed 11082.58 samples/sec Loss 6.1834 LearningRate 0.0236 Epoch: 20 Global Step: 116890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:23:39,146-Speed 11350.56 samples/sec Loss 6.1962 LearningRate 0.0236 Epoch: 20 Global Step: 116900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:23:40,049-Speed 11342.75 samples/sec Loss 6.2332 LearningRate 0.0236 Epoch: 20 Global Step: 116910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:23:40,970-Speed 11132.15 samples/sec Loss 6.2183 LearningRate 0.0236 Epoch: 20 Global Step: 116920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:23:41,855-Speed 11588.05 samples/sec Loss 6.2662 LearningRate 0.0236 Epoch: 20 Global Step: 116930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:23:42,786-Speed 11011.62 samples/sec Loss 6.1367 LearningRate 0.0236 Epoch: 20 Global Step: 116940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:23:43,701-Speed 11199.26 samples/sec Loss 6.2624 LearningRate 0.0236 Epoch: 20 Global Step: 116950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:23:44,636-Speed 10958.33 samples/sec Loss 6.1120 LearningRate 0.0236 Epoch: 20 Global Step: 116960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:23:45,552-Speed 11178.84 samples/sec Loss 6.2279 LearningRate 0.0236 Epoch: 20 Global Step: 116970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:23:46,470-Speed 11165.55 samples/sec Loss 6.1925 LearningRate 0.0236 Epoch: 20 Global Step: 116980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:23:47,350-Speed 11640.17 samples/sec Loss 6.1435 LearningRate 0.0236 Epoch: 20 Global Step: 116990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:23:48,273-Speed 11101.39 
samples/sec Loss 6.1913 LearningRate 0.0236 Epoch: 20 Global Step: 117000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:23:49,212-Speed 10919.37 samples/sec Loss 6.1834 LearningRate 0.0236 Epoch: 20 Global Step: 117010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:23:50,104-Speed 11476.44 samples/sec Loss 6.1763 LearningRate 0.0236 Epoch: 20 Global Step: 117020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:23:50,997-Speed 11482.51 samples/sec Loss 6.2222 LearningRate 0.0236 Epoch: 20 Global Step: 117030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:23:51,893-Speed 11431.26 samples/sec Loss 6.1687 LearningRate 0.0236 Epoch: 20 Global Step: 117040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:23:52,821-Speed 11046.68 samples/sec Loss 6.2746 LearningRate 0.0236 Epoch: 20 Global Step: 117050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:23:53,729-Speed 11275.65 samples/sec Loss 6.2491 LearningRate 0.0236 Epoch: 20 Global Step: 117060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:23:54,679-Speed 10789.75 samples/sec Loss 6.1395 LearningRate 0.0235 Epoch: 20 Global Step: 117070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:23:55,586-Speed 11303.30 samples/sec Loss 6.1146 LearningRate 0.0235 Epoch: 20 Global Step: 117080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:23:56,485-Speed 11391.36 samples/sec Loss 6.2867 LearningRate 0.0235 Epoch: 20 Global Step: 117090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:23:57,394-Speed 11272.24 samples/sec Loss 6.2823 LearningRate 0.0235 Epoch: 20 Global Step: 117100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:23:58,283-Speed 11523.54 samples/sec Loss 6.2035 LearningRate 0.0235 Epoch: 20 Global Step: 117110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:23:59,167-Speed 11593.19 samples/sec Loss 6.2990 
LearningRate 0.0235 Epoch: 20 Global Step: 117120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:24:00,072-Speed 11330.39 samples/sec Loss 6.0769 LearningRate 0.0235 Epoch: 20 Global Step: 117130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:24:01,000-Speed 11034.02 samples/sec Loss 6.1544 LearningRate 0.0235 Epoch: 20 Global Step: 117140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:24:01,967-Speed 10602.39 samples/sec Loss 6.2874 LearningRate 0.0235 Epoch: 20 Global Step: 117150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:24:02,854-Speed 11553.06 samples/sec Loss 6.1051 LearningRate 0.0235 Epoch: 20 Global Step: 117160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:24:03,757-Speed 11351.06 samples/sec Loss 6.1655 LearningRate 0.0235 Epoch: 20 Global Step: 117170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:24:04,668-Speed 11245.78 samples/sec Loss 6.2740 LearningRate 0.0235 Epoch: 20 Global Step: 117180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:24:05,605-Speed 10939.51 samples/sec Loss 6.1417 LearningRate 0.0235 Epoch: 20 Global Step: 117190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:24:06,519-Speed 11209.17 samples/sec Loss 6.1838 LearningRate 0.0235 Epoch: 20 Global Step: 117200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:24:07,441-Speed 11117.87 samples/sec Loss 6.1995 LearningRate 0.0235 Epoch: 20 Global Step: 117210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:24:08,361-Speed 11140.66 samples/sec Loss 6.1457 LearningRate 0.0235 Epoch: 20 Global Step: 117220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:24:09,247-Speed 11559.51 samples/sec Loss 6.2607 LearningRate 0.0235 Epoch: 20 Global Step: 117230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:24:10,134-Speed 11554.41 samples/sec Loss 6.1573 LearningRate 0.0235 Epoch: 
20 Global Step: 117240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:24:11,034-Speed 11388.31 samples/sec Loss 6.1809 LearningRate 0.0235 Epoch: 20 Global Step: 117250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:24:11,976-Speed 10871.08 samples/sec Loss 6.3552 LearningRate 0.0235 Epoch: 20 Global Step: 117260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:24:12,871-Speed 11452.28 samples/sec Loss 6.2311 LearningRate 0.0235 Epoch: 20 Global Step: 117270 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:24:13,776-Speed 11327.85 samples/sec Loss 6.1959 LearningRate 0.0235 Epoch: 20 Global Step: 117280 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:24:14,650-Speed 11719.31 samples/sec Loss 6.0002 LearningRate 0.0235 Epoch: 20 Global Step: 117290 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:24:15,576-Speed 11066.72 samples/sec Loss 6.2300 LearningRate 0.0235 Epoch: 20 Global Step: 117300 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:24:16,543-Speed 10596.90 samples/sec Loss 6.2054 LearningRate 0.0234 Epoch: 20 Global Step: 117310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:24:17,508-Speed 10617.87 samples/sec Loss 6.2589 LearningRate 0.0234 Epoch: 20 Global Step: 117320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:24:18,456-Speed 10813.43 samples/sec Loss 6.2057 LearningRate 0.0234 Epoch: 20 Global Step: 117330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:24:19,397-Speed 10889.17 samples/sec Loss 6.2857 LearningRate 0.0234 Epoch: 20 Global Step: 117340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:24:20,340-Speed 10865.84 samples/sec Loss 6.1734 LearningRate 0.0234 Epoch: 20 Global Step: 117350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:24:21,242-Speed 11355.97 samples/sec Loss 6.1327 LearningRate 0.0234 Epoch: 20 Global Step: 117360 
Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:24:22,116-Speed 11724.78 samples/sec Loss 6.2106 LearningRate 0.0234 Epoch: 20 Global Step: 117370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:24:23,041-Speed 11075.45 samples/sec Loss 6.0967 LearningRate 0.0234 Epoch: 20 Global Step: 117380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:24:23,973-Speed 10990.27 samples/sec Loss 6.1390 LearningRate 0.0234 Epoch: 20 Global Step: 117390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:24:24,881-Speed 11290.34 samples/sec Loss 6.2669 LearningRate 0.0234 Epoch: 20 Global Step: 117400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:24:25,790-Speed 11273.46 samples/sec Loss 6.3544 LearningRate 0.0234 Epoch: 20 Global Step: 117410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:24:26,691-Speed 11371.10 samples/sec Loss 6.2190 LearningRate 0.0234 Epoch: 20 Global Step: 117420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:24:27,628-Speed 10939.10 samples/sec Loss 6.2067 LearningRate 0.0234 Epoch: 20 Global Step: 117430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:24:28,518-Speed 11521.77 samples/sec Loss 6.0545 LearningRate 0.0234 Epoch: 20 Global Step: 117440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:24:29,404-Speed 11558.73 samples/sec Loss 6.1380 LearningRate 0.0234 Epoch: 20 Global Step: 117450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:24:30,309-Speed 11321.56 samples/sec Loss 6.3069 LearningRate 0.0234 Epoch: 20 Global Step: 117460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:24:31,196-Speed 11559.31 samples/sec Loss 6.2359 LearningRate 0.0234 Epoch: 20 Global Step: 117470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:24:32,147-Speed 10779.10 samples/sec Loss 6.1851 LearningRate 0.0234 Epoch: 20 Global Step: 117480 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-04-11 03:24:33,057-Speed 11251.35 samples/sec Loss 6.2965 LearningRate 0.0234 Epoch: 20 Global Step: 117490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:24:33,959-Speed 11364.32 samples/sec Loss 6.1770 LearningRate 0.0234 Epoch: 20 Global Step: 117500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:24:34,866-Speed 11292.04 samples/sec Loss 6.2738 LearningRate 0.0234 Epoch: 20 Global Step: 117510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:24:35,792-Speed 11073.74 samples/sec Loss 6.1963 LearningRate 0.0234 Epoch: 20 Global Step: 117520 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:24:36,708-Speed 11179.42 samples/sec Loss 6.1739 LearningRate 0.0234 Epoch: 20 Global Step: 117530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:24:37,609-Speed 11380.33 samples/sec Loss 6.2320 LearningRate 0.0233 Epoch: 20 Global Step: 117540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:24:38,530-Speed 11116.88 samples/sec Loss 6.1479 LearningRate 0.0233 Epoch: 20 Global Step: 117550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:24:39,475-Speed 10850.12 samples/sec Loss 6.1170 LearningRate 0.0233 Epoch: 20 Global Step: 117560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:24:40,406-Speed 10997.42 samples/sec Loss 6.1320 LearningRate 0.0233 Epoch: 20 Global Step: 117570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:24:41,306-Speed 11400.24 samples/sec Loss 6.1437 LearningRate 0.0233 Epoch: 20 Global Step: 117580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:24:42,252-Speed 10828.25 samples/sec Loss 6.1480 LearningRate 0.0233 Epoch: 20 Global Step: 117590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:24:43,163-Speed 11240.44 samples/sec Loss 6.3443 LearningRate 0.0233 Epoch: 20 Global Step: 117600 Fp16 Grad Scale: 65536 Required: 4 hours 
Training: 2022-04-11 03:24:44,063-Speed 11387.81 samples/sec Loss 6.2841 LearningRate 0.0233 Epoch: 20 Global Step: 117610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:24:44,952-Speed 11522.74 samples/sec Loss 6.2303 LearningRate 0.0233 Epoch: 20 Global Step: 117620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:24:45,835-Speed 11616.68 samples/sec Loss 6.0942 LearningRate 0.0233 Epoch: 20 Global Step: 117630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:24:46,716-Speed 11624.88 samples/sec Loss 6.1170 LearningRate 0.0233 Epoch: 20 Global Step: 117640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:24:47,591-Speed 11726.83 samples/sec Loss 6.1295 LearningRate 0.0233 Epoch: 20 Global Step: 117650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:24:48,485-Speed 11459.06 samples/sec Loss 6.1799 LearningRate 0.0233 Epoch: 20 Global Step: 117660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:24:49,416-Speed 11002.09 samples/sec Loss 6.1949 LearningRate 0.0233 Epoch: 20 Global Step: 117670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:24:50,348-Speed 10995.69 samples/sec Loss 6.2089 LearningRate 0.0233 Epoch: 20 Global Step: 117680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:24:51,252-Speed 11339.11 samples/sec Loss 6.3407 LearningRate 0.0233 Epoch: 20 Global Step: 117690 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:24:52,176-Speed 11085.03 samples/sec Loss 6.3035 LearningRate 0.0233 Epoch: 20 Global Step: 117700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:24:53,122-Speed 10826.71 samples/sec Loss 6.2599 LearningRate 0.0233 Epoch: 20 Global Step: 117710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:24:54,096-Speed 10527.54 samples/sec Loss 6.2146 LearningRate 0.0233 Epoch: 20 Global Step: 117720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 
03:24:55,012-Speed 11183.85 samples/sec Loss 6.1731 LearningRate 0.0233 Epoch: 20 Global Step: 117730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:24:55,915-Speed 11345.13 samples/sec Loss 6.2047 LearningRate 0.0233 Epoch: 20 Global Step: 117740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:24:56,843-Speed 11040.25 samples/sec Loss 6.2630 LearningRate 0.0233 Epoch: 20 Global Step: 117750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:24:57,761-Speed 11169.85 samples/sec Loss 6.4591 LearningRate 0.0233 Epoch: 20 Global Step: 117760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:24:58,673-Speed 11234.00 samples/sec Loss 6.3919 LearningRate 0.0233 Epoch: 20 Global Step: 117770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:24:59,580-Speed 11300.84 samples/sec Loss 6.0987 LearningRate 0.0232 Epoch: 20 Global Step: 117780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:25:00,497-Speed 11174.82 samples/sec Loss 6.3276 LearningRate 0.0232 Epoch: 20 Global Step: 117790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:25:01,417-Speed 11133.89 samples/sec Loss 6.2710 LearningRate 0.0232 Epoch: 20 Global Step: 117800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:25:02,360-Speed 10865.33 samples/sec Loss 6.2039 LearningRate 0.0232 Epoch: 20 Global Step: 117810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:25:03,290-Speed 11024.39 samples/sec Loss 6.2210 LearningRate 0.0232 Epoch: 20 Global Step: 117820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:25:04,227-Speed 10932.51 samples/sec Loss 6.3347 LearningRate 0.0232 Epoch: 20 Global Step: 117830 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:25:05,151-Speed 11093.40 samples/sec Loss 6.2381 LearningRate 0.0232 Epoch: 20 Global Step: 117840 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:25:06,068-Speed 11174.37 
samples/sec Loss 6.5048 LearningRate 0.0232 Epoch: 20 Global Step: 117850 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:25:06,958-Speed 11512.52 samples/sec Loss 6.2394 LearningRate 0.0232 Epoch: 20 Global Step: 117860 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:25:07,856-Speed 11418.13 samples/sec Loss 6.3209 LearningRate 0.0232 Epoch: 20 Global Step: 117870 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:25:08,743-Speed 11541.15 samples/sec Loss 6.2663 LearningRate 0.0232 Epoch: 20 Global Step: 117880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:25:09,638-Speed 11452.20 samples/sec Loss 6.3232 LearningRate 0.0232 Epoch: 20 Global Step: 117890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:25:10,525-Speed 11557.57 samples/sec Loss 6.1846 LearningRate 0.0232 Epoch: 20 Global Step: 117900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:25:11,436-Speed 11253.52 samples/sec Loss 6.1760 LearningRate 0.0232 Epoch: 20 Global Step: 117910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:25:12,334-Speed 11405.46 samples/sec Loss 6.1339 LearningRate 0.0232 Epoch: 20 Global Step: 117920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:25:13,241-Speed 11292.73 samples/sec Loss 6.2338 LearningRate 0.0232 Epoch: 20 Global Step: 117930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:25:14,142-Speed 11373.58 samples/sec Loss 6.1568 LearningRate 0.0232 Epoch: 20 Global Step: 117940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:25:15,053-Speed 11251.84 samples/sec Loss 6.2850 LearningRate 0.0232 Epoch: 20 Global Step: 117950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:25:15,936-Speed 11608.42 samples/sec Loss 6.3054 LearningRate 0.0232 Epoch: 20 Global Step: 117960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:25:16,829-Speed 11467.20 samples/sec Loss 6.2070 
LearningRate 0.0232 Epoch: 20 Global Step: 117970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:25:17,765-Speed 10951.90 samples/sec Loss 6.2473 LearningRate 0.0232 Epoch: 20 Global Step: 117980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:25:18,691-Speed 11066.46 samples/sec Loss 6.1812 LearningRate 0.0232 Epoch: 20 Global Step: 117990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:25:19,613-Speed 11117.44 samples/sec Loss 6.2305 LearningRate 0.0232 Epoch: 20 Global Step: 118000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:25:40,954-[lfw][118000]XNorm: 10.358434 Training: 2022-04-11 03:25:40,955-[lfw][118000]Accuracy-Flip: 0.99533+-0.00364 Training: 2022-04-11 03:25:40,955-[lfw][118000]Accuracy-Highest: 0.99617 Training: 2022-04-11 03:26:05,669-[cfp_fp][118000]XNorm: 8.697979 Training: 2022-04-11 03:26:05,670-[cfp_fp][118000]Accuracy-Flip: 0.94086+-0.01342 Training: 2022-04-11 03:26:05,670-[cfp_fp][118000]Accuracy-Highest: 0.94757 Training: 2022-04-11 03:26:26,874-[agedb_30][118000]XNorm: 10.152847 Training: 2022-04-11 03:26:26,875-[agedb_30][118000]Accuracy-Flip: 0.96783+-0.00749 Training: 2022-04-11 03:26:26,875-[agedb_30][118000]Accuracy-Highest: 0.96783 Training: 2022-04-11 03:26:27,792-Speed 150.19 samples/sec Loss 6.2442 LearningRate 0.0231 Epoch: 20 Global Step: 118010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:26:28,722-Speed 11017.53 samples/sec Loss 6.2842 LearningRate 0.0231 Epoch: 20 Global Step: 118020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:26:29,610-Speed 11536.11 samples/sec Loss 6.2006 LearningRate 0.0231 Epoch: 20 Global Step: 118030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:26:30,495-Speed 11577.31 samples/sec Loss 6.1560 LearningRate 0.0231 Epoch: 20 Global Step: 118040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:26:31,389-Speed 11465.92 samples/sec Loss 6.1746 LearningRate 
0.0231 Epoch: 20 Global Step: 118050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:26:32,309-Speed 11128.78 samples/sec Loss 6.3090 LearningRate 0.0231 Epoch: 20 Global Step: 118060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:26:33,199-Speed 11515.55 samples/sec Loss 6.2527 LearningRate 0.0231 Epoch: 20 Global Step: 118070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:26:34,086-Speed 11561.74 samples/sec Loss 6.2575 LearningRate 0.0231 Epoch: 20 Global Step: 118080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:26:34,982-Speed 11437.99 samples/sec Loss 6.0923 LearningRate 0.0231 Epoch: 20 Global Step: 118090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:26:35,883-Speed 11374.72 samples/sec Loss 6.1916 LearningRate 0.0231 Epoch: 20 Global Step: 118100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:26:36,763-Speed 11642.31 samples/sec Loss 6.2415 LearningRate 0.0231 Epoch: 20 Global Step: 118110 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:26:37,667-Speed 11334.58 samples/sec Loss 6.3104 LearningRate 0.0231 Epoch: 20 Global Step: 118120 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:26:38,615-Speed 10806.10 samples/sec Loss 6.1759 LearningRate 0.0231 Epoch: 20 Global Step: 118130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:26:39,516-Speed 11375.17 samples/sec Loss 6.0213 LearningRate 0.0231 Epoch: 20 Global Step: 118140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:26:40,451-Speed 10952.31 samples/sec Loss 6.2733 LearningRate 0.0231 Epoch: 20 Global Step: 118150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:26:41,366-Speed 11209.89 samples/sec Loss 6.3237 LearningRate 0.0231 Epoch: 20 Global Step: 118160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:26:42,273-Speed 11292.78 samples/sec Loss 6.2904 LearningRate 0.0231 Epoch: 20 Global Step: 
118170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:26:43,229-Speed 10714.13 samples/sec Loss 6.2610 LearningRate 0.0231 Epoch: 20 Global Step: 118180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:26:44,142-Speed 11226.97 samples/sec Loss 6.2820 LearningRate 0.0231 Epoch: 20 Global Step: 118190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:26:45,047-Speed 11318.03 samples/sec Loss 6.0969 LearningRate 0.0231 Epoch: 20 Global Step: 118200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:26:45,945-Speed 11412.51 samples/sec Loss 6.2392 LearningRate 0.0231 Epoch: 20 Global Step: 118210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:26:46,829-Speed 11598.59 samples/sec Loss 6.1174 LearningRate 0.0231 Epoch: 20 Global Step: 118220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:26:47,722-Speed 11478.65 samples/sec Loss 6.0785 LearningRate 0.0231 Epoch: 20 Global Step: 118230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:26:48,666-Speed 10846.81 samples/sec Loss 6.0770 LearningRate 0.0231 Epoch: 20 Global Step: 118240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:26:49,558-Speed 11488.61 samples/sec Loss 6.1823 LearningRate 0.0230 Epoch: 20 Global Step: 118250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:26:50,465-Speed 11301.36 samples/sec Loss 6.1244 LearningRate 0.0230 Epoch: 20 Global Step: 118260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:26:51,382-Speed 11165.16 samples/sec Loss 6.2095 LearningRate 0.0230 Epoch: 20 Global Step: 118270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:26:52,299-Speed 11179.88 samples/sec Loss 6.1300 LearningRate 0.0230 Epoch: 20 Global Step: 118280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:26:53,216-Speed 11167.88 samples/sec Loss 6.0523 LearningRate 0.0230 Epoch: 20 Global Step: 118290 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-04-11 03:26:54,172-Speed 10727.18 samples/sec Loss 6.3284 LearningRate 0.0230 Epoch: 20 Global Step: 118300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:26:55,124-Speed 10764.31 samples/sec Loss 6.0811 LearningRate 0.0230 Epoch: 20 Global Step: 118310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:26:56,054-Speed 11008.32 samples/sec Loss 6.1194 LearningRate 0.0230 Epoch: 20 Global Step: 118320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:26:56,942-Speed 11548.75 samples/sec Loss 6.2558 LearningRate 0.0230 Epoch: 20 Global Step: 118330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:26:57,831-Speed 11526.79 samples/sec Loss 6.1678 LearningRate 0.0230 Epoch: 20 Global Step: 118340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:26:58,703-Speed 11749.70 samples/sec Loss 6.1308 LearningRate 0.0230 Epoch: 20 Global Step: 118350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:26:59,596-Speed 11470.28 samples/sec Loss 6.2441 LearningRate 0.0230 Epoch: 20 Global Step: 118360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:00,495-Speed 11409.90 samples/sec Loss 6.2634 LearningRate 0.0230 Epoch: 20 Global Step: 118370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:01,410-Speed 11189.88 samples/sec Loss 6.3395 LearningRate 0.0230 Epoch: 20 Global Step: 118380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:02,295-Speed 11585.49 samples/sec Loss 6.0843 LearningRate 0.0230 Epoch: 20 Global Step: 118390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:03,218-Speed 11099.41 samples/sec Loss 6.1666 LearningRate 0.0230 Epoch: 20 Global Step: 118400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:04,142-Speed 11085.31 samples/sec Loss 6.2940 LearningRate 0.0230 Epoch: 20 Global Step: 118410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-11 03:27:05,115-Speed 10538.66 samples/sec Loss 6.2363 LearningRate 0.0230 Epoch: 20 Global Step: 118420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:06,048-Speed 10973.64 samples/sec Loss 6.2872 LearningRate 0.0230 Epoch: 20 Global Step: 118430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:07,023-Speed 10508.71 samples/sec Loss 6.1993 LearningRate 0.0230 Epoch: 20 Global Step: 118440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:07,949-Speed 11069.51 samples/sec Loss 6.2058 LearningRate 0.0230 Epoch: 20 Global Step: 118450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:08,824-Speed 11705.72 samples/sec Loss 6.3084 LearningRate 0.0230 Epoch: 20 Global Step: 118460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:09,714-Speed 11519.33 samples/sec Loss 6.2464 LearningRate 0.0230 Epoch: 20 Global Step: 118470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:10,607-Speed 11483.19 samples/sec Loss 6.2278 LearningRate 0.0230 Epoch: 20 Global Step: 118480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:11,559-Speed 10768.52 samples/sec Loss 6.1961 LearningRate 0.0229 Epoch: 20 Global Step: 118490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:12,456-Speed 11428.29 samples/sec Loss 6.2733 LearningRate 0.0229 Epoch: 20 Global Step: 118500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:13,368-Speed 11225.76 samples/sec Loss 6.2341 LearningRate 0.0229 Epoch: 20 Global Step: 118510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:14,263-Speed 11454.77 samples/sec Loss 6.2781 LearningRate 0.0229 Epoch: 20 Global Step: 118520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:15,146-Speed 11603.91 samples/sec Loss 6.2353 LearningRate 0.0229 Epoch: 20 Global Step: 118530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:16,119-Speed 
10527.77 samples/sec Loss 6.0709 LearningRate 0.0229 Epoch: 20 Global Step: 118540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:17,032-Speed 11221.34 samples/sec Loss 6.2021 LearningRate 0.0229 Epoch: 20 Global Step: 118550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:27:17,939-Speed 11300.23 samples/sec Loss 6.3451 LearningRate 0.0229 Epoch: 20 Global Step: 118560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:27:18,853-Speed 11219.68 samples/sec Loss 6.2792 LearningRate 0.0229 Epoch: 20 Global Step: 118570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:27:19,764-Speed 11245.30 samples/sec Loss 6.2214 LearningRate 0.0229 Epoch: 20 Global Step: 118580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:20,681-Speed 11178.58 samples/sec Loss 6.3411 LearningRate 0.0229 Epoch: 20 Global Step: 118590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:21,577-Speed 11440.80 samples/sec Loss 6.0985 LearningRate 0.0229 Epoch: 20 Global Step: 118600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:22,475-Speed 11407.30 samples/sec Loss 6.0616 LearningRate 0.0229 Epoch: 20 Global Step: 118610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:23,362-Speed 11549.04 samples/sec Loss 6.3288 LearningRate 0.0229 Epoch: 20 Global Step: 118620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:24,260-Speed 11406.80 samples/sec Loss 6.2725 LearningRate 0.0229 Epoch: 20 Global Step: 118630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:25,158-Speed 11421.60 samples/sec Loss 6.1628 LearningRate 0.0229 Epoch: 20 Global Step: 118640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:26,096-Speed 10920.85 samples/sec Loss 6.2640 LearningRate 0.0229 Epoch: 20 Global Step: 118650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:27,017-Speed 11125.76 samples/sec Loss 
6.2289 LearningRate 0.0229 Epoch: 20 Global Step: 118660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:27,962-Speed 10838.31 samples/sec Loss 6.2820 LearningRate 0.0229 Epoch: 20 Global Step: 118670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:28,858-Speed 11444.24 samples/sec Loss 6.3331 LearningRate 0.0229 Epoch: 20 Global Step: 118680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:29,756-Speed 11408.64 samples/sec Loss 6.3358 LearningRate 0.0229 Epoch: 20 Global Step: 118690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:30,646-Speed 11513.98 samples/sec Loss 6.2407 LearningRate 0.0229 Epoch: 20 Global Step: 118700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:31,559-Speed 11222.19 samples/sec Loss 6.2546 LearningRate 0.0229 Epoch: 20 Global Step: 118710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:32,489-Speed 11019.19 samples/sec Loss 6.3066 LearningRate 0.0228 Epoch: 20 Global Step: 118720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:33,368-Speed 11656.17 samples/sec Loss 6.3111 LearningRate 0.0228 Epoch: 20 Global Step: 118730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:34,272-Speed 11333.03 samples/sec Loss 6.1912 LearningRate 0.0228 Epoch: 20 Global Step: 118740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:35,169-Speed 11425.31 samples/sec Loss 6.2607 LearningRate 0.0228 Epoch: 20 Global Step: 118750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:36,055-Speed 11564.51 samples/sec Loss 6.2022 LearningRate 0.0228 Epoch: 20 Global Step: 118760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:36,970-Speed 11204.49 samples/sec Loss 6.2076 LearningRate 0.0228 Epoch: 20 Global Step: 118770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:37,854-Speed 11592.85 samples/sec Loss 6.2370 LearningRate 0.0228 
Epoch: 20 Global Step: 118780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:27:38,767-Speed 11221.35 samples/sec Loss 6.2091 LearningRate 0.0228 Epoch: 20 Global Step: 118790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:27:39,652-Speed 11574.97 samples/sec Loss 6.1482 LearningRate 0.0228 Epoch: 20 Global Step: 118800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:40,571-Speed 11153.41 samples/sec Loss 6.2672 LearningRate 0.0228 Epoch: 20 Global Step: 118810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:41,476-Speed 11328.06 samples/sec Loss 6.2013 LearningRate 0.0228 Epoch: 20 Global Step: 118820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:42,388-Speed 11237.93 samples/sec Loss 6.2752 LearningRate 0.0228 Epoch: 20 Global Step: 118830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:43,297-Speed 11269.24 samples/sec Loss 6.2023 LearningRate 0.0228 Epoch: 20 Global Step: 118840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:44,246-Speed 10796.64 samples/sec Loss 6.3042 LearningRate 0.0228 Epoch: 20 Global Step: 118850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:45,131-Speed 11579.26 samples/sec Loss 6.1130 LearningRate 0.0228 Epoch: 20 Global Step: 118860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:46,011-Speed 11639.87 samples/sec Loss 6.3234 LearningRate 0.0228 Epoch: 20 Global Step: 118870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:46,900-Speed 11533.31 samples/sec Loss 6.1044 LearningRate 0.0228 Epoch: 20 Global Step: 118880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:47,829-Speed 11031.12 samples/sec Loss 6.3626 LearningRate 0.0228 Epoch: 20 Global Step: 118890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:48,726-Speed 11420.15 samples/sec Loss 6.2661 LearningRate 0.0228 Epoch: 20 Global Step: 118900 
Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:27:49,676-Speed 10782.43 samples/sec Loss 6.2180 LearningRate 0.0228 Epoch: 20 Global Step: 118910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:50,643-Speed 10606.72 samples/sec Loss 6.2519 LearningRate 0.0228 Epoch: 20 Global Step: 118920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:51,562-Speed 11149.82 samples/sec Loss 6.2471 LearningRate 0.0228 Epoch: 20 Global Step: 118930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:52,485-Speed 11091.49 samples/sec Loss 6.1828 LearningRate 0.0228 Epoch: 20 Global Step: 118940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:53,410-Speed 11080.96 samples/sec Loss 6.3045 LearningRate 0.0228 Epoch: 20 Global Step: 118950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:54,353-Speed 10869.72 samples/sec Loss 6.2970 LearningRate 0.0227 Epoch: 20 Global Step: 118960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:55,298-Speed 10842.97 samples/sec Loss 6.1590 LearningRate 0.0227 Epoch: 20 Global Step: 118970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:56,242-Speed 10847.91 samples/sec Loss 6.1694 LearningRate 0.0227 Epoch: 20 Global Step: 118980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:57,140-Speed 11412.67 samples/sec Loss 6.2131 LearningRate 0.0227 Epoch: 20 Global Step: 118990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:58,026-Speed 11574.05 samples/sec Loss 6.2365 LearningRate 0.0227 Epoch: 20 Global Step: 119000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:27:58,923-Speed 11418.60 samples/sec Loss 6.0687 LearningRate 0.0227 Epoch: 20 Global Step: 119010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:27:59,856-Speed 10982.01 samples/sec Loss 6.1759 LearningRate 0.0227 Epoch: 20 Global Step: 119020 Fp16 Grad Scale: 131072 
Required: 4 hours Training: 2022-04-11 03:28:00,773-Speed 11172.23 samples/sec Loss 6.3140 LearningRate 0.0227 Epoch: 20 Global Step: 119030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:28:01,686-Speed 11222.28 samples/sec Loss 6.1887 LearningRate 0.0227 Epoch: 20 Global Step: 119040 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:28:02,566-Speed 11656.28 samples/sec Loss 6.1187 LearningRate 0.0227 Epoch: 20 Global Step: 119050 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:28:03,528-Speed 10646.20 samples/sec Loss 6.2562 LearningRate 0.0227 Epoch: 20 Global Step: 119060 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:28:04,431-Speed 11343.19 samples/sec Loss 6.1977 LearningRate 0.0227 Epoch: 20 Global Step: 119070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:28:05,365-Speed 10978.09 samples/sec Loss 6.1587 LearningRate 0.0227 Epoch: 20 Global Step: 119080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:28:06,303-Speed 10928.60 samples/sec Loss 6.1553 LearningRate 0.0227 Epoch: 20 Global Step: 119090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:28:07,196-Speed 11465.48 samples/sec Loss 6.3692 LearningRate 0.0227 Epoch: 20 Global Step: 119100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:28:08,096-Speed 11388.40 samples/sec Loss 6.1271 LearningRate 0.0227 Epoch: 20 Global Step: 119110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:28:09,003-Speed 11299.28 samples/sec Loss 6.2568 LearningRate 0.0227 Epoch: 20 Global Step: 119120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:28:09,884-Speed 11630.74 samples/sec Loss 6.0990 LearningRate 0.0227 Epoch: 20 Global Step: 119130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:28:10,780-Speed 11436.54 samples/sec Loss 6.1217 LearningRate 0.0227 Epoch: 20 Global Step: 119140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-11 03:28:11,663-Speed 11608.83 samples/sec Loss 6.3115 LearningRate 0.0227 Epoch: 20 Global Step: 119150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:28:12,567-Speed 11333.11 samples/sec Loss 6.1386 LearningRate 0.0227 Epoch: 20 Global Step: 119160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:28:13,509-Speed 10877.75 samples/sec Loss 6.1648 LearningRate 0.0227 Epoch: 20 Global Step: 119170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:28:14,439-Speed 11019.46 samples/sec Loss 6.0625 LearningRate 0.0227 Epoch: 20 Global Step: 119180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:28:15,382-Speed 10867.39 samples/sec Loss 6.1780 LearningRate 0.0227 Epoch: 20 Global Step: 119190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:28:16,321-Speed 10906.08 samples/sec Loss 6.1965 LearningRate 0.0226 Epoch: 20 Global Step: 119200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:28:17,251-Speed 11027.27 samples/sec Loss 6.1250 LearningRate 0.0226 Epoch: 20 Global Step: 119210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:28:18,224-Speed 10531.73 samples/sec Loss 6.1398 LearningRate 0.0226 Epoch: 20 Global Step: 119220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:28:19,208-Speed 10411.84 samples/sec Loss 6.1742 LearningRate 0.0226 Epoch: 20 Global Step: 119230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:28:20,169-Speed 10662.33 samples/sec Loss 6.2066 LearningRate 0.0226 Epoch: 20 Global Step: 119240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:28:21,073-Speed 11336.80 samples/sec Loss 6.2676 LearningRate 0.0226 Epoch: 20 Global Step: 119250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:28:21,964-Speed 11494.29 samples/sec Loss 6.2585 LearningRate 0.0226 Epoch: 20 Global Step: 119260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:28:22,847-Speed 
11606.65 samples/sec Loss 6.0559 LearningRate 0.0226 Epoch: 20 Global Step: 119270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:28:23,754-Speed 11305.17 samples/sec Loss 6.2649 LearningRate 0.0226 Epoch: 20 Global Step: 119280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:28:24,678-Speed 11088.07 samples/sec Loss 6.1356 LearningRate 0.0226 Epoch: 20 Global Step: 119290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:28:25,614-Speed 10945.28 samples/sec Loss 6.2152 LearningRate 0.0226 Epoch: 20 Global Step: 119300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:28:26,604-Speed 10351.83 samples/sec Loss 6.2407 LearningRate 0.0226 Epoch: 20 Global Step: 119310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:28:27,534-Speed 11014.93 samples/sec Loss 6.2637 LearningRate 0.0226 Epoch: 20 Global Step: 119320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:28:28,467-Speed 10984.56 samples/sec Loss 6.0017 LearningRate 0.0226 Epoch: 20 Global Step: 119330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:28:29,384-Speed 11180.16 samples/sec Loss 6.2069 LearningRate 0.0226 Epoch: 20 Global Step: 119340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:28:30,296-Speed 11231.98 samples/sec Loss 6.2081 LearningRate 0.0226 Epoch: 20 Global Step: 119350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:28:31,183-Speed 11551.26 samples/sec Loss 6.1517 LearningRate 0.0226 Epoch: 20 Global Step: 119360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:28:32,090-Speed 11301.51 samples/sec Loss 6.3051 LearningRate 0.0226 Epoch: 20 Global Step: 119370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:28:32,990-Speed 11393.66 samples/sec Loss 6.2276 LearningRate 0.0226 Epoch: 20 Global Step: 119380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:28:33,887-Speed 11418.66 samples/sec Loss 
6.1881 LearningRate 0.0226 Epoch: 20 Global Step: 119390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:28:34,873-Speed 10391.82 samples/sec Loss 6.2007 LearningRate 0.0226 Epoch: 20 Global Step: 119400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:28:45,851-Speed 932.83 samples/sec Loss 6.0238 LearningRate 0.0226 Epoch: 21 Global Step: 119410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:28:46,797-Speed 10841.41 samples/sec Loss 5.5492 LearningRate 0.0226 Epoch: 21 Global Step: 119420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:28:47,964-Speed 8782.88 samples/sec Loss 5.4984 LearningRate 0.0226 Epoch: 21 Global Step: 119430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:28:48,904-Speed 10904.75 samples/sec Loss 5.4989 LearningRate 0.0225 Epoch: 21 Global Step: 119440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:28:50,133-Speed 8338.10 samples/sec Loss 5.5177 LearningRate 0.0225 Epoch: 21 Global Step: 119450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:28:51,059-Speed 11058.36 samples/sec Loss 5.4397 LearningRate 0.0225 Epoch: 21 Global Step: 119460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:28:52,162-Speed 9294.33 samples/sec Loss 5.3840 LearningRate 0.0225 Epoch: 21 Global Step: 119470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:28:53,053-Speed 11517.03 samples/sec Loss 5.5408 LearningRate 0.0225 Epoch: 21 Global Step: 119480 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:28:53,985-Speed 10998.15 samples/sec Loss 5.5221 LearningRate 0.0225 Epoch: 21 Global Step: 119490 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:28:54,916-Speed 11006.26 samples/sec Loss 5.5708 LearningRate 0.0225 Epoch: 21 Global Step: 119500 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:28:55,864-Speed 10809.65 samples/sec Loss 5.5424 LearningRate 0.0225 Epoch: 21 
Global Step: 119510 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:28:56,773-Speed 11270.05 samples/sec Loss 5.5425 LearningRate 0.0225 Epoch: 21 Global Step: 119520 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:28:57,667-Speed 11456.84 samples/sec Loss 5.4690 LearningRate 0.0225 Epoch: 21 Global Step: 119530 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:28:58,612-Speed 10839.22 samples/sec Loss 5.6092 LearningRate 0.0225 Epoch: 21 Global Step: 119540 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:28:59,547-Speed 10969.65 samples/sec Loss 5.5145 LearningRate 0.0225 Epoch: 21 Global Step: 119550 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:29:00,500-Speed 10750.48 samples/sec Loss 5.5486 LearningRate 0.0225 Epoch: 21 Global Step: 119560 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:29:01,409-Speed 11274.67 samples/sec Loss 5.4586 LearningRate 0.0225 Epoch: 21 Global Step: 119570 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:29:02,342-Speed 10981.45 samples/sec Loss 5.6856 LearningRate 0.0225 Epoch: 21 Global Step: 119580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:29:03,282-Speed 10897.75 samples/sec Loss 5.5290 LearningRate 0.0225 Epoch: 21 Global Step: 119590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:29:04,188-Speed 11313.82 samples/sec Loss 5.5910 LearningRate 0.0225 Epoch: 21 Global Step: 119600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:29:05,151-Speed 10651.22 samples/sec Loss 5.4441 LearningRate 0.0225 Epoch: 21 Global Step: 119610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:29:06,093-Speed 10879.91 samples/sec Loss 5.5868 LearningRate 0.0225 Epoch: 21 Global Step: 119620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:29:07,040-Speed 10815.85 samples/sec Loss 5.6629 LearningRate 0.0225 Epoch: 21 Global Step: 119630 Fp16 Grad 
Scale: 65536 Required: 4 hours Training: 2022-04-11 03:29:08,019-Speed 10466.30 samples/sec Loss 5.5833 LearningRate 0.0225 Epoch: 21 Global Step: 119640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:29:08,923-Speed 11338.32 samples/sec Loss 5.6094 LearningRate 0.0225 Epoch: 21 Global Step: 119650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:29:09,829-Speed 11306.21 samples/sec Loss 5.7777 LearningRate 0.0225 Epoch: 21 Global Step: 119660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:29:10,722-Speed 11473.93 samples/sec Loss 5.5661 LearningRate 0.0225 Epoch: 21 Global Step: 119670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:29:11,648-Speed 11066.14 samples/sec Loss 5.6121 LearningRate 0.0224 Epoch: 21 Global Step: 119680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:29:12,572-Speed 11102.32 samples/sec Loss 5.5351 LearningRate 0.0224 Epoch: 21 Global Step: 119690 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:29:13,463-Speed 11490.35 samples/sec Loss 5.7076 LearningRate 0.0224 Epoch: 21 Global Step: 119700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:29:14,360-Speed 11432.68 samples/sec Loss 5.6405 LearningRate 0.0224 Epoch: 21 Global Step: 119710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:29:15,283-Speed 11098.78 samples/sec Loss 5.5899 LearningRate 0.0224 Epoch: 21 Global Step: 119720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:29:16,186-Speed 11355.35 samples/sec Loss 5.5865 LearningRate 0.0224 Epoch: 21 Global Step: 119730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:29:17,080-Speed 11458.68 samples/sec Loss 5.6442 LearningRate 0.0224 Epoch: 21 Global Step: 119740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:29:18,016-Speed 10949.93 samples/sec Loss 5.6710 LearningRate 0.0224 Epoch: 21 Global Step: 119750 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-04-11 03:29:18,932-Speed 11191.11 samples/sec Loss 5.6520 LearningRate 0.0224 Epoch: 21 Global Step: 119760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:29:19,850-Speed 11159.50 samples/sec Loss 5.7129 LearningRate 0.0224 Epoch: 21 Global Step: 119770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:29:20,744-Speed 11466.45 samples/sec Loss 5.5940 LearningRate 0.0224 Epoch: 21 Global Step: 119780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:29:21,661-Speed 11174.71 samples/sec Loss 5.6231 LearningRate 0.0224 Epoch: 21 Global Step: 119790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:29:22,565-Speed 11337.77 samples/sec Loss 5.6999 LearningRate 0.0224 Epoch: 21 Global Step: 119800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:29:23,475-Speed 11257.01 samples/sec Loss 5.6756 LearningRate 0.0224 Epoch: 21 Global Step: 119810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:29:24,409-Speed 10970.80 samples/sec Loss 5.6940 LearningRate 0.0224 Epoch: 21 Global Step: 119820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:29:25,380-Speed 10560.52 samples/sec Loss 5.6422 LearningRate 0.0224 Epoch: 21 Global Step: 119830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:29:26,378-Speed 10262.36 samples/sec Loss 5.7581 LearningRate 0.0224 Epoch: 21 Global Step: 119840 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:29:27,345-Speed 10608.85 samples/sec Loss 5.7166 LearningRate 0.0224 Epoch: 21 Global Step: 119850 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:29:28,308-Speed 10631.59 samples/sec Loss 5.7700 LearningRate 0.0224 Epoch: 21 Global Step: 119860 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:29:29,227-Speed 11158.87 samples/sec Loss 5.6827 LearningRate 0.0224 Epoch: 21 Global Step: 119870 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 
03:29:30,194-Speed 10588.41 samples/sec Loss 5.6652 LearningRate 0.0224 Epoch: 21 Global Step: 119880 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:29:31,162-Speed 10591.08 samples/sec Loss 5.6444 LearningRate 0.0224 Epoch: 21 Global Step: 119890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:29:32,124-Speed 10657.20 samples/sec Loss 5.6835 LearningRate 0.0224 Epoch: 21 Global Step: 119900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:29:33,041-Speed 11179.15 samples/sec Loss 5.8262 LearningRate 0.0224 Epoch: 21 Global Step: 119910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:29:33,966-Speed 11087.45 samples/sec Loss 5.7051 LearningRate 0.0223 Epoch: 21 Global Step: 119920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:29:34,887-Speed 11123.75 samples/sec Loss 5.6141 LearningRate 0.0223 Epoch: 21 Global Step: 119930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:29:35,829-Speed 10874.90 samples/sec Loss 5.8557 LearningRate 0.0223 Epoch: 21 Global Step: 119940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:29:36,731-Speed 11372.01 samples/sec Loss 5.6253 LearningRate 0.0223 Epoch: 21 Global Step: 119950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:29:37,691-Speed 10674.99 samples/sec Loss 5.6803 LearningRate 0.0223 Epoch: 21 Global Step: 119960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:29:38,583-Speed 11492.96 samples/sec Loss 5.7130 LearningRate 0.0223 Epoch: 21 Global Step: 119970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:29:39,471-Speed 11532.54 samples/sec Loss 5.8115 LearningRate 0.0223 Epoch: 21 Global Step: 119980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:29:40,385-Speed 11218.65 samples/sec Loss 5.6979 LearningRate 0.0223 Epoch: 21 Global Step: 119990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:29:41,299-Speed 11200.92 
samples/sec Loss 5.6709 LearningRate 0.0223 Epoch: 21 Global Step: 120000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:30:02,625-[lfw][120000]XNorm: 10.163585 Training: 2022-04-11 03:30:02,625-[lfw][120000]Accuracy-Flip: 0.99500+-0.00373 Training: 2022-04-11 03:30:02,626-[lfw][120000]Accuracy-Highest: 0.99617 Training: 2022-04-11 03:30:27,188-[cfp_fp][120000]XNorm: 8.518075 Training: 2022-04-11 03:30:27,189-[cfp_fp][120000]Accuracy-Flip: 0.94414+-0.01163 Training: 2022-04-11 03:30:27,189-[cfp_fp][120000]Accuracy-Highest: 0.94757 Training: 2022-04-11 03:30:48,435-[agedb_30][120000]XNorm: 9.968547 Training: 2022-04-11 03:30:48,436-[agedb_30][120000]Accuracy-Flip: 0.96567+-0.00943 Training: 2022-04-11 03:30:48,436-[agedb_30][120000]Accuracy-Highest: 0.96783 Training: 2022-04-11 03:30:49,403-Speed 150.36 samples/sec Loss 5.7531 LearningRate 0.0223 Epoch: 21 Global Step: 120010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:30:50,304-Speed 11376.18 samples/sec Loss 5.7826 LearningRate 0.0223 Epoch: 21 Global Step: 120020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:30:51,219-Speed 11202.23 samples/sec Loss 5.8443 LearningRate 0.0223 Epoch: 21 Global Step: 120030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:30:52,150-Speed 10998.95 samples/sec Loss 5.6812 LearningRate 0.0223 Epoch: 21 Global Step: 120040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:30:53,077-Speed 11054.98 samples/sec Loss 5.7700 LearningRate 0.0223 Epoch: 21 Global Step: 120050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:30:54,013-Speed 10947.75 samples/sec Loss 5.7803 LearningRate 0.0223 Epoch: 21 Global Step: 120060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:30:54,927-Speed 11210.35 samples/sec Loss 5.8742 LearningRate 0.0223 Epoch: 21 Global Step: 120070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:30:55,864-Speed 10943.81 samples/sec 
Loss 5.6513 LearningRate 0.0223 Epoch: 21 Global Step: 120080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:30:56,768-Speed 11332.38 samples/sec Loss 5.7851 LearningRate 0.0223 Epoch: 21 Global Step: 120090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:30:57,648-Speed 11644.69 samples/sec Loss 5.6930 LearningRate 0.0223 Epoch: 21 Global Step: 120100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:30:58,680-Speed 9927.65 samples/sec Loss 5.7523 LearningRate 0.0223 Epoch: 21 Global Step: 120110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:30:59,598-Speed 11169.02 samples/sec Loss 5.6861 LearningRate 0.0223 Epoch: 21 Global Step: 120120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:00,502-Speed 11326.41 samples/sec Loss 5.7516 LearningRate 0.0223 Epoch: 21 Global Step: 120130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:01,409-Speed 11298.72 samples/sec Loss 5.7036 LearningRate 0.0223 Epoch: 21 Global Step: 120140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:31:02,411-Speed 10226.83 samples/sec Loss 5.8574 LearningRate 0.0223 Epoch: 21 Global Step: 120150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:31:03,289-Speed 11674.83 samples/sec Loss 5.7791 LearningRate 0.0222 Epoch: 21 Global Step: 120160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:04,187-Speed 11402.63 samples/sec Loss 5.8028 LearningRate 0.0222 Epoch: 21 Global Step: 120170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:05,116-Speed 11031.93 samples/sec Loss 5.7129 LearningRate 0.0222 Epoch: 21 Global Step: 120180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:06,040-Speed 11088.27 samples/sec Loss 5.6795 LearningRate 0.0222 Epoch: 21 Global Step: 120190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:06,970-Speed 11024.42 samples/sec Loss 5.8798 LearningRate 
0.0222 Epoch: 21 Global Step: 120200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:07,918-Speed 10812.04 samples/sec Loss 5.8939 LearningRate 0.0222 Epoch: 21 Global Step: 120210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:08,809-Speed 11497.16 samples/sec Loss 5.6826 LearningRate 0.0222 Epoch: 21 Global Step: 120220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:09,698-Speed 11521.47 samples/sec Loss 5.8746 LearningRate 0.0222 Epoch: 21 Global Step: 120230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:10,647-Speed 10795.42 samples/sec Loss 5.7042 LearningRate 0.0222 Epoch: 21 Global Step: 120240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:11,588-Speed 10897.80 samples/sec Loss 5.8176 LearningRate 0.0222 Epoch: 21 Global Step: 120250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:12,513-Speed 11077.88 samples/sec Loss 5.9371 LearningRate 0.0222 Epoch: 21 Global Step: 120260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:31:13,466-Speed 10758.21 samples/sec Loss 5.8914 LearningRate 0.0222 Epoch: 21 Global Step: 120270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:14,432-Speed 10607.32 samples/sec Loss 5.9039 LearningRate 0.0222 Epoch: 21 Global Step: 120280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:15,340-Speed 11285.57 samples/sec Loss 5.6951 LearningRate 0.0222 Epoch: 21 Global Step: 120290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:16,243-Speed 11350.13 samples/sec Loss 5.9506 LearningRate 0.0222 Epoch: 21 Global Step: 120300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:17,126-Speed 11600.38 samples/sec Loss 5.8969 LearningRate 0.0222 Epoch: 21 Global Step: 120310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:18,071-Speed 10842.87 samples/sec Loss 5.7977 LearningRate 0.0222 Epoch: 21 Global Step: 
120320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:18,984-Speed 11221.52 samples/sec Loss 5.9432 LearningRate 0.0222 Epoch: 21 Global Step: 120330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:19,910-Speed 11064.67 samples/sec Loss 5.9188 LearningRate 0.0222 Epoch: 21 Global Step: 120340 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:31:20,816-Speed 11314.52 samples/sec Loss 5.8775 LearningRate 0.0222 Epoch: 21 Global Step: 120350 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:31:21,693-Speed 11687.93 samples/sec Loss 5.8205 LearningRate 0.0222 Epoch: 21 Global Step: 120360 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:31:22,590-Speed 11427.07 samples/sec Loss 5.8588 LearningRate 0.0222 Epoch: 21 Global Step: 120370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:31:23,492-Speed 11359.82 samples/sec Loss 5.8337 LearningRate 0.0222 Epoch: 21 Global Step: 120380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:31:24,401-Speed 11280.07 samples/sec Loss 5.8313 LearningRate 0.0222 Epoch: 21 Global Step: 120390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:31:25,271-Speed 11783.13 samples/sec Loss 5.8716 LearningRate 0.0221 Epoch: 21 Global Step: 120400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:31:26,187-Speed 11183.95 samples/sec Loss 5.8715 LearningRate 0.0221 Epoch: 21 Global Step: 120410 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:31:27,065-Speed 11678.08 samples/sec Loss 5.7935 LearningRate 0.0221 Epoch: 21 Global Step: 120420 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:31:27,975-Speed 11253.14 samples/sec Loss 5.9342 LearningRate 0.0221 Epoch: 21 Global Step: 120430 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:31:28,871-Speed 11439.92 samples/sec Loss 5.9330 LearningRate 0.0221 Epoch: 21 Global Step: 120440 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-04-11 03:31:29,766-Speed 11446.15 samples/sec Loss 5.9061 LearningRate 0.0221 Epoch: 21 Global Step: 120450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:30,690-Speed 11095.29 samples/sec Loss 5.8581 LearningRate 0.0221 Epoch: 21 Global Step: 120460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:31,645-Speed 10737.51 samples/sec Loss 5.9111 LearningRate 0.0221 Epoch: 21 Global Step: 120470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:32,558-Speed 11227.11 samples/sec Loss 5.8540 LearningRate 0.0221 Epoch: 21 Global Step: 120480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:33,433-Speed 11703.62 samples/sec Loss 5.8658 LearningRate 0.0221 Epoch: 21 Global Step: 120490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:34,328-Speed 11455.36 samples/sec Loss 5.8436 LearningRate 0.0221 Epoch: 21 Global Step: 120500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:35,223-Speed 11446.34 samples/sec Loss 5.8739 LearningRate 0.0221 Epoch: 21 Global Step: 120510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:36,145-Speed 11107.45 samples/sec Loss 5.9733 LearningRate 0.0221 Epoch: 21 Global Step: 120520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:37,045-Speed 11394.77 samples/sec Loss 5.9415 LearningRate 0.0221 Epoch: 21 Global Step: 120530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:37,936-Speed 11496.51 samples/sec Loss 6.0533 LearningRate 0.0221 Epoch: 21 Global Step: 120540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:31:38,853-Speed 11181.03 samples/sec Loss 5.7635 LearningRate 0.0221 Epoch: 21 Global Step: 120550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:31:39,813-Speed 10670.93 samples/sec Loss 5.9899 LearningRate 0.0221 Epoch: 21 Global Step: 120560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 
2022-04-11 03:31:40,750-Speed 10929.65 samples/sec Loss 5.8532 LearningRate 0.0221 Epoch: 21 Global Step: 120570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:41,660-Speed 11259.75 samples/sec Loss 5.8684 LearningRate 0.0221 Epoch: 21 Global Step: 120580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:42,571-Speed 11257.49 samples/sec Loss 5.9173 LearningRate 0.0221 Epoch: 21 Global Step: 120590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:43,519-Speed 10808.43 samples/sec Loss 5.8497 LearningRate 0.0221 Epoch: 21 Global Step: 120600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:44,438-Speed 11148.93 samples/sec Loss 5.8090 LearningRate 0.0221 Epoch: 21 Global Step: 120610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:45,324-Speed 11557.97 samples/sec Loss 5.9865 LearningRate 0.0221 Epoch: 21 Global Step: 120620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:46,215-Speed 11506.74 samples/sec Loss 5.8083 LearningRate 0.0221 Epoch: 21 Global Step: 120630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:47,140-Speed 11078.85 samples/sec Loss 6.1017 LearningRate 0.0221 Epoch: 21 Global Step: 120640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:48,069-Speed 11029.73 samples/sec Loss 5.8643 LearningRate 0.0220 Epoch: 21 Global Step: 120650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:48,961-Speed 11484.49 samples/sec Loss 5.9407 LearningRate 0.0220 Epoch: 21 Global Step: 120660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:49,886-Speed 11078.19 samples/sec Loss 5.9044 LearningRate 0.0220 Epoch: 21 Global Step: 120670 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:31:50,773-Speed 11549.48 samples/sec Loss 5.8497 LearningRate 0.0220 Epoch: 21 Global Step: 120680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:51,663-Speed 
11517.44 samples/sec Loss 5.8720 LearningRate 0.0220 Epoch: 21 Global Step: 120690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:52,543-Speed 11654.41 samples/sec Loss 5.8617 LearningRate 0.0220 Epoch: 21 Global Step: 120700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:53,437-Speed 11458.74 samples/sec Loss 6.0033 LearningRate 0.0220 Epoch: 21 Global Step: 120710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:54,379-Speed 10877.23 samples/sec Loss 5.9067 LearningRate 0.0220 Epoch: 21 Global Step: 120720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:55,322-Speed 10872.81 samples/sec Loss 5.9671 LearningRate 0.0220 Epoch: 21 Global Step: 120730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:56,246-Speed 11090.99 samples/sec Loss 5.9733 LearningRate 0.0220 Epoch: 21 Global Step: 120740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:57,137-Speed 11491.96 samples/sec Loss 5.9728 LearningRate 0.0220 Epoch: 21 Global Step: 120750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:58,025-Speed 11541.26 samples/sec Loss 5.8506 LearningRate 0.0220 Epoch: 21 Global Step: 120760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:58,917-Speed 11496.47 samples/sec Loss 5.8127 LearningRate 0.0220 Epoch: 21 Global Step: 120770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:31:59,825-Speed 11276.51 samples/sec Loss 6.0680 LearningRate 0.0220 Epoch: 21 Global Step: 120780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:32:00,740-Speed 11201.58 samples/sec Loss 6.0536 LearningRate 0.0220 Epoch: 21 Global Step: 120790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:32:01,680-Speed 10907.36 samples/sec Loss 6.0723 LearningRate 0.0220 Epoch: 21 Global Step: 120800 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:32:02,619-Speed 10909.42 samples/sec Loss 
6.0483 LearningRate 0.0220 Epoch: 21 Global Step: 120810 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:32:03,559-Speed 10901.85 samples/sec Loss 5.9725 LearningRate 0.0220 Epoch: 21 Global Step: 120820 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:32:04,476-Speed 11176.75 samples/sec Loss 5.8098 LearningRate 0.0220 Epoch: 21 Global Step: 120830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:32:05,398-Speed 11109.76 samples/sec Loss 5.7984 LearningRate 0.0220 Epoch: 21 Global Step: 120840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:32:06,283-Speed 11583.54 samples/sec Loss 5.8819 LearningRate 0.0220 Epoch: 21 Global Step: 120850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:32:07,174-Speed 11501.50 samples/sec Loss 5.9071 LearningRate 0.0220 Epoch: 21 Global Step: 120860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:32:08,061-Speed 11547.85 samples/sec Loss 5.8758 LearningRate 0.0220 Epoch: 21 Global Step: 120870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:32:08,967-Speed 11311.40 samples/sec Loss 5.9809 LearningRate 0.0220 Epoch: 21 Global Step: 120880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:32:09,852-Speed 11579.28 samples/sec Loss 5.8430 LearningRate 0.0219 Epoch: 21 Global Step: 120890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:32:10,754-Speed 11359.95 samples/sec Loss 6.1181 LearningRate 0.0219 Epoch: 21 Global Step: 120900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:32:11,651-Speed 11426.94 samples/sec Loss 6.0294 LearningRate 0.0219 Epoch: 21 Global Step: 120910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:32:12,532-Speed 11636.34 samples/sec Loss 6.0311 LearningRate 0.0219 Epoch: 21 Global Step: 120920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:32:13,440-Speed 11279.99 samples/sec Loss 5.8580 LearningRate 0.0219 
Epoch: 21 Global Step: 120930 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:32:14,368-Speed 11043.60 samples/sec Loss 5.8865 LearningRate 0.0219 Epoch: 21 Global Step: 120940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:32:15,255-Speed 11558.52 samples/sec Loss 5.7795 LearningRate 0.0219 Epoch: 21 Global Step: 120950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:32:16,147-Speed 11490.91 samples/sec Loss 5.7732 LearningRate 0.0219 Epoch: 21 Global Step: 120960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:32:17,059-Speed 11230.65 samples/sec Loss 5.8875 LearningRate 0.0219 Epoch: 21 Global Step: 120970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:32:17,953-Speed 11469.13 samples/sec Loss 6.0106 LearningRate 0.0219 Epoch: 21 Global Step: 120980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:32:18,926-Speed 10529.59 samples/sec Loss 5.8959 LearningRate 0.0219 Epoch: 21 Global Step: 120990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:32:19,870-Speed 10854.49 samples/sec Loss 6.0152 LearningRate 0.0219 Epoch: 21 Global Step: 121000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:32:20,752-Speed 11610.47 samples/sec Loss 5.9464 LearningRate 0.0219 Epoch: 21 Global Step: 121010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:32:21,630-Speed 11671.92 samples/sec Loss 5.8987 LearningRate 0.0219 Epoch: 21 Global Step: 121020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:32:22,509-Speed 11661.95 samples/sec Loss 5.9375 LearningRate 0.0219 Epoch: 21 Global Step: 121030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:32:23,418-Speed 11280.07 samples/sec Loss 6.0293 LearningRate 0.0219 Epoch: 21 Global Step: 121040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:32:24,320-Speed 11360.34 samples/sec Loss 5.9937 LearningRate 0.0219 Epoch: 21 Global Step: 
121050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:32:25,212-Speed 11487.42 samples/sec Loss 5.8853 LearningRate 0.0219 Epoch: 21 Global Step: 121060 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:32:26,135-Speed 11098.71 samples/sec Loss 5.9659 LearningRate 0.0219 Epoch: 21 Global Step: 121070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:32:27,022-Speed 11553.00 samples/sec Loss 5.9180 LearningRate 0.0219 Epoch: 21 Global Step: 121080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:32:27,949-Speed 11054.37 samples/sec Loss 6.1437 LearningRate 0.0219 Epoch: 21 Global Step: 121090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:32:28,875-Speed 11067.40 samples/sec Loss 6.0831 LearningRate 0.0219 Epoch: 21 Global Step: 121100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:32:29,829-Speed 10734.46 samples/sec Loss 5.9254 LearningRate 0.0219 Epoch: 21 Global Step: 121110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:32:30,746-Speed 11179.64 samples/sec Loss 6.0270 LearningRate 0.0219 Epoch: 21 Global Step: 121120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:32:31,662-Speed 11185.78 samples/sec Loss 6.0142 LearningRate 0.0218 Epoch: 21 Global Step: 121130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:32:32,555-Speed 11483.37 samples/sec Loss 5.9517 LearningRate 0.0218 Epoch: 21 Global Step: 121140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:32:33,447-Speed 11477.09 samples/sec Loss 5.9134 LearningRate 0.0218 Epoch: 21 Global Step: 121150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:32:34,326-Speed 11667.43 samples/sec Loss 6.0140 LearningRate 0.0218 Epoch: 21 Global Step: 121160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:32:35,249-Speed 11100.50 samples/sec Loss 6.0379 LearningRate 0.0218 Epoch: 21 Global Step: 121170 Fp16 Grad Scale: 131072 
Required: 4 hours Training: 2022-04-11 03:32:36,141-Speed 11492.61 samples/sec Loss 5.9905 LearningRate 0.0218 Epoch: 21 Global Step: 121180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:32:37,022-Speed 11630.15 samples/sec Loss 5.9958 LearningRate 0.0218 Epoch: 21 Global Step: 121190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:32:37,935-Speed 11226.70 samples/sec Loss 5.9713 LearningRate 0.0218 Epoch: 21 Global Step: 121200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:32:38,856-Speed 11125.16 samples/sec Loss 6.1135 LearningRate 0.0218 Epoch: 21 Global Step: 121210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:32:39,773-Speed 11170.99 samples/sec Loss 5.8380 LearningRate 0.0218 Epoch: 21 Global Step: 121220 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:32:40,672-Speed 11407.28 samples/sec Loss 5.9467 LearningRate 0.0218 Epoch: 21 Global Step: 121230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:32:41,657-Speed 10399.87 samples/sec Loss 5.9960 LearningRate 0.0218 Epoch: 21 Global Step: 121240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:32:42,560-Speed 11347.25 samples/sec Loss 6.1006 LearningRate 0.0218 Epoch: 21 Global Step: 121250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:32:43,435-Speed 11717.94 samples/sec Loss 6.0640 LearningRate 0.0218 Epoch: 21 Global Step: 121260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:32:44,314-Speed 11650.59 samples/sec Loss 5.9443 LearningRate 0.0218 Epoch: 21 Global Step: 121270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:32:45,189-Speed 11714.09 samples/sec Loss 6.0999 LearningRate 0.0218 Epoch: 21 Global Step: 121280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:32:46,090-Speed 11371.55 samples/sec Loss 6.0113 LearningRate 0.0218 Epoch: 21 Global Step: 121290 Fp16 Grad Scale: 65536 Required: 4 hours 
Training: 2022-04-11 03:32:46,979-Speed 11522.22 samples/sec Loss 6.0229 LearningRate 0.0218 Epoch: 21 Global Step: 121300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:32:47,874-Speed 11460.62 samples/sec Loss 6.0059 LearningRate 0.0218 Epoch: 21 Global Step: 121310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:32:48,815-Speed 10886.19 samples/sec Loss 6.0840 LearningRate 0.0218 Epoch: 21 Global Step: 121320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:32:49,717-Speed 11363.54 samples/sec Loss 6.0704 LearningRate 0.0218 Epoch: 21 Global Step: 121330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:32:50,659-Speed 10869.11 samples/sec Loss 6.0990 LearningRate 0.0218 Epoch: 21 Global Step: 121340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:32:51,578-Speed 11156.62 samples/sec Loss 5.9585 LearningRate 0.0218 Epoch: 21 Global Step: 121350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:32:52,540-Speed 10648.00 samples/sec Loss 5.9107 LearningRate 0.0218 Epoch: 21 Global Step: 121360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:32:53,465-Speed 11076.29 samples/sec Loss 6.0406 LearningRate 0.0217 Epoch: 21 Global Step: 121370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:32:54,372-Speed 11304.76 samples/sec Loss 6.1415 LearningRate 0.0217 Epoch: 21 Global Step: 121380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:32:55,259-Speed 11546.34 samples/sec Loss 5.9751 LearningRate 0.0217 Epoch: 21 Global Step: 121390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:32:56,168-Speed 11281.45 samples/sec Loss 5.9056 LearningRate 0.0217 Epoch: 21 Global Step: 121400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:32:57,068-Speed 11392.50 samples/sec Loss 6.2406 LearningRate 0.0217 Epoch: 21 Global Step: 121410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
03:32:57,958-Speed 11510.29 samples/sec Loss 6.0111 LearningRate 0.0217 Epoch: 21 Global Step: 121420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:32:58,875-Speed 11180.74 samples/sec Loss 5.8955 LearningRate 0.0217 Epoch: 21 Global Step: 121430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:32:59,769-Speed 11460.02 samples/sec Loss 5.9906 LearningRate 0.0217 Epoch: 21 Global Step: 121440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:33:00,739-Speed 10559.99 samples/sec Loss 5.9770 LearningRate 0.0217 Epoch: 21 Global Step: 121450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:33:01,678-Speed 10912.94 samples/sec Loss 5.9577 LearningRate 0.0217 Epoch: 21 Global Step: 121460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:33:02,616-Speed 10926.00 samples/sec Loss 5.9624 LearningRate 0.0217 Epoch: 21 Global Step: 121470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:33:03,505-Speed 11520.81 samples/sec Loss 6.0487 LearningRate 0.0217 Epoch: 21 Global Step: 121480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:33:04,392-Speed 11558.11 samples/sec Loss 5.8850 LearningRate 0.0217 Epoch: 21 Global Step: 121490 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:33:05,323-Speed 11008.38 samples/sec Loss 5.9888 LearningRate 0.0217 Epoch: 21 Global Step: 121500 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:33:06,217-Speed 11464.87 samples/sec Loss 5.9713 LearningRate 0.0217 Epoch: 21 Global Step: 121510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:33:07,107-Speed 11520.43 samples/sec Loss 6.0459 LearningRate 0.0217 Epoch: 21 Global Step: 121520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:33:08,001-Speed 11467.73 samples/sec Loss 6.0640 LearningRate 0.0217 Epoch: 21 Global Step: 121530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:33:08,903-Speed 11361.88 
samples/sec Loss 6.0611 LearningRate 0.0217 Epoch: 21 Global Step: 121540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:33:09,795-Speed 11481.65 samples/sec Loss 6.1058 LearningRate 0.0217 Epoch: 21 Global Step: 121550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:33:10,697-Speed 11367.75 samples/sec Loss 5.9692 LearningRate 0.0217 Epoch: 21 Global Step: 121560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:33:11,611-Speed 11212.65 samples/sec Loss 5.9347 LearningRate 0.0217 Epoch: 21 Global Step: 121570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:33:12,528-Speed 11174.29 samples/sec Loss 6.0872 LearningRate 0.0217 Epoch: 21 Global Step: 121580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:33:13,433-Speed 11322.98 samples/sec Loss 6.1373 LearningRate 0.0217 Epoch: 21 Global Step: 121590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:33:14,335-Speed 11360.69 samples/sec Loss 5.9748 LearningRate 0.0217 Epoch: 21 Global Step: 121600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:33:15,278-Speed 10869.44 samples/sec Loss 5.8722 LearningRate 0.0217 Epoch: 21 Global Step: 121610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:33:16,177-Speed 11396.62 samples/sec Loss 5.9742 LearningRate 0.0216 Epoch: 21 Global Step: 121620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:33:17,089-Speed 11238.53 samples/sec Loss 6.2132 LearningRate 0.0216 Epoch: 21 Global Step: 121630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:33:17,974-Speed 11584.68 samples/sec Loss 6.0610 LearningRate 0.0216 Epoch: 21 Global Step: 121640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:33:18,866-Speed 11489.15 samples/sec Loss 6.1081 LearningRate 0.0216 Epoch: 21 Global Step: 121650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:33:19,771-Speed 11321.50 samples/sec Loss 6.0782 
LearningRate 0.0216 Epoch: 21 Global Step: 121660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:33:20,668-Speed 11421.28 samples/sec Loss 6.0941 LearningRate 0.0216 Epoch: 21 Global Step: 121670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:33:21,558-Speed 11513.86 samples/sec Loss 6.0034 LearningRate 0.0216 Epoch: 21 Global Step: 121680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:33:22,459-Speed 11372.17 samples/sec Loss 6.0153 LearningRate 0.0216 Epoch: 21 Global Step: 121690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:33:23,357-Speed 11406.14 samples/sec Loss 6.0385 LearningRate 0.0216 Epoch: 21 Global Step: 121700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:33:24,261-Speed 11341.22 samples/sec Loss 6.0011 LearningRate 0.0216 Epoch: 21 Global Step: 121710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:33:25,161-Speed 11385.72 samples/sec Loss 5.9784 LearningRate 0.0216 Epoch: 21 Global Step: 121720 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:33:26,102-Speed 10882.82 samples/sec Loss 5.9902 LearningRate 0.0216 Epoch: 21 Global Step: 121730 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:33:27,057-Speed 10733.18 samples/sec Loss 5.9552 LearningRate 0.0216 Epoch: 21 Global Step: 121740 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:33:28,005-Speed 10816.01 samples/sec Loss 6.2031 LearningRate 0.0216 Epoch: 21 Global Step: 121750 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:33:28,955-Speed 10775.56 samples/sec Loss 5.8654 LearningRate 0.0216 Epoch: 21 Global Step: 121760 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:33:29,886-Speed 11006.67 samples/sec Loss 6.1004 LearningRate 0.0216 Epoch: 21 Global Step: 121770 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:33:30,841-Speed 10732.24 samples/sec Loss 6.0542 LearningRate 0.0216 Epoch: 21 
Global Step: 121780 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:33:31,739-Speed 11409.69 samples/sec Loss 6.0232 LearningRate 0.0216 Epoch: 21 Global Step: 121790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:33:32,627-Speed 11547.95 samples/sec Loss 6.1583 LearningRate 0.0216 Epoch: 21 Global Step: 121800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:33:33,528-Speed 11392.43 samples/sec Loss 5.9611 LearningRate 0.0216 Epoch: 21 Global Step: 121810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:33:34,422-Speed 11467.70 samples/sec Loss 6.1437 LearningRate 0.0216 Epoch: 21 Global Step: 121820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:33:35,320-Speed 11399.59 samples/sec Loss 6.0856 LearningRate 0.0216 Epoch: 21 Global Step: 121830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:33:36,240-Speed 11148.37 samples/sec Loss 5.9868 LearningRate 0.0216 Epoch: 21 Global Step: 121840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:33:37,191-Speed 10783.99 samples/sec Loss 5.9751 LearningRate 0.0216 Epoch: 21 Global Step: 121850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:33:38,145-Speed 10735.19 samples/sec Loss 6.0867 LearningRate 0.0215 Epoch: 21 Global Step: 121860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:33:39,067-Speed 11120.35 samples/sec Loss 6.0928 LearningRate 0.0215 Epoch: 21 Global Step: 121870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:33:39,973-Speed 11308.92 samples/sec Loss 6.0196 LearningRate 0.0215 Epoch: 21 Global Step: 121880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:33:40,906-Speed 10981.03 samples/sec Loss 5.9702 LearningRate 0.0215 Epoch: 21 Global Step: 121890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:33:41,862-Speed 10723.01 samples/sec Loss 6.0487 LearningRate 0.0215 Epoch: 21 Global Step: 121900 Fp16 Grad 
Scale: 65536 Required: 4 hours Training: 2022-04-11 03:33:42,774-Speed 11237.42 samples/sec Loss 6.0959 LearningRate 0.0215 Epoch: 21 Global Step: 121910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:33:43,672-Speed 11407.44 samples/sec Loss 6.1086 LearningRate 0.0215 Epoch: 21 Global Step: 121920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:33:44,552-Speed 11639.00 samples/sec Loss 6.0979 LearningRate 0.0215 Epoch: 21 Global Step: 121930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:33:45,455-Speed 11351.41 samples/sec Loss 5.9344 LearningRate 0.0215 Epoch: 21 Global Step: 121940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:33:46,365-Speed 11258.60 samples/sec Loss 5.8291 LearningRate 0.0215 Epoch: 21 Global Step: 121950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:33:47,264-Speed 11405.38 samples/sec Loss 5.9657 LearningRate 0.0215 Epoch: 21 Global Step: 121960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:33:48,148-Speed 11585.93 samples/sec Loss 5.9779 LearningRate 0.0215 Epoch: 21 Global Step: 121970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:33:49,038-Speed 11511.91 samples/sec Loss 6.0513 LearningRate 0.0215 Epoch: 21 Global Step: 121980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:33:49,930-Speed 11493.84 samples/sec Loss 5.9161 LearningRate 0.0215 Epoch: 21 Global Step: 121990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:33:50,825-Speed 11446.56 samples/sec Loss 6.0592 LearningRate 0.0215 Epoch: 21 Global Step: 122000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:34:12,153-[lfw][122000]XNorm: 9.914756 Training: 2022-04-11 03:34:12,153-[lfw][122000]Accuracy-Flip: 0.99550+-0.00373 Training: 2022-04-11 03:34:12,153-[lfw][122000]Accuracy-Highest: 0.99617 Training: 2022-04-11 03:34:36,830-[cfp_fp][122000]XNorm: 8.375529 Training: 2022-04-11 
03:34:36,830-[cfp_fp][122000]Accuracy-Flip: 0.94814+-0.01407 Training: 2022-04-11 03:34:36,831-[cfp_fp][122000]Accuracy-Highest: 0.94814 Training: 2022-04-11 03:34:58,054-[agedb_30][122000]XNorm: 9.732270 Training: 2022-04-11 03:34:58,054-[agedb_30][122000]Accuracy-Flip: 0.96450+-0.00785 Training: 2022-04-11 03:34:58,055-[agedb_30][122000]Accuracy-Highest: 0.96783 Training: 2022-04-11 03:34:58,999-Speed 150.21 samples/sec Loss 6.0533 LearningRate 0.0215 Epoch: 21 Global Step: 122010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:34:59,941-Speed 10878.32 samples/sec Loss 6.1217 LearningRate 0.0215 Epoch: 21 Global Step: 122020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:35:00,874-Speed 10977.90 samples/sec Loss 6.1617 LearningRate 0.0215 Epoch: 21 Global Step: 122030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:35:01,765-Speed 11500.04 samples/sec Loss 5.9688 LearningRate 0.0215 Epoch: 21 Global Step: 122040 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:35:02,681-Speed 11190.61 samples/sec Loss 6.0202 LearningRate 0.0215 Epoch: 21 Global Step: 122050 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:35:03,599-Speed 11164.09 samples/sec Loss 6.0325 LearningRate 0.0215 Epoch: 21 Global Step: 122060 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:35:04,495-Speed 11427.87 samples/sec Loss 5.9984 LearningRate 0.0215 Epoch: 21 Global Step: 122070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:35:05,444-Speed 10800.97 samples/sec Loss 5.9522 LearningRate 0.0215 Epoch: 21 Global Step: 122080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:35:06,349-Speed 11323.60 samples/sec Loss 6.0348 LearningRate 0.0215 Epoch: 21 Global Step: 122090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:35:07,245-Speed 11431.96 samples/sec Loss 6.0974 LearningRate 0.0215 Epoch: 21 Global Step: 122100 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-04-11 03:35:08,159-Speed 11214.54 samples/sec Loss 6.0896 LearningRate 0.0214 Epoch: 21 Global Step: 122110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:35:09,031-Speed 11750.95 samples/sec Loss 5.9846 LearningRate 0.0214 Epoch: 21 Global Step: 122120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:35:09,923-Speed 11490.87 samples/sec Loss 6.1072 LearningRate 0.0214 Epoch: 21 Global Step: 122130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:35:10,809-Speed 11565.61 samples/sec Loss 6.0674 LearningRate 0.0214 Epoch: 21 Global Step: 122140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:35:11,682-Speed 11729.70 samples/sec Loss 6.0229 LearningRate 0.0214 Epoch: 21 Global Step: 122150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:35:12,579-Speed 11434.89 samples/sec Loss 5.9255 LearningRate 0.0214 Epoch: 21 Global Step: 122160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:35:13,486-Speed 11289.04 samples/sec Loss 6.1888 LearningRate 0.0214 Epoch: 21 Global Step: 122170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:35:14,447-Speed 10669.46 samples/sec Loss 5.9714 LearningRate 0.0214 Epoch: 21 Global Step: 122180 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:35:15,402-Speed 10732.93 samples/sec Loss 6.1061 LearningRate 0.0214 Epoch: 21 Global Step: 122190 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:35:16,286-Speed 11584.58 samples/sec Loss 5.8995 LearningRate 0.0214 Epoch: 21 Global Step: 122200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:35:17,239-Speed 10762.18 samples/sec Loss 6.0382 LearningRate 0.0214 Epoch: 21 Global Step: 122210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:35:18,158-Speed 11145.61 samples/sec Loss 6.0078 LearningRate 0.0214 Epoch: 21 Global Step: 122220 Fp16 Grad Scale: 131072 Required: 4 hours 
Training: 2022-04-11 03:35:19,042-Speed 11594.17 samples/sec Loss 6.0376 LearningRate 0.0214 Epoch: 21 Global Step: 122230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:35:19,975-Speed 10980.26 samples/sec Loss 6.0488 LearningRate 0.0214 Epoch: 21 Global Step: 122240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:35:20,877-Speed 11361.37 samples/sec Loss 5.9702 LearningRate 0.0214 Epoch: 21 Global Step: 122250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:35:21,758-Speed 11635.33 samples/sec Loss 5.8962 LearningRate 0.0214 Epoch: 21 Global Step: 122260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:35:22,626-Speed 11806.38 samples/sec Loss 6.0662 LearningRate 0.0214 Epoch: 21 Global Step: 122270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:35:23,518-Speed 11482.80 samples/sec Loss 6.1032 LearningRate 0.0214 Epoch: 21 Global Step: 122280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:35:24,457-Speed 10920.22 samples/sec Loss 5.9960 LearningRate 0.0214 Epoch: 21 Global Step: 122290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:35:25,347-Speed 11514.34 samples/sec Loss 5.9449 LearningRate 0.0214 Epoch: 21 Global Step: 122300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:35:26,275-Speed 11038.00 samples/sec Loss 5.9947 LearningRate 0.0214 Epoch: 21 Global Step: 122310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:35:27,171-Speed 11440.22 samples/sec Loss 6.0670 LearningRate 0.0214 Epoch: 21 Global Step: 122320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:35:28,103-Speed 10995.58 samples/sec Loss 5.9792 LearningRate 0.0214 Epoch: 21 Global Step: 122330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:35:29,021-Speed 11156.82 samples/sec Loss 5.9625 LearningRate 0.0214 Epoch: 21 Global Step: 122340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
03:35:29,964-Speed 10870.77 samples/sec Loss 6.0781 LearningRate 0.0213 Epoch: 21 Global Step: 122350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:35:30,906-Speed 10871.32 samples/sec Loss 6.1053 LearningRate 0.0213 Epoch: 21 Global Step: 122360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:35:31,849-Speed 10871.21 samples/sec Loss 6.1385 LearningRate 0.0213 Epoch: 21 Global Step: 122370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:35:32,779-Speed 11015.04 samples/sec Loss 6.0365 LearningRate 0.0213 Epoch: 21 Global Step: 122380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:35:33,669-Speed 11509.51 samples/sec Loss 6.1403 LearningRate 0.0213 Epoch: 21 Global Step: 122390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:35:34,532-Speed 11883.31 samples/sec Loss 6.0473 LearningRate 0.0213 Epoch: 21 Global Step: 122400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:35:35,442-Speed 11260.38 samples/sec Loss 6.0603 LearningRate 0.0213 Epoch: 21 Global Step: 122410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:35:36,317-Speed 11712.01 samples/sec Loss 6.2009 LearningRate 0.0213 Epoch: 21 Global Step: 122420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:35:37,193-Speed 11698.74 samples/sec Loss 6.0282 LearningRate 0.0213 Epoch: 21 Global Step: 122430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:35:38,083-Speed 11515.61 samples/sec Loss 5.9106 LearningRate 0.0213 Epoch: 21 Global Step: 122440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:35:39,019-Speed 10950.47 samples/sec Loss 6.0227 LearningRate 0.0213 Epoch: 21 Global Step: 122450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:35:39,931-Speed 11235.52 samples/sec Loss 6.0659 LearningRate 0.0213 Epoch: 21 Global Step: 122460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:35:40,815-Speed 11585.73 
samples/sec Loss 6.0033 LearningRate 0.0213 Epoch: 21 Global Step: 122470 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:35:41,750-Speed 10964.95 samples/sec Loss 6.1116 LearningRate 0.0213 Epoch: 21 Global Step: 122480 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:35:42,654-Speed 11337.76 samples/sec Loss 6.0704 LearningRate 0.0213 Epoch: 21 Global Step: 122490 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:35:43,587-Speed 10984.77 samples/sec Loss 5.9580 LearningRate 0.0213 Epoch: 21 Global Step: 122500 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:35:44,480-Speed 11473.44 samples/sec Loss 6.1108 LearningRate 0.0213 Epoch: 21 Global Step: 122510 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:35:45,363-Speed 11606.15 samples/sec Loss 6.2017 LearningRate 0.0213 Epoch: 21 Global Step: 122520 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:35:46,259-Speed 11428.30 samples/sec Loss 6.0333 LearningRate 0.0213 Epoch: 21 Global Step: 122530 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:35:47,151-Speed 11489.39 samples/sec Loss 6.0080 LearningRate 0.0213 Epoch: 21 Global Step: 122540 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:35:48,051-Speed 11387.43 samples/sec Loss 6.1651 LearningRate 0.0213 Epoch: 21 Global Step: 122550 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:35:48,953-Speed 11365.82 samples/sec Loss 6.0737 LearningRate 0.0213 Epoch: 21 Global Step: 122560 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:35:49,861-Speed 11273.85 samples/sec Loss 6.0918 LearningRate 0.0213 Epoch: 21 Global Step: 122570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:35:50,748-Speed 11562.20 samples/sec Loss 6.0026 LearningRate 0.0213 Epoch: 21 Global Step: 122580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:35:51,685-Speed 10931.53 samples/sec Loss 6.0732 
LearningRate 0.0213 Epoch: 21 Global Step: 122590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:35:52,604-Speed 11149.33 samples/sec Loss 6.0866 LearningRate 0.0212 Epoch: 21 Global Step: 122600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:35:53,504-Speed 11386.51 samples/sec Loss 5.9578 LearningRate 0.0212 Epoch: 21 Global Step: 122610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:35:54,394-Speed 11519.17 samples/sec Loss 6.1060 LearningRate 0.0212 Epoch: 21 Global Step: 122620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:35:55,277-Speed 11600.50 samples/sec Loss 6.1382 LearningRate 0.0212 Epoch: 21 Global Step: 122630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:35:56,164-Speed 11554.39 samples/sec Loss 6.0219 LearningRate 0.0212 Epoch: 21 Global Step: 122640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:35:57,042-Speed 11678.82 samples/sec Loss 6.0531 LearningRate 0.0212 Epoch: 21 Global Step: 122650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:35:57,918-Speed 11688.72 samples/sec Loss 6.0489 LearningRate 0.0212 Epoch: 21 Global Step: 122660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:35:58,790-Speed 11751.32 samples/sec Loss 5.9450 LearningRate 0.0212 Epoch: 21 Global Step: 122670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:35:59,665-Speed 11716.16 samples/sec Loss 6.0885 LearningRate 0.0212 Epoch: 21 Global Step: 122680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:00,550-Speed 11577.83 samples/sec Loss 5.9983 LearningRate 0.0212 Epoch: 21 Global Step: 122690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:01,453-Speed 11344.54 samples/sec Loss 6.0514 LearningRate 0.0212 Epoch: 21 Global Step: 122700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:02,376-Speed 11104.14 samples/sec Loss 6.1311 LearningRate 0.0212 Epoch: 21 
Global Step: 122710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:03,292-Speed 11185.75 samples/sec Loss 6.0815 LearningRate 0.0212 Epoch: 21 Global Step: 122720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:04,221-Speed 11036.29 samples/sec Loss 6.1692 LearningRate 0.0212 Epoch: 21 Global Step: 122730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:05,146-Speed 11070.37 samples/sec Loss 6.0700 LearningRate 0.0212 Epoch: 21 Global Step: 122740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:06,038-Speed 11484.93 samples/sec Loss 5.9700 LearningRate 0.0212 Epoch: 21 Global Step: 122750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:06,973-Speed 10967.84 samples/sec Loss 5.9150 LearningRate 0.0212 Epoch: 21 Global Step: 122760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:07,883-Speed 11264.75 samples/sec Loss 6.1270 LearningRate 0.0212 Epoch: 21 Global Step: 122770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:36:08,789-Speed 11305.68 samples/sec Loss 5.9799 LearningRate 0.0212 Epoch: 21 Global Step: 122780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:36:09,680-Speed 11496.30 samples/sec Loss 6.0776 LearningRate 0.0212 Epoch: 21 Global Step: 122790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:36:10,565-Speed 11586.27 samples/sec Loss 6.0970 LearningRate 0.0212 Epoch: 21 Global Step: 122800 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:36:11,443-Speed 11663.99 samples/sec Loss 6.1408 LearningRate 0.0212 Epoch: 21 Global Step: 122810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:12,343-Speed 11386.50 samples/sec Loss 5.9583 LearningRate 0.0212 Epoch: 21 Global Step: 122820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:13,231-Speed 11536.08 samples/sec Loss 6.0633 LearningRate 0.0212 Epoch: 21 Global Step: 122830 Fp16 
Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:14,161-Speed 11022.97 samples/sec Loss 6.1101 LearningRate 0.0212 Epoch: 21 Global Step: 122840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:15,088-Speed 11056.13 samples/sec Loss 5.9443 LearningRate 0.0211 Epoch: 21 Global Step: 122850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:15,980-Speed 11493.46 samples/sec Loss 6.1204 LearningRate 0.0211 Epoch: 21 Global Step: 122860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:16,867-Speed 11547.50 samples/sec Loss 6.1575 LearningRate 0.0211 Epoch: 21 Global Step: 122870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:17,768-Speed 11383.27 samples/sec Loss 5.9769 LearningRate 0.0211 Epoch: 21 Global Step: 122880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:18,720-Speed 10765.04 samples/sec Loss 6.0675 LearningRate 0.0211 Epoch: 21 Global Step: 122890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:19,627-Speed 11296.12 samples/sec Loss 5.9262 LearningRate 0.0211 Epoch: 21 Global Step: 122900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:20,549-Speed 11109.17 samples/sec Loss 6.3355 LearningRate 0.0211 Epoch: 21 Global Step: 122910 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:36:21,418-Speed 11784.97 samples/sec Loss 6.0291 LearningRate 0.0211 Epoch: 21 Global Step: 122920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:22,299-Speed 11640.09 samples/sec Loss 5.9972 LearningRate 0.0211 Epoch: 21 Global Step: 122930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:23,201-Speed 11357.64 samples/sec Loss 5.9988 LearningRate 0.0211 Epoch: 21 Global Step: 122940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:24,086-Speed 11583.80 samples/sec Loss 6.0503 LearningRate 0.0211 Epoch: 21 Global Step: 122950 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-04-11 03:36:25,005-Speed 11146.91 samples/sec Loss 6.0234 LearningRate 0.0211 Epoch: 21 Global Step: 122960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:25,930-Speed 11077.11 samples/sec Loss 6.0490 LearningRate 0.0211 Epoch: 21 Global Step: 122970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:26,805-Speed 11707.34 samples/sec Loss 6.1964 LearningRate 0.0211 Epoch: 21 Global Step: 122980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:27,686-Speed 11634.09 samples/sec Loss 6.0721 LearningRate 0.0211 Epoch: 21 Global Step: 122990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:28,633-Speed 10828.90 samples/sec Loss 6.1565 LearningRate 0.0211 Epoch: 21 Global Step: 123000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:29,559-Speed 11063.78 samples/sec Loss 6.0835 LearningRate 0.0211 Epoch: 21 Global Step: 123010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:30,498-Speed 10911.52 samples/sec Loss 6.1511 LearningRate 0.0211 Epoch: 21 Global Step: 123020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:36:31,461-Speed 10631.05 samples/sec Loss 6.0657 LearningRate 0.0211 Epoch: 21 Global Step: 123030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:36:32,360-Speed 11405.68 samples/sec Loss 6.0242 LearningRate 0.0211 Epoch: 21 Global Step: 123040 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:36:33,249-Speed 11530.63 samples/sec Loss 5.9967 LearningRate 0.0211 Epoch: 21 Global Step: 123050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:34,151-Speed 11354.03 samples/sec Loss 6.0861 LearningRate 0.0211 Epoch: 21 Global Step: 123060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:35,072-Speed 11126.10 samples/sec Loss 6.1009 LearningRate 0.0211 Epoch: 21 Global Step: 123070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
03:36:35,985-Speed 11221.51 samples/sec Loss 6.0793 LearningRate 0.0211 Epoch: 21 Global Step: 123080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:36,875-Speed 11520.04 samples/sec Loss 6.0754 LearningRate 0.0210 Epoch: 21 Global Step: 123090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:37,778-Speed 11355.01 samples/sec Loss 6.1325 LearningRate 0.0210 Epoch: 21 Global Step: 123100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:38,708-Speed 11010.85 samples/sec Loss 6.0697 LearningRate 0.0210 Epoch: 21 Global Step: 123110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:39,620-Speed 11246.06 samples/sec Loss 6.1010 LearningRate 0.0210 Epoch: 21 Global Step: 123120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:40,538-Speed 11163.63 samples/sec Loss 6.0613 LearningRate 0.0210 Epoch: 21 Global Step: 123130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:41,441-Speed 11340.41 samples/sec Loss 5.8655 LearningRate 0.0210 Epoch: 21 Global Step: 123140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:42,339-Speed 11413.29 samples/sec Loss 6.1792 LearningRate 0.0210 Epoch: 21 Global Step: 123150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:43,254-Speed 11204.62 samples/sec Loss 6.0471 LearningRate 0.0210 Epoch: 21 Global Step: 123160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:44,152-Speed 11415.00 samples/sec Loss 5.9819 LearningRate 0.0210 Epoch: 21 Global Step: 123170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:45,030-Speed 11673.27 samples/sec Loss 6.0854 LearningRate 0.0210 Epoch: 21 Global Step: 123180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:45,915-Speed 11572.07 samples/sec Loss 6.1900 LearningRate 0.0210 Epoch: 21 Global Step: 123190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:46,799-Speed 11588.96 
samples/sec Loss 5.9711 LearningRate 0.0210 Epoch: 21 Global Step: 123200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:47,698-Speed 11404.83 samples/sec Loss 5.9710 LearningRate 0.0210 Epoch: 21 Global Step: 123210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:48,574-Speed 11691.68 samples/sec Loss 5.8933 LearningRate 0.0210 Epoch: 21 Global Step: 123220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:49,473-Speed 11405.34 samples/sec Loss 6.0502 LearningRate 0.0210 Epoch: 21 Global Step: 123230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:50,371-Speed 11407.76 samples/sec Loss 5.9991 LearningRate 0.0210 Epoch: 21 Global Step: 123240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:51,272-Speed 11374.94 samples/sec Loss 6.0274 LearningRate 0.0210 Epoch: 21 Global Step: 123250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:36:52,169-Speed 11418.37 samples/sec Loss 6.1157 LearningRate 0.0210 Epoch: 21 Global Step: 123260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:36:53,063-Speed 11461.69 samples/sec Loss 6.1027 LearningRate 0.0210 Epoch: 21 Global Step: 123270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:53,960-Speed 11426.23 samples/sec Loss 6.2222 LearningRate 0.0210 Epoch: 21 Global Step: 123280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:54,855-Speed 11457.65 samples/sec Loss 6.0870 LearningRate 0.0210 Epoch: 21 Global Step: 123290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:55,741-Speed 11560.26 samples/sec Loss 6.1823 LearningRate 0.0210 Epoch: 21 Global Step: 123300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:56,655-Speed 11216.85 samples/sec Loss 6.1099 LearningRate 0.0210 Epoch: 21 Global Step: 123310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:57,555-Speed 11382.90 samples/sec Loss 6.0755 
LearningRate 0.0210 Epoch: 21 Global Step: 123320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:58,444-Speed 11523.15 samples/sec Loss 6.1837 LearningRate 0.0210 Epoch: 21 Global Step: 123330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:36:59,334-Speed 11513.27 samples/sec Loss 6.0785 LearningRate 0.0209 Epoch: 21 Global Step: 123340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:00,223-Speed 11533.43 samples/sec Loss 6.1925 LearningRate 0.0209 Epoch: 21 Global Step: 123350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:01,158-Speed 10960.59 samples/sec Loss 5.9892 LearningRate 0.0209 Epoch: 21 Global Step: 123360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:02,089-Speed 11007.61 samples/sec Loss 6.1593 LearningRate 0.0209 Epoch: 21 Global Step: 123370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:37:03,040-Speed 10777.01 samples/sec Loss 6.0223 LearningRate 0.0209 Epoch: 21 Global Step: 123380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:37:03,976-Speed 10965.58 samples/sec Loss 6.2380 LearningRate 0.0209 Epoch: 21 Global Step: 123390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:04,876-Speed 11380.37 samples/sec Loss 6.1212 LearningRate 0.0209 Epoch: 21 Global Step: 123400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:05,772-Speed 11435.98 samples/sec Loss 6.0368 LearningRate 0.0209 Epoch: 21 Global Step: 123410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:06,676-Speed 11338.45 samples/sec Loss 6.1481 LearningRate 0.0209 Epoch: 21 Global Step: 123420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:07,576-Speed 11385.03 samples/sec Loss 6.2855 LearningRate 0.0209 Epoch: 21 Global Step: 123430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:08,462-Speed 11571.05 samples/sec Loss 5.9754 LearningRate 0.0209 Epoch: 21 
Global Step: 123440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:09,375-Speed 11216.46 samples/sec Loss 6.1282 LearningRate 0.0209 Epoch: 21 Global Step: 123450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:10,277-Speed 11362.12 samples/sec Loss 6.0795 LearningRate 0.0209 Epoch: 21 Global Step: 123460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:11,193-Speed 11185.64 samples/sec Loss 5.9878 LearningRate 0.0209 Epoch: 21 Global Step: 123470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:12,111-Speed 11165.90 samples/sec Loss 6.2094 LearningRate 0.0209 Epoch: 21 Global Step: 123480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:12,980-Speed 11799.97 samples/sec Loss 6.2214 LearningRate 0.0209 Epoch: 21 Global Step: 123490 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:37:13,906-Speed 11063.15 samples/sec Loss 6.1787 LearningRate 0.0209 Epoch: 21 Global Step: 123500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:14,863-Speed 10713.00 samples/sec Loss 6.1868 LearningRate 0.0209 Epoch: 21 Global Step: 123510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:15,803-Speed 10900.40 samples/sec Loss 6.0893 LearningRate 0.0209 Epoch: 21 Global Step: 123520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:16,732-Speed 11026.36 samples/sec Loss 6.1738 LearningRate 0.0209 Epoch: 21 Global Step: 123530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:17,660-Speed 11046.60 samples/sec Loss 6.1071 LearningRate 0.0209 Epoch: 21 Global Step: 123540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:18,609-Speed 10801.42 samples/sec Loss 6.1818 LearningRate 0.0209 Epoch: 21 Global Step: 123550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:19,527-Speed 11159.20 samples/sec Loss 6.1312 LearningRate 0.0209 Epoch: 21 Global Step: 123560 Fp16 Grad 
Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:20,437-Speed 11255.67 samples/sec Loss 6.0599 LearningRate 0.0209 Epoch: 21 Global Step: 123570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:21,316-Speed 11655.81 samples/sec Loss 6.1187 LearningRate 0.0209 Epoch: 21 Global Step: 123580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:22,196-Speed 11649.09 samples/sec Loss 6.1547 LearningRate 0.0208 Epoch: 21 Global Step: 123590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:23,074-Speed 11669.22 samples/sec Loss 6.0165 LearningRate 0.0208 Epoch: 21 Global Step: 123600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:24,011-Speed 10940.38 samples/sec Loss 5.8447 LearningRate 0.0208 Epoch: 21 Global Step: 123610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:24,892-Speed 11627.71 samples/sec Loss 6.0869 LearningRate 0.0208 Epoch: 21 Global Step: 123620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:25,814-Speed 11112.50 samples/sec Loss 6.0375 LearningRate 0.0208 Epoch: 21 Global Step: 123630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:26,748-Speed 10972.01 samples/sec Loss 5.9807 LearningRate 0.0208 Epoch: 21 Global Step: 123640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:27,684-Speed 10954.34 samples/sec Loss 6.1474 LearningRate 0.0208 Epoch: 21 Global Step: 123650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:28,637-Speed 10753.57 samples/sec Loss 6.0346 LearningRate 0.0208 Epoch: 21 Global Step: 123660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:29,564-Speed 11051.97 samples/sec Loss 6.1151 LearningRate 0.0208 Epoch: 21 Global Step: 123670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:30,520-Speed 10722.67 samples/sec Loss 6.0578 LearningRate 0.0208 Epoch: 21 Global Step: 123680 Fp16 Grad Scale: 65536 Required: 4 hours 
Training: 2022-04-11 03:37:31,452-Speed 10986.94 samples/sec Loss 5.9989 LearningRate 0.0208 Epoch: 21 Global Step: 123690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:32,363-Speed 11264.37 samples/sec Loss 6.0700 LearningRate 0.0208 Epoch: 21 Global Step: 123700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:37:33,267-Speed 11343.72 samples/sec Loss 6.0450 LearningRate 0.0208 Epoch: 21 Global Step: 123710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:37:34,156-Speed 11517.50 samples/sec Loss 5.9452 LearningRate 0.0208 Epoch: 21 Global Step: 123720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:37:35,051-Speed 11454.14 samples/sec Loss 6.0067 LearningRate 0.0208 Epoch: 21 Global Step: 123730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:37:36,012-Speed 10662.46 samples/sec Loss 5.9803 LearningRate 0.0208 Epoch: 21 Global Step: 123740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:36,913-Speed 11367.01 samples/sec Loss 6.0121 LearningRate 0.0208 Epoch: 21 Global Step: 123750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:37,808-Speed 11462.92 samples/sec Loss 6.1540 LearningRate 0.0208 Epoch: 21 Global Step: 123760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:38,699-Speed 11497.88 samples/sec Loss 6.0812 LearningRate 0.0208 Epoch: 21 Global Step: 123770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:39,679-Speed 10471.25 samples/sec Loss 6.1579 LearningRate 0.0208 Epoch: 21 Global Step: 123780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:40,636-Speed 10703.28 samples/sec Loss 6.0900 LearningRate 0.0208 Epoch: 21 Global Step: 123790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:41,540-Speed 11329.97 samples/sec Loss 5.9960 LearningRate 0.0208 Epoch: 21 Global Step: 123800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
03:37:42,425-Speed 11585.92 samples/sec Loss 5.9458 LearningRate 0.0208 Epoch: 21 Global Step: 123810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:43,319-Speed 11458.00 samples/sec Loss 6.0360 LearningRate 0.0208 Epoch: 21 Global Step: 123820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:44,231-Speed 11246.69 samples/sec Loss 6.1167 LearningRate 0.0208 Epoch: 21 Global Step: 123830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:45,135-Speed 11327.71 samples/sec Loss 6.1716 LearningRate 0.0207 Epoch: 21 Global Step: 123840 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:37:46,011-Speed 11701.11 samples/sec Loss 6.0303 LearningRate 0.0207 Epoch: 21 Global Step: 123850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:46,901-Speed 11510.07 samples/sec Loss 6.1348 LearningRate 0.0207 Epoch: 21 Global Step: 123860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:47,790-Speed 11527.72 samples/sec Loss 6.1102 LearningRate 0.0207 Epoch: 21 Global Step: 123870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:48,691-Speed 11376.11 samples/sec Loss 6.0406 LearningRate 0.0207 Epoch: 21 Global Step: 123880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:49,607-Speed 11180.03 samples/sec Loss 6.0825 LearningRate 0.0207 Epoch: 21 Global Step: 123890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:50,547-Speed 10911.29 samples/sec Loss 6.2264 LearningRate 0.0207 Epoch: 21 Global Step: 123900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:51,485-Speed 10915.25 samples/sec Loss 6.0543 LearningRate 0.0207 Epoch: 21 Global Step: 123910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:52,402-Speed 11178.69 samples/sec Loss 6.0584 LearningRate 0.0207 Epoch: 21 Global Step: 123920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:53,331-Speed 11065.94 
samples/sec Loss 6.0942 LearningRate 0.0207 Epoch: 21 Global Step: 123930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:54,233-Speed 11355.13 samples/sec Loss 6.0926 LearningRate 0.0207 Epoch: 21 Global Step: 123940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:55,161-Speed 11040.47 samples/sec Loss 6.1463 LearningRate 0.0207 Epoch: 21 Global Step: 123950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:37:56,078-Speed 11173.09 samples/sec Loss 6.0339 LearningRate 0.0207 Epoch: 21 Global Step: 123960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:56,943-Speed 11853.21 samples/sec Loss 6.1191 LearningRate 0.0207 Epoch: 21 Global Step: 123970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:57,828-Speed 11584.98 samples/sec Loss 6.0810 LearningRate 0.0207 Epoch: 21 Global Step: 123980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:58,741-Speed 11220.22 samples/sec Loss 6.2529 LearningRate 0.0207 Epoch: 21 Global Step: 123990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:37:59,655-Speed 11217.05 samples/sec Loss 6.0974 LearningRate 0.0207 Epoch: 21 Global Step: 124000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:38:21,071-[lfw][124000]XNorm: 9.951953 Training: 2022-04-11 03:38:21,071-[lfw][124000]Accuracy-Flip: 0.99650+-0.00263 Training: 2022-04-11 03:38:21,072-[lfw][124000]Accuracy-Highest: 0.99650 Training: 2022-04-11 03:38:45,804-[cfp_fp][124000]XNorm: 8.399276 Training: 2022-04-11 03:38:45,804-[cfp_fp][124000]Accuracy-Flip: 0.94671+-0.00975 Training: 2022-04-11 03:38:45,805-[cfp_fp][124000]Accuracy-Highest: 0.94814 Training: 2022-04-11 03:39:07,018-[agedb_30][124000]XNorm: 9.767665 Training: 2022-04-11 03:39:07,018-[agedb_30][124000]Accuracy-Flip: 0.96200+-0.00977 Training: 2022-04-11 03:39:07,019-[agedb_30][124000]Accuracy-Highest: 0.96783 Training: 2022-04-11 03:39:07,926-Speed 149.99 samples/sec Loss 
6.1177 LearningRate 0.0207 Epoch: 21 Global Step: 124010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:39:08,813-Speed 11547.92 samples/sec Loss 6.1239 LearningRate 0.0207 Epoch: 21 Global Step: 124020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:39:09,715-Speed 11358.36 samples/sec Loss 5.9560 LearningRate 0.0207 Epoch: 21 Global Step: 124030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:39:10,605-Speed 11510.10 samples/sec Loss 6.0102 LearningRate 0.0207 Epoch: 21 Global Step: 124040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:39:11,489-Speed 11594.05 samples/sec Loss 5.8139 LearningRate 0.0207 Epoch: 21 Global Step: 124050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:39:12,388-Speed 11398.91 samples/sec Loss 6.0337 LearningRate 0.0207 Epoch: 21 Global Step: 124060 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:39:13,275-Speed 11552.55 samples/sec Loss 6.0067 LearningRate 0.0207 Epoch: 21 Global Step: 124070 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:39:14,151-Speed 11703.38 samples/sec Loss 5.9675 LearningRate 0.0207 Epoch: 21 Global Step: 124080 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:39:15,031-Speed 11633.87 samples/sec Loss 6.0810 LearningRate 0.0206 Epoch: 21 Global Step: 124090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:39:15,904-Speed 11747.10 samples/sec Loss 5.9727 LearningRate 0.0206 Epoch: 21 Global Step: 124100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:39:16,788-Speed 11590.45 samples/sec Loss 5.9663 LearningRate 0.0206 Epoch: 21 Global Step: 124110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:39:17,670-Speed 11621.87 samples/sec Loss 6.0392 LearningRate 0.0206 Epoch: 21 Global Step: 124120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:39:18,557-Speed 11547.76 samples/sec Loss 6.0587 LearningRate 0.0206 
Epoch: 21 Global Step: 124130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:39:19,439-Speed 11621.21 samples/sec Loss 6.0158 LearningRate 0.0206 Epoch: 21 Global Step: 124140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:39:20,323-Speed 11588.95 samples/sec Loss 6.0349 LearningRate 0.0206 Epoch: 21 Global Step: 124150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:39:21,208-Speed 11585.68 samples/sec Loss 6.0076 LearningRate 0.0206 Epoch: 21 Global Step: 124160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:39:22,088-Speed 11641.33 samples/sec Loss 6.0937 LearningRate 0.0206 Epoch: 21 Global Step: 124170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:39:22,972-Speed 11589.75 samples/sec Loss 6.0568 LearningRate 0.0206 Epoch: 21 Global Step: 124180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:39:23,870-Speed 11415.11 samples/sec Loss 6.0540 LearningRate 0.0206 Epoch: 21 Global Step: 124190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:39:24,778-Speed 11283.06 samples/sec Loss 6.1555 LearningRate 0.0206 Epoch: 21 Global Step: 124200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:39:25,689-Speed 11244.49 samples/sec Loss 6.0055 LearningRate 0.0206 Epoch: 21 Global Step: 124210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:39:26,621-Speed 11005.48 samples/sec Loss 6.0071 LearningRate 0.0206 Epoch: 21 Global Step: 124220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:39:27,524-Speed 11340.42 samples/sec Loss 6.0546 LearningRate 0.0206 Epoch: 21 Global Step: 124230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:39:28,397-Speed 11739.00 samples/sec Loss 6.0938 LearningRate 0.0206 Epoch: 21 Global Step: 124240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:39:29,271-Speed 11730.75 samples/sec Loss 6.0618 LearningRate 0.0206 Epoch: 21 Global Step: 124250 
Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:39:30,213-Speed 10881.92 samples/sec Loss 5.9846 LearningRate 0.0206 Epoch: 21 Global Step: 124260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:39:31,112-Speed 11398.88 samples/sec Loss 6.0688 LearningRate 0.0206 Epoch: 21 Global Step: 124270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:39:32,046-Speed 10966.04 samples/sec Loss 6.1407 LearningRate 0.0206 Epoch: 21 Global Step: 124280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:39:32,927-Speed 11633.06 samples/sec Loss 6.1747 LearningRate 0.0206 Epoch: 21 Global Step: 124290 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:39:33,821-Speed 11468.60 samples/sec Loss 6.0434 LearningRate 0.0206 Epoch: 21 Global Step: 124300 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:39:34,690-Speed 11788.50 samples/sec Loss 5.9959 LearningRate 0.0206 Epoch: 21 Global Step: 124310 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:39:35,591-Speed 11373.52 samples/sec Loss 6.1698 LearningRate 0.0206 Epoch: 21 Global Step: 124320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:39:36,470-Speed 11655.45 samples/sec Loss 6.0576 LearningRate 0.0206 Epoch: 21 Global Step: 124330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:39:37,381-Speed 11244.35 samples/sec Loss 6.0844 LearningRate 0.0205 Epoch: 21 Global Step: 124340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:39:38,260-Speed 11659.72 samples/sec Loss 6.0086 LearningRate 0.0205 Epoch: 21 Global Step: 124350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:39:39,143-Speed 11599.97 samples/sec Loss 6.0463 LearningRate 0.0205 Epoch: 21 Global Step: 124360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:39:40,048-Speed 11322.38 samples/sec Loss 6.0036 LearningRate 0.0205 Epoch: 21 Global Step: 124370 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-04-11 03:39:40,917-Speed 11792.37 samples/sec Loss 6.1221 LearningRate 0.0205 Epoch: 21 Global Step: 124380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:39:41,794-Speed 11697.91 samples/sec Loss 5.9027 LearningRate 0.0205 Epoch: 21 Global Step: 124390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:39:42,700-Speed 11300.78 samples/sec Loss 6.1405 LearningRate 0.0205 Epoch: 21 Global Step: 124400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:39:43,644-Speed 10860.60 samples/sec Loss 6.0834 LearningRate 0.0205 Epoch: 21 Global Step: 124410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:39:44,514-Speed 11778.02 samples/sec Loss 6.1301 LearningRate 0.0205 Epoch: 21 Global Step: 124420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:39:45,412-Speed 11409.03 samples/sec Loss 6.1310 LearningRate 0.0205 Epoch: 21 Global Step: 124430 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:39:46,285-Speed 11738.57 samples/sec Loss 6.0952 LearningRate 0.0205 Epoch: 21 Global Step: 124440 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:39:47,161-Speed 11698.08 samples/sec Loss 5.9758 LearningRate 0.0205 Epoch: 21 Global Step: 124450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:39:48,046-Speed 11581.09 samples/sec Loss 5.9754 LearningRate 0.0205 Epoch: 21 Global Step: 124460 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:39:48,980-Speed 10968.79 samples/sec Loss 5.9666 LearningRate 0.0205 Epoch: 21 Global Step: 124470 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:39:49,884-Speed 11328.43 samples/sec Loss 6.0736 LearningRate 0.0205 Epoch: 21 Global Step: 124480 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:39:50,773-Speed 11540.93 samples/sec Loss 6.1006 LearningRate 0.0205 Epoch: 21 Global Step: 124490 Fp16 Grad Scale: 32768 Required: 4 hours Training: 
2022-04-11 03:39:51,649-Speed 11697.79 samples/sec Loss 6.0077 LearningRate 0.0205 Epoch: 21 Global Step: 124500 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:39:52,549-Speed 11381.95 samples/sec Loss 6.1051 LearningRate 0.0205 Epoch: 21 Global Step: 124510 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:39:53,430-Speed 11640.20 samples/sec Loss 6.1560 LearningRate 0.0205 Epoch: 21 Global Step: 124520 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:39:54,329-Speed 11392.85 samples/sec Loss 6.0481 LearningRate 0.0205 Epoch: 21 Global Step: 124530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:39:55,228-Speed 11399.91 samples/sec Loss 6.0152 LearningRate 0.0205 Epoch: 21 Global Step: 124540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:39:56,122-Speed 11459.81 samples/sec Loss 5.9725 LearningRate 0.0205 Epoch: 21 Global Step: 124550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:39:57,007-Speed 11584.92 samples/sec Loss 6.1173 LearningRate 0.0205 Epoch: 21 Global Step: 124560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:39:57,877-Speed 11779.88 samples/sec Loss 6.2209 LearningRate 0.0205 Epoch: 21 Global Step: 124570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:39:58,754-Speed 11681.11 samples/sec Loss 5.9339 LearningRate 0.0205 Epoch: 21 Global Step: 124580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:39:59,678-Speed 11087.02 samples/sec Loss 6.1296 LearningRate 0.0204 Epoch: 21 Global Step: 124590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:40:00,587-Speed 11271.03 samples/sec Loss 6.2656 LearningRate 0.0204 Epoch: 21 Global Step: 124600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:40:01,503-Speed 11191.26 samples/sec Loss 6.0546 LearningRate 0.0204 Epoch: 21 Global Step: 124610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:40:02,398-Speed 
11444.78 samples/sec Loss 6.0900 LearningRate 0.0204 Epoch: 21 Global Step: 124620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:40:03,318-Speed 11141.67 samples/sec Loss 6.1734 LearningRate 0.0204 Epoch: 21 Global Step: 124630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:40:04,220-Speed 11360.98 samples/sec Loss 6.1896 LearningRate 0.0204 Epoch: 21 Global Step: 124640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:40:05,130-Speed 11267.47 samples/sec Loss 6.2432 LearningRate 0.0204 Epoch: 21 Global Step: 124650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:40:06,035-Speed 11318.55 samples/sec Loss 6.1208 LearningRate 0.0204 Epoch: 21 Global Step: 124660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:40:06,921-Speed 11568.46 samples/sec Loss 6.1641 LearningRate 0.0204 Epoch: 21 Global Step: 124670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:40:07,819-Speed 11401.42 samples/sec Loss 6.1177 LearningRate 0.0204 Epoch: 21 Global Step: 124680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:40:08,698-Speed 11667.42 samples/sec Loss 6.0932 LearningRate 0.0204 Epoch: 21 Global Step: 124690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:40:09,578-Speed 11643.06 samples/sec Loss 6.1197 LearningRate 0.0204 Epoch: 21 Global Step: 124700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:40:10,461-Speed 11601.73 samples/sec Loss 6.2345 LearningRate 0.0204 Epoch: 21 Global Step: 124710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:40:11,357-Speed 11446.33 samples/sec Loss 6.0639 LearningRate 0.0204 Epoch: 21 Global Step: 124720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:40:12,247-Speed 11504.73 samples/sec Loss 6.0926 LearningRate 0.0204 Epoch: 21 Global Step: 124730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:40:13,170-Speed 11101.89 samples/sec Loss 
6.0571 LearningRate 0.0204 Epoch: 21 Global Step: 124740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:40:14,092-Speed 11125.30 samples/sec Loss 6.1941 LearningRate 0.0204 Epoch: 21 Global Step: 124750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:40:15,052-Speed 10671.74 samples/sec Loss 6.0124 LearningRate 0.0204 Epoch: 21 Global Step: 124760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:40:15,973-Speed 11121.16 samples/sec Loss 6.1425 LearningRate 0.0204 Epoch: 21 Global Step: 124770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:40:16,899-Speed 11071.55 samples/sec Loss 6.0986 LearningRate 0.0204 Epoch: 21 Global Step: 124780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:40:17,861-Speed 10649.75 samples/sec Loss 6.1383 LearningRate 0.0204 Epoch: 21 Global Step: 124790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:40:18,783-Speed 11109.58 samples/sec Loss 6.0499 LearningRate 0.0204 Epoch: 21 Global Step: 124800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:40:19,750-Speed 10601.48 samples/sec Loss 6.0352 LearningRate 0.0204 Epoch: 21 Global Step: 124810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:40:20,640-Speed 11515.24 samples/sec Loss 6.0770 LearningRate 0.0204 Epoch: 21 Global Step: 124820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:40:21,519-Speed 11651.36 samples/sec Loss 6.0712 LearningRate 0.0204 Epoch: 21 Global Step: 124830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:40:22,421-Speed 11358.37 samples/sec Loss 6.1842 LearningRate 0.0203 Epoch: 21 Global Step: 124840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:40:23,297-Speed 11696.31 samples/sec Loss 6.1692 LearningRate 0.0203 Epoch: 21 Global Step: 124850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:40:24,196-Speed 11406.69 samples/sec Loss 6.0611 LearningRate 0.0203 
Epoch: 21 Global Step: 124860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:40:25,078-Speed 11610.27 samples/sec Loss 6.0600 LearningRate 0.0203 Epoch: 21 Global Step: 124870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:40:26,027-Speed 10800.45 samples/sec Loss 5.9691 LearningRate 0.0203 Epoch: 21 Global Step: 124880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:40:26,933-Speed 11313.82 samples/sec Loss 6.1738 LearningRate 0.0203 Epoch: 21 Global Step: 124890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:40:27,831-Speed 11411.57 samples/sec Loss 6.1204 LearningRate 0.0203 Epoch: 21 Global Step: 124900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:40:28,779-Speed 10806.60 samples/sec Loss 6.1619 LearningRate 0.0203 Epoch: 21 Global Step: 124910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:40:29,671-Speed 11488.84 samples/sec Loss 6.1308 LearningRate 0.0203 Epoch: 21 Global Step: 124920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:40:30,573-Speed 11361.82 samples/sec Loss 6.0964 LearningRate 0.0203 Epoch: 21 Global Step: 124930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:40:31,500-Speed 11050.12 samples/sec Loss 6.0387 LearningRate 0.0203 Epoch: 21 Global Step: 124940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:40:32,427-Speed 11051.01 samples/sec Loss 6.1081 LearningRate 0.0203 Epoch: 21 Global Step: 124950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:40:33,312-Speed 11578.30 samples/sec Loss 6.0109 LearningRate 0.0203 Epoch: 21 Global Step: 124960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:40:34,186-Speed 11741.15 samples/sec Loss 6.0279 LearningRate 0.0203 Epoch: 21 Global Step: 124970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:40:35,075-Speed 11520.73 samples/sec Loss 6.1545 LearningRate 0.0203 Epoch: 21 Global Step: 124980 
Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:40:35,946-Speed 11765.09 samples/sec Loss 6.0834 LearningRate 0.0203 Epoch: 21 Global Step: 124990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:40:36,846-Speed 11389.99 samples/sec Loss 6.0039 LearningRate 0.0203 Epoch: 21 Global Step: 125000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:40:37,781-Speed 10959.47 samples/sec Loss 6.1692 LearningRate 0.0203 Epoch: 21 Global Step: 125010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:40:38,690-Speed 11274.37 samples/sec Loss 6.2506 LearningRate 0.0203 Epoch: 21 Global Step: 125020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:40:39,594-Speed 11335.27 samples/sec Loss 6.1527 LearningRate 0.0203 Epoch: 21 Global Step: 125030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:40:40,499-Speed 11323.65 samples/sec Loss 6.1898 LearningRate 0.0203 Epoch: 21 Global Step: 125040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:40:41,429-Speed 11065.36 samples/sec Loss 6.2140 LearningRate 0.0203 Epoch: 21 Global Step: 125050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:40:42,344-Speed 11195.95 samples/sec Loss 6.0192 LearningRate 0.0203 Epoch: 21 Global Step: 125060 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:40:43,291-Speed 10817.06 samples/sec Loss 6.0337 LearningRate 0.0203 Epoch: 21 Global Step: 125070 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:40:44,177-Speed 11562.73 samples/sec Loss 6.0384 LearningRate 0.0203 Epoch: 21 Global Step: 125080 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:40:45,131-Speed 10747.54 samples/sec Loss 6.0712 LearningRate 0.0203 Epoch: 21 Global Step: 125090 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:40:55,731-Speed 966.13 samples/sec Loss 5.6929 LearningRate 0.0202 Epoch: 22 Global Step: 125100 Fp16 Grad Scale: 32768 Required: 
4 hours Training: 2022-04-11 03:40:56,921-Speed 8610.42 samples/sec Loss 5.4799 LearningRate 0.0202 Epoch: 22 Global Step: 125110 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:40:57,882-Speed 10667.47 samples/sec Loss 5.4767 LearningRate 0.0202 Epoch: 22 Global Step: 125120 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:40:58,979-Speed 9338.51 samples/sec Loss 5.4866 LearningRate 0.0202 Epoch: 22 Global Step: 125130 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:40:59,924-Speed 10845.93 samples/sec Loss 5.4596 LearningRate 0.0202 Epoch: 22 Global Step: 125140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:41:00,858-Speed 10972.06 samples/sec Loss 5.4750 LearningRate 0.0202 Epoch: 22 Global Step: 125150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:41:01,804-Speed 10834.19 samples/sec Loss 5.4159 LearningRate 0.0202 Epoch: 22 Global Step: 125160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:41:02,699-Speed 11450.89 samples/sec Loss 5.4406 LearningRate 0.0202 Epoch: 22 Global Step: 125170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:41:03,572-Speed 11739.92 samples/sec Loss 5.3538 LearningRate 0.0202 Epoch: 22 Global Step: 125180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:41:04,506-Speed 10966.40 samples/sec Loss 5.3202 LearningRate 0.0202 Epoch: 22 Global Step: 125190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:41:05,384-Speed 11670.51 samples/sec Loss 5.3962 LearningRate 0.0202 Epoch: 22 Global Step: 125200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:41:06,327-Speed 10866.42 samples/sec Loss 5.4579 LearningRate 0.0202 Epoch: 22 Global Step: 125210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:41:07,279-Speed 10766.45 samples/sec Loss 5.3685 LearningRate 0.0202 Epoch: 22 Global Step: 125220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
03:41:08,188-Speed 11276.90 samples/sec Loss 5.4353 LearningRate 0.0202 Epoch: 22 Global Step: 125230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:41:09,097-Speed 11276.34 samples/sec Loss 5.4281 LearningRate 0.0202 Epoch: 22 Global Step: 125240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:41:09,963-Speed 11827.74 samples/sec Loss 5.3030 LearningRate 0.0202 Epoch: 22 Global Step: 125250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:41:10,863-Speed 11392.31 samples/sec Loss 5.4838 LearningRate 0.0202 Epoch: 22 Global Step: 125260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:41:11,765-Speed 11361.41 samples/sec Loss 5.4241 LearningRate 0.0202 Epoch: 22 Global Step: 125270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:41:12,711-Speed 10831.05 samples/sec Loss 5.3947 LearningRate 0.0202 Epoch: 22 Global Step: 125280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:41:13,623-Speed 11232.83 samples/sec Loss 5.4247 LearningRate 0.0202 Epoch: 22 Global Step: 125290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:41:14,545-Speed 11116.57 samples/sec Loss 5.4307 LearningRate 0.0202 Epoch: 22 Global Step: 125300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:41:15,423-Speed 11675.30 samples/sec Loss 5.5706 LearningRate 0.0202 Epoch: 22 Global Step: 125310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:41:16,296-Speed 11738.54 samples/sec Loss 5.5388 LearningRate 0.0202 Epoch: 22 Global Step: 125320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:41:17,180-Speed 11582.53 samples/sec Loss 5.5310 LearningRate 0.0202 Epoch: 22 Global Step: 125330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:41:18,066-Speed 11577.53 samples/sec Loss 5.4942 LearningRate 0.0202 Epoch: 22 Global Step: 125340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:41:19,002-Speed 10938.68 
samples/sec Loss 5.5467 LearningRate 0.0201 Epoch: 22 Global Step: 125350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:41:19,925-Speed 11103.23 samples/sec Loss 5.5269 LearningRate 0.0201 Epoch: 22 Global Step: 125360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:41:20,818-Speed 11479.07 samples/sec Loss 5.4543 LearningRate 0.0201 Epoch: 22 Global Step: 125370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:41:21,734-Speed 11182.08 samples/sec Loss 5.5186 LearningRate 0.0201 Epoch: 22 Global Step: 125380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:41:22,667-Speed 10990.04 samples/sec Loss 5.5368 LearningRate 0.0201 Epoch: 22 Global Step: 125390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:41:23,568-Speed 11381.19 samples/sec Loss 5.5869 LearningRate 0.0201 Epoch: 22 Global Step: 125400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:41:24,482-Speed 11210.61 samples/sec Loss 5.4520 LearningRate 0.0201 Epoch: 22 Global Step: 125410 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:41:25,368-Speed 11559.39 samples/sec Loss 5.4417 LearningRate 0.0201 Epoch: 22 Global Step: 125420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:41:26,265-Speed 11430.68 samples/sec Loss 5.4876 LearningRate 0.0201 Epoch: 22 Global Step: 125430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:41:27,162-Speed 11427.39 samples/sec Loss 5.4690 LearningRate 0.0201 Epoch: 22 Global Step: 125440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:41:28,047-Speed 11566.46 samples/sec Loss 5.7299 LearningRate 0.0201 Epoch: 22 Global Step: 125450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:41:29,006-Speed 10694.30 samples/sec Loss 5.4578 LearningRate 0.0201 Epoch: 22 Global Step: 125460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:41:29,951-Speed 10834.34 samples/sec Loss 
5.4080 LearningRate 0.0201 Epoch: 22 Global Step: 125470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:41:30,877-Speed 11065.75 samples/sec Loss 5.5926 LearningRate 0.0201 Epoch: 22 Global Step: 125480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:41:31,779-Speed 11368.98 samples/sec Loss 5.5145 LearningRate 0.0201 Epoch: 22 Global Step: 125490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:41:32,697-Speed 11157.61 samples/sec Loss 5.7142 LearningRate 0.0201 Epoch: 22 Global Step: 125500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:41:33,588-Speed 11504.75 samples/sec Loss 5.5197 LearningRate 0.0201 Epoch: 22 Global Step: 125510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:41:34,466-Speed 11669.89 samples/sec Loss 5.5217 LearningRate 0.0201 Epoch: 22 Global Step: 125520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:41:35,373-Speed 11296.55 samples/sec Loss 5.4688 LearningRate 0.0201 Epoch: 22 Global Step: 125530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:41:36,410-Speed 9883.88 samples/sec Loss 5.4915 LearningRate 0.0201 Epoch: 22 Global Step: 125540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:41:37,301-Speed 11494.25 samples/sec Loss 5.6066 LearningRate 0.0201 Epoch: 22 Global Step: 125550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:41:38,248-Speed 10817.31 samples/sec Loss 5.5943 LearningRate 0.0201 Epoch: 22 Global Step: 125560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:41:39,134-Speed 11579.55 samples/sec Loss 5.6726 LearningRate 0.0201 Epoch: 22 Global Step: 125570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:41:40,085-Speed 10776.01 samples/sec Loss 5.3997 LearningRate 0.0201 Epoch: 22 Global Step: 125580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:41:40,970-Speed 11579.82 samples/sec Loss 5.8051 LearningRate 0.0201 
Epoch: 22 Global Step: 125590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:41:41,853-Speed 11604.63 samples/sec Loss 5.5436 LearningRate 0.0200 Epoch: 22 Global Step: 125600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:41:42,882-Speed 9957.95 samples/sec Loss 5.5826 LearningRate 0.0200 Epoch: 22 Global Step: 125610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:41:43,793-Speed 11244.72 samples/sec Loss 5.5600 LearningRate 0.0200 Epoch: 22 Global Step: 125620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:41:44,677-Speed 11587.95 samples/sec Loss 5.5145 LearningRate 0.0200 Epoch: 22 Global Step: 125630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:41:45,564-Speed 11552.05 samples/sec Loss 5.5702 LearningRate 0.0200 Epoch: 22 Global Step: 125640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:41:46,446-Speed 11626.86 samples/sec Loss 5.6882 LearningRate 0.0200 Epoch: 22 Global Step: 125650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:41:47,346-Speed 11379.19 samples/sec Loss 5.6645 LearningRate 0.0200 Epoch: 22 Global Step: 125660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:41:48,228-Speed 11616.68 samples/sec Loss 5.6541 LearningRate 0.0200 Epoch: 22 Global Step: 125670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:41:49,135-Speed 11305.35 samples/sec Loss 5.4997 LearningRate 0.0200 Epoch: 22 Global Step: 125680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:41:50,063-Speed 11041.36 samples/sec Loss 5.5134 LearningRate 0.0200 Epoch: 22 Global Step: 125690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:41:50,980-Speed 11173.76 samples/sec Loss 5.5114 LearningRate 0.0200 Epoch: 22 Global Step: 125700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:41:51,921-Speed 10888.94 samples/sec Loss 5.6479 LearningRate 0.0200 Epoch: 22 Global Step: 125710 
Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:41:52,809-Speed 11534.29 samples/sec Loss 5.7286 LearningRate 0.0200 Epoch: 22 Global Step: 125720 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:41:53,742-Speed 10986.85 samples/sec Loss 5.6278 LearningRate 0.0200 Epoch: 22 Global Step: 125730 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:41:54,672-Speed 11020.80 samples/sec Loss 5.5034 LearningRate 0.0200 Epoch: 22 Global Step: 125740 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:41:55,553-Speed 11634.48 samples/sec Loss 5.4389 LearningRate 0.0200 Epoch: 22 Global Step: 125750 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:41:56,445-Speed 11484.71 samples/sec Loss 5.5863 LearningRate 0.0200 Epoch: 22 Global Step: 125760 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:41:57,317-Speed 11748.46 samples/sec Loss 5.5489 LearningRate 0.0200 Epoch: 22 Global Step: 125770 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:41:58,217-Speed 11383.12 samples/sec Loss 5.6089 LearningRate 0.0200 Epoch: 22 Global Step: 125780 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:41:59,115-Speed 11411.80 samples/sec Loss 5.6847 LearningRate 0.0200 Epoch: 22 Global Step: 125790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:41:59,997-Speed 11626.56 samples/sec Loss 5.7399 LearningRate 0.0200 Epoch: 22 Global Step: 125800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:42:00,866-Speed 11786.97 samples/sec Loss 5.6920 LearningRate 0.0200 Epoch: 22 Global Step: 125810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:42:01,784-Speed 11167.38 samples/sec Loss 5.6063 LearningRate 0.0200 Epoch: 22 Global Step: 125820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:42:02,659-Speed 11719.44 samples/sec Loss 5.6220 LearningRate 0.0200 Epoch: 22 Global Step: 125830 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-04-11 03:42:03,531-Speed 11741.38 samples/sec Loss 5.6220 LearningRate 0.0200 Epoch: 22 Global Step: 125840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:42:04,416-Speed 11588.68 samples/sec Loss 5.6495 LearningRate 0.0200 Epoch: 22 Global Step: 125850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:42:05,321-Speed 11310.66 samples/sec Loss 5.6198 LearningRate 0.0199 Epoch: 22 Global Step: 125860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:42:06,228-Speed 11304.37 samples/sec Loss 5.8315 LearningRate 0.0199 Epoch: 22 Global Step: 125870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:42:07,172-Speed 10857.54 samples/sec Loss 5.6691 LearningRate 0.0199 Epoch: 22 Global Step: 125880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:42:08,083-Speed 11246.23 samples/sec Loss 5.7114 LearningRate 0.0199 Epoch: 22 Global Step: 125890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:42:08,987-Speed 11334.36 samples/sec Loss 5.6594 LearningRate 0.0199 Epoch: 22 Global Step: 125900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:42:09,861-Speed 11726.53 samples/sec Loss 5.5719 LearningRate 0.0199 Epoch: 22 Global Step: 125910 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:42:10,724-Speed 11871.90 samples/sec Loss 5.6795 LearningRate 0.0199 Epoch: 22 Global Step: 125920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:42:11,601-Speed 11684.65 samples/sec Loss 5.4985 LearningRate 0.0199 Epoch: 22 Global Step: 125930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:42:12,521-Speed 11136.38 samples/sec Loss 5.7790 LearningRate 0.0199 Epoch: 22 Global Step: 125940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:42:13,398-Speed 11702.06 samples/sec Loss 5.7741 LearningRate 0.0199 Epoch: 22 Global Step: 125950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-11 03:42:14,298-Speed 11379.41 samples/sec Loss 5.7402 LearningRate 0.0199 Epoch: 22 Global Step: 125960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:42:15,194-Speed 11436.13 samples/sec Loss 5.6915 LearningRate 0.0199 Epoch: 22 Global Step: 125970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:42:16,137-Speed 10873.54 samples/sec Loss 5.7010 LearningRate 0.0199 Epoch: 22 Global Step: 125980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:42:17,156-Speed 10055.62 samples/sec Loss 5.6648 LearningRate 0.0199 Epoch: 22 Global Step: 125990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:42:18,133-Speed 10484.66 samples/sec Loss 5.7602 LearningRate 0.0199 Epoch: 22 Global Step: 126000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:42:39,588-[lfw][126000]XNorm: 9.975141 Training: 2022-04-11 03:42:39,588-[lfw][126000]Accuracy-Flip: 0.99650+-0.00320 Training: 2022-04-11 03:42:39,589-[lfw][126000]Accuracy-Highest: 0.99650 Training: 2022-04-11 03:43:04,182-[cfp_fp][126000]XNorm: 8.460045 Training: 2022-04-11 03:43:04,183-[cfp_fp][126000]Accuracy-Flip: 0.94900+-0.01026 Training: 2022-04-11 03:43:04,183-[cfp_fp][126000]Accuracy-Highest: 0.94900 Training: 2022-04-11 03:43:25,420-[agedb_30][126000]XNorm: 9.801197 Training: 2022-04-11 03:43:25,421-[agedb_30][126000]Accuracy-Flip: 0.96300+-0.00552 Training: 2022-04-11 03:43:25,421-[agedb_30][126000]Accuracy-Highest: 0.96783 Training: 2022-04-11 03:43:26,334-Speed 150.15 samples/sec Loss 5.6790 LearningRate 0.0199 Epoch: 22 Global Step: 126010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:43:27,252-Speed 11163.08 samples/sec Loss 5.6903 LearningRate 0.0199 Epoch: 22 Global Step: 126020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:43:28,205-Speed 10749.14 samples/sec Loss 5.7488 LearningRate 0.0199 Epoch: 22 Global Step: 126030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
03:43:29,114-Speed 11275.33 samples/sec Loss 5.6354 LearningRate 0.0199 Epoch: 22 Global Step: 126040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:43:30,052-Speed 10925.87 samples/sec Loss 5.7530 LearningRate 0.0199 Epoch: 22 Global Step: 126050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:43:30,928-Speed 11695.73 samples/sec Loss 5.6836 LearningRate 0.0199 Epoch: 22 Global Step: 126060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:43:31,824-Speed 11430.53 samples/sec Loss 5.5823 LearningRate 0.0199 Epoch: 22 Global Step: 126070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:43:32,716-Speed 11496.91 samples/sec Loss 5.7512 LearningRate 0.0199 Epoch: 22 Global Step: 126080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:43:33,594-Speed 11677.80 samples/sec Loss 5.6758 LearningRate 0.0199 Epoch: 22 Global Step: 126090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:43:34,489-Speed 11445.68 samples/sec Loss 5.6649 LearningRate 0.0199 Epoch: 22 Global Step: 126100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:43:35,368-Speed 11666.68 samples/sec Loss 5.7478 LearningRate 0.0198 Epoch: 22 Global Step: 126110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:43:36,243-Speed 11698.48 samples/sec Loss 5.7391 LearningRate 0.0198 Epoch: 22 Global Step: 126120 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:43:37,137-Speed 11472.46 samples/sec Loss 5.6901 LearningRate 0.0198 Epoch: 22 Global Step: 126130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:43:38,046-Speed 11268.93 samples/sec Loss 5.8488 LearningRate 0.0198 Epoch: 22 Global Step: 126140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:43:38,942-Speed 11437.18 samples/sec Loss 5.7048 LearningRate 0.0198 Epoch: 22 Global Step: 126150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:43:39,837-Speed 11442.86 
samples/sec Loss 5.7312 LearningRate 0.0198 Epoch: 22 Global Step: 126160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:43:40,732-Speed 11460.58 samples/sec Loss 5.7762 LearningRate 0.0198 Epoch: 22 Global Step: 126170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:43:41,612-Speed 11651.12 samples/sec Loss 5.7713 LearningRate 0.0198 Epoch: 22 Global Step: 126180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:43:42,484-Speed 11745.91 samples/sec Loss 5.7546 LearningRate 0.0198 Epoch: 22 Global Step: 126190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:43:43,372-Speed 11536.67 samples/sec Loss 5.6206 LearningRate 0.0198 Epoch: 22 Global Step: 126200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:43:44,269-Speed 11423.22 samples/sec Loss 5.8104 LearningRate 0.0198 Epoch: 22 Global Step: 126210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:43:45,175-Speed 11311.71 samples/sec Loss 5.8155 LearningRate 0.0198 Epoch: 22 Global Step: 126220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:43:46,074-Speed 11410.12 samples/sec Loss 5.7123 LearningRate 0.0198 Epoch: 22 Global Step: 126230 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:43:46,989-Speed 11202.29 samples/sec Loss 5.8960 LearningRate 0.0198 Epoch: 22 Global Step: 126240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:43:47,928-Speed 10906.65 samples/sec Loss 5.7369 LearningRate 0.0198 Epoch: 22 Global Step: 126250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:43:48,849-Speed 11128.32 samples/sec Loss 5.6469 LearningRate 0.0198 Epoch: 22 Global Step: 126260 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:43:49,723-Speed 11721.33 samples/sec Loss 5.7527 LearningRate 0.0198 Epoch: 22 Global Step: 126270 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:43:50,617-Speed 11471.98 samples/sec Loss 5.7693 
LearningRate 0.0198 Epoch: 22 Global Step: 126280 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:43:51,601-Speed 10414.58 samples/sec Loss 5.6738 LearningRate 0.0198 Epoch: 22 Global Step: 126290 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:43:52,577-Speed 10492.61 samples/sec Loss 5.6887 LearningRate 0.0198 Epoch: 22 Global Step: 126300 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:43:53,502-Speed 11075.99 samples/sec Loss 5.9708 LearningRate 0.0198 Epoch: 22 Global Step: 126310 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:43:54,411-Speed 11273.09 samples/sec Loss 5.8577 LearningRate 0.0198 Epoch: 22 Global Step: 126320 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:43:55,321-Speed 11270.27 samples/sec Loss 5.8875 LearningRate 0.0198 Epoch: 22 Global Step: 126330 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:43:56,228-Speed 11296.84 samples/sec Loss 5.7656 LearningRate 0.0198 Epoch: 22 Global Step: 126340 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:43:57,114-Speed 11564.40 samples/sec Loss 5.8062 LearningRate 0.0198 Epoch: 22 Global Step: 126350 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:43:58,000-Speed 11571.52 samples/sec Loss 5.7064 LearningRate 0.0198 Epoch: 22 Global Step: 126360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:43:58,886-Speed 11568.16 samples/sec Loss 5.7510 LearningRate 0.0197 Epoch: 22 Global Step: 126370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:43:59,813-Speed 11045.42 samples/sec Loss 5.8278 LearningRate 0.0197 Epoch: 22 Global Step: 126380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:44:00,697-Speed 11600.00 samples/sec Loss 5.9291 LearningRate 0.0197 Epoch: 22 Global Step: 126390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:44:01,637-Speed 10896.16 samples/sec Loss 5.8548 LearningRate 0.0197 Epoch: 22 
Global Step: 126400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:44:02,572-Speed 10957.21 samples/sec Loss 5.8367 LearningRate 0.0197 Epoch: 22 Global Step: 126410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:44:03,480-Speed 11294.16 samples/sec Loss 5.8883 LearningRate 0.0197 Epoch: 22 Global Step: 126420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:44:04,387-Speed 11291.85 samples/sec Loss 5.7306 LearningRate 0.0197 Epoch: 22 Global Step: 126430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:44:05,340-Speed 10755.23 samples/sec Loss 5.7879 LearningRate 0.0197 Epoch: 22 Global Step: 126440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:44:06,284-Speed 10851.21 samples/sec Loss 5.6788 LearningRate 0.0197 Epoch: 22 Global Step: 126450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:44:07,212-Speed 11048.24 samples/sec Loss 5.6627 LearningRate 0.0197 Epoch: 22 Global Step: 126460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:44:08,116-Speed 11331.59 samples/sec Loss 5.7766 LearningRate 0.0197 Epoch: 22 Global Step: 126470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:44:08,991-Speed 11709.99 samples/sec Loss 5.8516 LearningRate 0.0197 Epoch: 22 Global Step: 126480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:44:09,874-Speed 11608.44 samples/sec Loss 5.7718 LearningRate 0.0197 Epoch: 22 Global Step: 126490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:44:10,773-Speed 11394.36 samples/sec Loss 5.8857 LearningRate 0.0197 Epoch: 22 Global Step: 126500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:44:11,679-Speed 11317.75 samples/sec Loss 5.7773 LearningRate 0.0197 Epoch: 22 Global Step: 126510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:44:12,558-Speed 11655.18 samples/sec Loss 5.7737 LearningRate 0.0197 Epoch: 22 Global Step: 126520 Fp16 Grad 
Scale: 65536 Required: 4 hours Training: 2022-04-11 03:44:13,444-Speed 11565.40 samples/sec Loss 5.8931 LearningRate 0.0197 Epoch: 22 Global Step: 126530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:44:14,341-Speed 11428.26 samples/sec Loss 5.7508 LearningRate 0.0197 Epoch: 22 Global Step: 126540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:44:15,265-Speed 11092.95 samples/sec Loss 5.7703 LearningRate 0.0197 Epoch: 22 Global Step: 126550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:44:16,208-Speed 10869.51 samples/sec Loss 5.6305 LearningRate 0.0197 Epoch: 22 Global Step: 126560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:44:17,120-Speed 11234.46 samples/sec Loss 5.8123 LearningRate 0.0197 Epoch: 22 Global Step: 126570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:44:18,046-Speed 11062.04 samples/sec Loss 5.8934 LearningRate 0.0197 Epoch: 22 Global Step: 126580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:44:18,918-Speed 11755.67 samples/sec Loss 5.6299 LearningRate 0.0197 Epoch: 22 Global Step: 126590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:44:19,826-Speed 11282.45 samples/sec Loss 5.8438 LearningRate 0.0197 Epoch: 22 Global Step: 126600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:44:20,706-Speed 11652.88 samples/sec Loss 5.8401 LearningRate 0.0197 Epoch: 22 Global Step: 126610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:44:21,571-Speed 11847.54 samples/sec Loss 5.7950 LearningRate 0.0196 Epoch: 22 Global Step: 126620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:44:22,459-Speed 11529.98 samples/sec Loss 5.7356 LearningRate 0.0196 Epoch: 22 Global Step: 126630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:44:23,365-Speed 11318.85 samples/sec Loss 5.7757 LearningRate 0.0196 Epoch: 22 Global Step: 126640 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-04-11 03:44:24,238-Speed 11728.71 samples/sec Loss 5.7599 LearningRate 0.0196 Epoch: 22 Global Step: 126650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:44:25,129-Speed 11505.17 samples/sec Loss 5.7529 LearningRate 0.0196 Epoch: 22 Global Step: 126660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:44:26,013-Speed 11587.96 samples/sec Loss 5.8262 LearningRate 0.0196 Epoch: 22 Global Step: 126670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:44:26,889-Speed 11704.26 samples/sec Loss 5.9079 LearningRate 0.0196 Epoch: 22 Global Step: 126680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:44:27,783-Speed 11455.92 samples/sec Loss 5.7051 LearningRate 0.0196 Epoch: 22 Global Step: 126690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:44:28,689-Speed 11309.62 samples/sec Loss 5.7256 LearningRate 0.0196 Epoch: 22 Global Step: 126700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:44:29,643-Speed 10746.81 samples/sec Loss 5.9127 LearningRate 0.0196 Epoch: 22 Global Step: 126710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:44:30,543-Speed 11387.25 samples/sec Loss 5.7403 LearningRate 0.0196 Epoch: 22 Global Step: 126720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:44:31,428-Speed 11572.99 samples/sec Loss 5.8391 LearningRate 0.0196 Epoch: 22 Global Step: 126730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:44:32,394-Speed 10609.64 samples/sec Loss 5.8068 LearningRate 0.0196 Epoch: 22 Global Step: 126740 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:44:33,284-Speed 11514.66 samples/sec Loss 5.9128 LearningRate 0.0196 Epoch: 22 Global Step: 126750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:44:34,175-Speed 11504.04 samples/sec Loss 5.8518 LearningRate 0.0196 Epoch: 22 Global Step: 126760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 
2022-04-11 03:44:35,059-Speed 11589.29 samples/sec Loss 5.7675 LearningRate 0.0196 Epoch: 22 Global Step: 126770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:44:35,965-Speed 11316.47 samples/sec Loss 5.7254 LearningRate 0.0196 Epoch: 22 Global Step: 126780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:44:36,844-Speed 11651.63 samples/sec Loss 5.7770 LearningRate 0.0196 Epoch: 22 Global Step: 126790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:44:37,721-Speed 11682.49 samples/sec Loss 5.8613 LearningRate 0.0196 Epoch: 22 Global Step: 126800 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:44:38,623-Speed 11368.80 samples/sec Loss 5.6984 LearningRate 0.0196 Epoch: 22 Global Step: 126810 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:44:39,505-Speed 11611.93 samples/sec Loss 5.8031 LearningRate 0.0196 Epoch: 22 Global Step: 126820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:44:40,430-Speed 11089.41 samples/sec Loss 5.8585 LearningRate 0.0196 Epoch: 22 Global Step: 126830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:44:41,400-Speed 10561.98 samples/sec Loss 5.7986 LearningRate 0.0196 Epoch: 22 Global Step: 126840 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:44:42,365-Speed 10613.37 samples/sec Loss 5.9687 LearningRate 0.0196 Epoch: 22 Global Step: 126850 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:44:43,273-Speed 11294.51 samples/sec Loss 6.1011 LearningRate 0.0196 Epoch: 22 Global Step: 126860 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:44:44,184-Speed 11241.03 samples/sec Loss 5.7213 LearningRate 0.0196 Epoch: 22 Global Step: 126870 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:44:45,075-Speed 11506.23 samples/sec Loss 5.8811 LearningRate 0.0195 Epoch: 22 Global Step: 126880 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 
03:44:45,947-Speed 11749.30 samples/sec Loss 5.8403 LearningRate 0.0195 Epoch: 22 Global Step: 126890 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:44:46,872-Speed 11072.95 samples/sec Loss 5.7775 LearningRate 0.0195 Epoch: 22 Global Step: 126900 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:44:47,778-Speed 11316.60 samples/sec Loss 5.9173 LearningRate 0.0195 Epoch: 22 Global Step: 126910 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:44:48,681-Speed 11345.70 samples/sec Loss 5.6829 LearningRate 0.0195 Epoch: 22 Global Step: 126920 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:44:49,577-Speed 11436.52 samples/sec Loss 5.8633 LearningRate 0.0195 Epoch: 22 Global Step: 126930 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:44:50,539-Speed 10653.05 samples/sec Loss 5.6982 LearningRate 0.0195 Epoch: 22 Global Step: 126940 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:44:51,435-Speed 11433.20 samples/sec Loss 5.7970 LearningRate 0.0195 Epoch: 22 Global Step: 126950 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:44:52,323-Speed 11544.27 samples/sec Loss 5.7356 LearningRate 0.0195 Epoch: 22 Global Step: 126960 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:44:53,234-Speed 11246.26 samples/sec Loss 5.9539 LearningRate 0.0195 Epoch: 22 Global Step: 126970 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:44:54,142-Speed 11279.75 samples/sec Loss 5.9478 LearningRate 0.0195 Epoch: 22 Global Step: 126980 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:44:55,032-Speed 11520.54 samples/sec Loss 5.8046 LearningRate 0.0195 Epoch: 22 Global Step: 126990 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:44:55,904-Speed 11749.11 samples/sec Loss 5.9012 LearningRate 0.0195 Epoch: 22 Global Step: 127000 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:44:56,825-Speed 11125.48 
samples/sec Loss 6.0110 LearningRate 0.0195 Epoch: 22 Global Step: 127010 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:44:57,706-Speed 11640.32 samples/sec Loss 5.8662 LearningRate 0.0195 Epoch: 22 Global Step: 127020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:44:58,586-Speed 11643.19 samples/sec Loss 5.7926 LearningRate 0.0195 Epoch: 22 Global Step: 127030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:44:59,504-Speed 11190.15 samples/sec Loss 5.8219 LearningRate 0.0195 Epoch: 22 Global Step: 127040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:00,391-Speed 11545.96 samples/sec Loss 5.8716 LearningRate 0.0195 Epoch: 22 Global Step: 127050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:01,315-Speed 11090.12 samples/sec Loss 5.9410 LearningRate 0.0195 Epoch: 22 Global Step: 127060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:02,231-Speed 11194.82 samples/sec Loss 5.6722 LearningRate 0.0195 Epoch: 22 Global Step: 127070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:03,112-Speed 11623.90 samples/sec Loss 5.7719 LearningRate 0.0195 Epoch: 22 Global Step: 127080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:04,019-Speed 11300.95 samples/sec Loss 5.8726 LearningRate 0.0195 Epoch: 22 Global Step: 127090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:04,993-Speed 10518.67 samples/sec Loss 5.8313 LearningRate 0.0195 Epoch: 22 Global Step: 127100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:05,869-Speed 11691.68 samples/sec Loss 5.7226 LearningRate 0.0195 Epoch: 22 Global Step: 127110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:06,797-Speed 11048.89 samples/sec Loss 5.9897 LearningRate 0.0195 Epoch: 22 Global Step: 127120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:07,718-Speed 11129.24 samples/sec Loss 5.8963 
LearningRate 0.0195 Epoch: 22 Global Step: 127130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:45:08,610-Speed 11481.00 samples/sec Loss 5.9463 LearningRate 0.0194 Epoch: 22 Global Step: 127140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:45:09,477-Speed 11833.92 samples/sec Loss 5.9138 LearningRate 0.0194 Epoch: 22 Global Step: 127150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:45:10,348-Speed 11780.19 samples/sec Loss 5.9425 LearningRate 0.0194 Epoch: 22 Global Step: 127160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:11,242-Speed 11463.26 samples/sec Loss 5.9328 LearningRate 0.0194 Epoch: 22 Global Step: 127170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:12,174-Speed 10989.54 samples/sec Loss 5.8179 LearningRate 0.0194 Epoch: 22 Global Step: 127180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:13,114-Speed 10901.20 samples/sec Loss 5.8988 LearningRate 0.0194 Epoch: 22 Global Step: 127190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:14,005-Speed 11500.98 samples/sec Loss 5.8197 LearningRate 0.0194 Epoch: 22 Global Step: 127200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:14,903-Speed 11417.57 samples/sec Loss 5.8947 LearningRate 0.0194 Epoch: 22 Global Step: 127210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:15,850-Speed 10814.66 samples/sec Loss 5.7791 LearningRate 0.0194 Epoch: 22 Global Step: 127220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:16,768-Speed 11166.44 samples/sec Loss 5.9036 LearningRate 0.0194 Epoch: 22 Global Step: 127230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:17,668-Speed 11387.95 samples/sec Loss 5.8946 LearningRate 0.0194 Epoch: 22 Global Step: 127240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:18,547-Speed 11657.83 samples/sec Loss 5.8368 LearningRate 0.0194 Epoch: 
22 Global Step: 127250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:19,415-Speed 11794.66 samples/sec Loss 5.6934 LearningRate 0.0194 Epoch: 22 Global Step: 127260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:45:20,340-Speed 11080.20 samples/sec Loss 5.8393 LearningRate 0.0194 Epoch: 22 Global Step: 127270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:21,231-Speed 11500.30 samples/sec Loss 5.8534 LearningRate 0.0194 Epoch: 22 Global Step: 127280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:22,117-Speed 11569.16 samples/sec Loss 5.9070 LearningRate 0.0194 Epoch: 22 Global Step: 127290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:23,014-Speed 11427.16 samples/sec Loss 5.8610 LearningRate 0.0194 Epoch: 22 Global Step: 127300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:23,904-Speed 11510.42 samples/sec Loss 5.8343 LearningRate 0.0194 Epoch: 22 Global Step: 127310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:24,809-Speed 11331.87 samples/sec Loss 5.7883 LearningRate 0.0194 Epoch: 22 Global Step: 127320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:25,707-Speed 11409.64 samples/sec Loss 5.8718 LearningRate 0.0194 Epoch: 22 Global Step: 127330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:26,638-Speed 11010.26 samples/sec Loss 5.8631 LearningRate 0.0194 Epoch: 22 Global Step: 127340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:27,537-Speed 11393.41 samples/sec Loss 5.9946 LearningRate 0.0194 Epoch: 22 Global Step: 127350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:28,434-Speed 11426.05 samples/sec Loss 6.0463 LearningRate 0.0194 Epoch: 22 Global Step: 127360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:29,370-Speed 10945.79 samples/sec Loss 5.8477 LearningRate 0.0194 Epoch: 22 Global Step: 127370 Fp16 
Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:30,278-Speed 11288.96 samples/sec Loss 5.8376 LearningRate 0.0194 Epoch: 22 Global Step: 127380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:31,188-Speed 11258.25 samples/sec Loss 5.7887 LearningRate 0.0194 Epoch: 22 Global Step: 127390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:32,092-Speed 11334.51 samples/sec Loss 5.8196 LearningRate 0.0193 Epoch: 22 Global Step: 127400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:32,973-Speed 11621.53 samples/sec Loss 5.8320 LearningRate 0.0193 Epoch: 22 Global Step: 127410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:33,854-Speed 11632.57 samples/sec Loss 5.8720 LearningRate 0.0193 Epoch: 22 Global Step: 127420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:34,737-Speed 11615.65 samples/sec Loss 5.9170 LearningRate 0.0193 Epoch: 22 Global Step: 127430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:35,607-Speed 11781.47 samples/sec Loss 5.8077 LearningRate 0.0193 Epoch: 22 Global Step: 127440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:36,491-Speed 11582.62 samples/sec Loss 5.8992 LearningRate 0.0193 Epoch: 22 Global Step: 127450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:37,375-Speed 11589.45 samples/sec Loss 5.9107 LearningRate 0.0193 Epoch: 22 Global Step: 127460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:38,260-Speed 11583.14 samples/sec Loss 5.8378 LearningRate 0.0193 Epoch: 22 Global Step: 127470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:45:39,217-Speed 10707.23 samples/sec Loss 5.9734 LearningRate 0.0193 Epoch: 22 Global Step: 127480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:45:40,105-Speed 11533.67 samples/sec Loss 5.8609 LearningRate 0.0193 Epoch: 22 Global Step: 127490 Fp16 Grad Scale: 131072 Required: 
4 hours Training: 2022-04-11 03:45:40,989-Speed 11592.62 samples/sec Loss 5.9702 LearningRate 0.0193 Epoch: 22 Global Step: 127500 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:45:41,875-Speed 11564.50 samples/sec Loss 5.8594 LearningRate 0.0193 Epoch: 22 Global Step: 127510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:45:42,770-Speed 11459.31 samples/sec Loss 5.9589 LearningRate 0.0193 Epoch: 22 Global Step: 127520 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:45:43,712-Speed 10872.68 samples/sec Loss 5.8478 LearningRate 0.0193 Epoch: 22 Global Step: 127530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:44,626-Speed 11216.01 samples/sec Loss 5.8687 LearningRate 0.0193 Epoch: 22 Global Step: 127540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:45,526-Speed 11388.96 samples/sec Loss 5.8414 LearningRate 0.0193 Epoch: 22 Global Step: 127550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:46,410-Speed 11584.64 samples/sec Loss 6.0103 LearningRate 0.0193 Epoch: 22 Global Step: 127560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:47,283-Speed 11741.53 samples/sec Loss 6.0318 LearningRate 0.0193 Epoch: 22 Global Step: 127570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:48,174-Speed 11502.22 samples/sec Loss 5.9594 LearningRate 0.0193 Epoch: 22 Global Step: 127580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:49,049-Speed 11714.20 samples/sec Loss 5.9467 LearningRate 0.0193 Epoch: 22 Global Step: 127590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:50,008-Speed 10685.85 samples/sec Loss 5.8600 LearningRate 0.0193 Epoch: 22 Global Step: 127600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:50,923-Speed 11195.92 samples/sec Loss 5.8966 LearningRate 0.0193 Epoch: 22 Global Step: 127610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
03:45:51,845-Speed 11114.84 samples/sec Loss 5.9348 LearningRate 0.0193 Epoch: 22 Global Step: 127620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:52,767-Speed 11110.30 samples/sec Loss 5.9780 LearningRate 0.0193 Epoch: 22 Global Step: 127630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:45:53,673-Speed 11315.28 samples/sec Loss 5.8201 LearningRate 0.0193 Epoch: 22 Global Step: 127640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:45:54,582-Speed 11266.70 samples/sec Loss 5.8922 LearningRate 0.0193 Epoch: 22 Global Step: 127650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:45:55,458-Speed 11701.67 samples/sec Loss 5.9398 LearningRate 0.0192 Epoch: 22 Global Step: 127660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:56,382-Speed 11098.29 samples/sec Loss 5.9740 LearningRate 0.0192 Epoch: 22 Global Step: 127670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:57,261-Speed 11658.77 samples/sec Loss 5.8996 LearningRate 0.0192 Epoch: 22 Global Step: 127680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:58,162-Speed 11367.01 samples/sec Loss 5.9612 LearningRate 0.0192 Epoch: 22 Global Step: 127690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:59,045-Speed 11609.84 samples/sec Loss 5.9076 LearningRate 0.0192 Epoch: 22 Global Step: 127700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:45:59,944-Speed 11403.53 samples/sec Loss 5.9206 LearningRate 0.0192 Epoch: 22 Global Step: 127710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:46:00,829-Speed 11575.28 samples/sec Loss 5.8629 LearningRate 0.0192 Epoch: 22 Global Step: 127720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:46:01,733-Speed 11337.93 samples/sec Loss 6.0010 LearningRate 0.0192 Epoch: 22 Global Step: 127730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:46:02,624-Speed 11497.52 
samples/sec Loss 6.0601 LearningRate 0.0192 Epoch: 22 Global Step: 127740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:46:03,536-Speed 11236.51 samples/sec Loss 5.8600 LearningRate 0.0192 Epoch: 22 Global Step: 127750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:46:04,463-Speed 11052.36 samples/sec Loss 5.9401 LearningRate 0.0192 Epoch: 22 Global Step: 127760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:46:05,407-Speed 10862.71 samples/sec Loss 5.8876 LearningRate 0.0192 Epoch: 22 Global Step: 127770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:46:06,283-Speed 11690.75 samples/sec Loss 5.8299 LearningRate 0.0192 Epoch: 22 Global Step: 127780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:46:07,179-Speed 11436.08 samples/sec Loss 5.9611 LearningRate 0.0192 Epoch: 22 Global Step: 127790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:46:08,065-Speed 11572.03 samples/sec Loss 5.8877 LearningRate 0.0192 Epoch: 22 Global Step: 127800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:46:08,941-Speed 11700.96 samples/sec Loss 6.1171 LearningRate 0.0192 Epoch: 22 Global Step: 127810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:46:09,816-Speed 11708.87 samples/sec Loss 5.9067 LearningRate 0.0192 Epoch: 22 Global Step: 127820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:46:10,692-Speed 11700.49 samples/sec Loss 6.1123 LearningRate 0.0192 Epoch: 22 Global Step: 127830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:46:11,573-Speed 11629.01 samples/sec Loss 5.9306 LearningRate 0.0192 Epoch: 22 Global Step: 127840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:46:12,472-Speed 11398.54 samples/sec Loss 6.0342 LearningRate 0.0192 Epoch: 22 Global Step: 127850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:46:13,384-Speed 11232.44 samples/sec Loss 5.8705 
LearningRate 0.0192 Epoch: 22 Global Step: 127860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:46:14,269-Speed 11586.14 samples/sec Loss 5.9601 LearningRate 0.0192 Epoch: 22 Global Step: 127870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:46:15,206-Speed 10937.33 samples/sec Loss 5.9198 LearningRate 0.0192 Epoch: 22 Global Step: 127880 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:46:16,085-Speed 11661.41 samples/sec Loss 5.9932 LearningRate 0.0192 Epoch: 22 Global Step: 127890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:46:16,999-Speed 11211.54 samples/sec Loss 5.8585 LearningRate 0.0192 Epoch: 22 Global Step: 127900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:46:17,907-Speed 11283.81 samples/sec Loss 5.9944 LearningRate 0.0192 Epoch: 22 Global Step: 127910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:46:18,837-Speed 11016.41 samples/sec Loss 5.8853 LearningRate 0.0191 Epoch: 22 Global Step: 127920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:46:19,719-Speed 11617.52 samples/sec Loss 5.9332 LearningRate 0.0191 Epoch: 22 Global Step: 127930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:46:20,606-Speed 11553.52 samples/sec Loss 5.9096 LearningRate 0.0191 Epoch: 22 Global Step: 127940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:46:21,504-Speed 11413.67 samples/sec Loss 5.8700 LearningRate 0.0191 Epoch: 22 Global Step: 127950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:46:22,405-Speed 11374.16 samples/sec Loss 5.9822 LearningRate 0.0191 Epoch: 22 Global Step: 127960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:46:23,313-Speed 11276.22 samples/sec Loss 5.7780 LearningRate 0.0191 Epoch: 22 Global Step: 127970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:46:24,204-Speed 11496.34 samples/sec Loss 5.9104 LearningRate 0.0191 Epoch: 22 
Global Step: 127980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:46:25,141-Speed 10939.65 samples/sec Loss 5.8178 LearningRate 0.0191 Epoch: 22 Global Step: 127990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:46:26,028-Speed 11558.36 samples/sec Loss 5.9459 LearningRate 0.0191 Epoch: 22 Global Step: 128000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:46:47,488-[lfw][128000]XNorm: 9.808068 Training: 2022-04-11 03:46:47,488-[lfw][128000]Accuracy-Flip: 0.99583+-0.00291 Training: 2022-04-11 03:46:47,489-[lfw][128000]Accuracy-Highest: 0.99650 Training: 2022-04-11 03:47:12,248-[cfp_fp][128000]XNorm: 8.234114 Training: 2022-04-11 03:47:12,248-[cfp_fp][128000]Accuracy-Flip: 0.94657+-0.01129 Training: 2022-04-11 03:47:12,249-[cfp_fp][128000]Accuracy-Highest: 0.94900 Training: 2022-04-11 03:47:33,558-[agedb_30][128000]XNorm: 9.586265 Training: 2022-04-11 03:47:33,559-[agedb_30][128000]Accuracy-Flip: 0.96167+-0.00931 Training: 2022-04-11 03:47:33,559-[agedb_30][128000]Accuracy-Highest: 0.96783 Training: 2022-04-11 03:47:34,457-Speed 149.65 samples/sec Loss 5.8820 LearningRate 0.0191 Epoch: 22 Global Step: 128010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:47:35,361-Speed 11331.50 samples/sec Loss 5.9546 LearningRate 0.0191 Epoch: 22 Global Step: 128020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:47:36,237-Speed 11704.75 samples/sec Loss 5.8485 LearningRate 0.0191 Epoch: 22 Global Step: 128030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:47:37,134-Speed 11411.86 samples/sec Loss 5.9325 LearningRate 0.0191 Epoch: 22 Global Step: 128040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:47:38,016-Speed 11626.80 samples/sec Loss 5.9705 LearningRate 0.0191 Epoch: 22 Global Step: 128050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:47:38,934-Speed 11160.71 samples/sec Loss 5.9914 LearningRate 0.0191 Epoch: 22 Global Step: 128060 
Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:47:39,823-Speed 11527.54 samples/sec Loss 5.8421 LearningRate 0.0191 Epoch: 22 Global Step: 128070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:47:40,729-Speed 11309.00 samples/sec Loss 5.8383 LearningRate 0.0191 Epoch: 22 Global Step: 128080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:47:41,664-Speed 10960.05 samples/sec Loss 5.9826 LearningRate 0.0191 Epoch: 22 Global Step: 128090 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 03:47:42,537-Speed 11744.50 samples/sec Loss 5.8642 LearningRate 0.0191 Epoch: 22 Global Step: 128100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:47:43,445-Speed 11283.41 samples/sec Loss 6.0577 LearningRate 0.0191 Epoch: 22 Global Step: 128110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:47:44,357-Speed 11235.47 samples/sec Loss 6.0333 LearningRate 0.0191 Epoch: 22 Global Step: 128120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:47:45,233-Speed 11697.04 samples/sec Loss 5.9839 LearningRate 0.0191 Epoch: 22 Global Step: 128130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:47:46,124-Speed 11503.14 samples/sec Loss 5.9299 LearningRate 0.0191 Epoch: 22 Global Step: 128140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:47:47,016-Speed 11492.30 samples/sec Loss 5.9134 LearningRate 0.0191 Epoch: 22 Global Step: 128150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:47:47,918-Speed 11353.06 samples/sec Loss 5.8662 LearningRate 0.0191 Epoch: 22 Global Step: 128160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:47:48,864-Speed 10831.24 samples/sec Loss 5.9832 LearningRate 0.0191 Epoch: 22 Global Step: 128170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:47:49,786-Speed 11113.71 samples/sec Loss 5.8586 LearningRate 0.0190 Epoch: 22 Global Step: 128180 Fp16 Grad Scale: 32768 
Required: 4 hours Training: 2022-04-11 03:47:50,685-Speed 11406.28 samples/sec Loss 5.8693 LearningRate 0.0190 Epoch: 22 Global Step: 128190 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:47:51,594-Speed 11268.61 samples/sec Loss 5.9745 LearningRate 0.0190 Epoch: 22 Global Step: 128200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:47:52,495-Speed 11376.61 samples/sec Loss 5.9506 LearningRate 0.0190 Epoch: 22 Global Step: 128210 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:47:53,402-Speed 11289.27 samples/sec Loss 5.9531 LearningRate 0.0190 Epoch: 22 Global Step: 128220 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:47:54,282-Speed 11645.79 samples/sec Loss 6.0610 LearningRate 0.0190 Epoch: 22 Global Step: 128230 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:47:55,210-Speed 11041.28 samples/sec Loss 6.0070 LearningRate 0.0190 Epoch: 22 Global Step: 128240 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:47:56,151-Speed 10897.97 samples/sec Loss 5.9803 LearningRate 0.0190 Epoch: 22 Global Step: 128250 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:47:57,047-Speed 11428.10 samples/sec Loss 5.9833 LearningRate 0.0190 Epoch: 22 Global Step: 128260 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:47:57,932-Speed 11582.03 samples/sec Loss 5.8593 LearningRate 0.0190 Epoch: 22 Global Step: 128270 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:47:58,853-Speed 11125.15 samples/sec Loss 6.0193 LearningRate 0.0190 Epoch: 22 Global Step: 128280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:47:59,729-Speed 11700.49 samples/sec Loss 5.7939 LearningRate 0.0190 Epoch: 22 Global Step: 128290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:48:00,596-Speed 11815.83 samples/sec Loss 5.8608 LearningRate 0.0190 Epoch: 22 Global Step: 128300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-11 03:48:01,488-Speed 11489.94 samples/sec Loss 5.9095 LearningRate 0.0190 Epoch: 22 Global Step: 128310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:48:02,368-Speed 11649.10 samples/sec Loss 5.8158 LearningRate 0.0190 Epoch: 22 Global Step: 128320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:48:03,261-Speed 11474.05 samples/sec Loss 5.9473 LearningRate 0.0190 Epoch: 22 Global Step: 128330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:48:04,160-Speed 11400.78 samples/sec Loss 6.0177 LearningRate 0.0190 Epoch: 22 Global Step: 128340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:48:05,070-Speed 11250.03 samples/sec Loss 6.0149 LearningRate 0.0190 Epoch: 22 Global Step: 128350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:48:05,966-Speed 11440.97 samples/sec Loss 5.8676 LearningRate 0.0190 Epoch: 22 Global Step: 128360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:48:06,881-Speed 11202.02 samples/sec Loss 5.9462 LearningRate 0.0190 Epoch: 22 Global Step: 128370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:48:07,800-Speed 11150.59 samples/sec Loss 5.8800 LearningRate 0.0190 Epoch: 22 Global Step: 128380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:48:08,702-Speed 11362.38 samples/sec Loss 5.8818 LearningRate 0.0190 Epoch: 22 Global Step: 128390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:48:09,584-Speed 11617.75 samples/sec Loss 5.9420 LearningRate 0.0190 Epoch: 22 Global Step: 128400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:48:10,468-Speed 11587.49 samples/sec Loss 5.9919 LearningRate 0.0190 Epoch: 22 Global Step: 128410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:48:11,361-Speed 11479.79 samples/sec Loss 5.9904 LearningRate 0.0190 Epoch: 22 Global Step: 128420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:48:12,251-Speed 
11505.63 samples/sec Loss 5.9676 LearningRate 0.0190 Epoch: 22 Global Step: 128430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:48:13,155-Speed 11344.53 samples/sec Loss 6.0225 LearningRate 0.0189 Epoch: 22 Global Step: 128440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:48:14,078-Speed 11095.91 samples/sec Loss 6.0016 LearningRate 0.0189 Epoch: 22 Global Step: 128450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:48:14,971-Speed 11472.67 samples/sec Loss 6.0221 LearningRate 0.0189 Epoch: 22 Global Step: 128460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:48:15,887-Speed 11195.22 samples/sec Loss 5.9299 LearningRate 0.0189 Epoch: 22 Global Step: 128470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:48:16,811-Speed 11093.02 samples/sec Loss 5.6819 LearningRate 0.0189 Epoch: 22 Global Step: 128480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:48:17,721-Speed 11261.15 samples/sec Loss 5.8985 LearningRate 0.0189 Epoch: 22 Global Step: 128490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:48:18,662-Speed 10888.43 samples/sec Loss 5.9752 LearningRate 0.0189 Epoch: 22 Global Step: 128500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:48:19,585-Speed 11098.30 samples/sec Loss 5.9937 LearningRate 0.0189 Epoch: 22 Global Step: 128510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:48:20,521-Speed 10947.89 samples/sec Loss 5.9154 LearningRate 0.0189 Epoch: 22 Global Step: 128520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:48:21,406-Speed 11574.41 samples/sec Loss 5.9498 LearningRate 0.0189 Epoch: 22 Global Step: 128530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:48:22,276-Speed 11787.91 samples/sec Loss 5.9101 LearningRate 0.0189 Epoch: 22 Global Step: 128540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:48:23,154-Speed 11668.44 samples/sec Loss 
5.8755 LearningRate 0.0189 Epoch: 22 Global Step: 128550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 03:48:24,058-Speed 11339.65 samples/sec Loss 5.9091 LearningRate 0.0189 Epoch: 22 Global Step: 128560 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:48:24,949-Speed 11502.27 samples/sec Loss 5.8100 LearningRate 0.0189 Epoch: 22 Global Step: 128570 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:48:25,831-Speed 11616.29 samples/sec Loss 6.0021 LearningRate 0.0189 Epoch: 22 Global Step: 128580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 03:48:26,741-Speed 11256.89 samples/sec Loss 5.9877 LearningRate 0.0189 Epoch: 22 Global Step: 128590 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:48:27,642-Speed 11374.61 samples/sec Loss 5.9425 LearningRate 0.0189 Epoch: 22 Global Step: 128600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:48:28,616-Speed 10517.31 samples/sec Loss 5.8189 LearningRate 0.0189 Epoch: 22 Global Step: 128610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:48:29,501-Speed 11585.31 samples/sec Loss 5.9096 LearningRate 0.0189 Epoch: 22 Global Step: 128620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:48:30,432-Speed 11007.79 samples/sec Loss 5.9210 LearningRate 0.0189 Epoch: 22 Global Step: 128630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:48:31,397-Speed 10616.34 samples/sec Loss 5.8509 LearningRate 0.0189 Epoch: 22 Global Step: 128640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:48:32,323-Speed 11065.45 samples/sec Loss 5.8583 LearningRate 0.0189 Epoch: 22 Global Step: 128650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:48:33,211-Speed 11545.66 samples/sec Loss 6.0003 LearningRate 0.0189 Epoch: 22 Global Step: 128660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:48:34,091-Speed 11636.57 samples/sec Loss 5.8389 LearningRate 0.0189 
Epoch: 22 Global Step: 128670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:48:34,992-Speed 11379.20 samples/sec Loss 5.9974 LearningRate 0.0189 Epoch: 22 Global Step: 128680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:48:35,909-Speed 11170.50 samples/sec Loss 5.7566 LearningRate 0.0189 Epoch: 22 Global Step: 128690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:48:36,802-Speed 11480.27 samples/sec Loss 6.0022 LearningRate 0.0188 Epoch: 22 Global Step: 128700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:48:37,677-Speed 11713.86 samples/sec Loss 5.9200 LearningRate 0.0188 Epoch: 22 Global Step: 128710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:48:38,575-Speed 11403.94 samples/sec Loss 5.9025 LearningRate 0.0188 Epoch: 22 Global Step: 128720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:48:39,450-Speed 11718.90 samples/sec Loss 5.8781 LearningRate 0.0188 Epoch: 22 Global Step: 128730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:48:40,348-Speed 11405.06 samples/sec Loss 5.7582 LearningRate 0.0188 Epoch: 22 Global Step: 128740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:48:41,242-Speed 11469.92 samples/sec Loss 6.0305 LearningRate 0.0188 Epoch: 22 Global Step: 128750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:48:42,159-Speed 11167.89 samples/sec Loss 6.0264 LearningRate 0.0188 Epoch: 22 Global Step: 128760 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:48:43,033-Speed 11734.92 samples/sec Loss 6.0606 LearningRate 0.0188 Epoch: 22 Global Step: 128770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:48:43,913-Speed 11635.53 samples/sec Loss 6.0619 LearningRate 0.0188 Epoch: 22 Global Step: 128780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:48:44,834-Speed 11129.59 samples/sec Loss 5.8237 LearningRate 0.0188 Epoch: 22 Global Step: 128790 
Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:48:45,729-Speed 11456.52 samples/sec Loss 6.0202 LearningRate 0.0188 Epoch: 22 Global Step: 128800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:48:46,626-Speed 11423.41 samples/sec Loss 5.8038 LearningRate 0.0188 Epoch: 22 Global Step: 128810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:48:47,518-Speed 11485.08 samples/sec Loss 5.9729 LearningRate 0.0188 Epoch: 22 Global Step: 128820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:48:48,393-Speed 11715.46 samples/sec Loss 5.8462 LearningRate 0.0188 Epoch: 22 Global Step: 128830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:48:49,287-Speed 11453.48 samples/sec Loss 6.0522 LearningRate 0.0188 Epoch: 22 Global Step: 128840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:48:50,186-Speed 11401.56 samples/sec Loss 5.9245 LearningRate 0.0188 Epoch: 22 Global Step: 128850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:48:51,078-Speed 11486.46 samples/sec Loss 6.0411 LearningRate 0.0188 Epoch: 22 Global Step: 128860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:48:51,975-Speed 11423.26 samples/sec Loss 5.8386 LearningRate 0.0188 Epoch: 22 Global Step: 128870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:48:52,863-Speed 11539.78 samples/sec Loss 5.9901 LearningRate 0.0188 Epoch: 22 Global Step: 128880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:48:53,775-Speed 11242.34 samples/sec Loss 5.8115 LearningRate 0.0188 Epoch: 22 Global Step: 128890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:48:54,679-Speed 11336.57 samples/sec Loss 5.8101 LearningRate 0.0188 Epoch: 22 Global Step: 128900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:48:55,612-Speed 10981.45 samples/sec Loss 5.9794 LearningRate 0.0188 Epoch: 22 Global Step: 128910 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-04-11 03:48:56,496-Speed 11586.65 samples/sec Loss 5.8473 LearningRate 0.0188 Epoch: 22 Global Step: 128920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:48:57,379-Speed 11606.54 samples/sec Loss 5.9763 LearningRate 0.0188 Epoch: 22 Global Step: 128930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:48:58,268-Speed 11525.79 samples/sec Loss 5.9309 LearningRate 0.0188 Epoch: 22 Global Step: 128940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:48:59,167-Speed 11400.30 samples/sec Loss 5.8098 LearningRate 0.0188 Epoch: 22 Global Step: 128950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:00,108-Speed 10897.97 samples/sec Loss 6.0338 LearningRate 0.0187 Epoch: 22 Global Step: 128960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:01,028-Speed 11139.84 samples/sec Loss 5.9110 LearningRate 0.0187 Epoch: 22 Global Step: 128970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:01,931-Speed 11341.66 samples/sec Loss 6.0190 LearningRate 0.0187 Epoch: 22 Global Step: 128980 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:49:02,822-Speed 11508.92 samples/sec Loss 5.9656 LearningRate 0.0187 Epoch: 22 Global Step: 128990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:03,784-Speed 10652.25 samples/sec Loss 5.9069 LearningRate 0.0187 Epoch: 22 Global Step: 129000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:04,740-Speed 10721.69 samples/sec Loss 6.0438 LearningRate 0.0187 Epoch: 22 Global Step: 129010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:05,687-Speed 10815.82 samples/sec Loss 5.9292 LearningRate 0.0187 Epoch: 22 Global Step: 129020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:06,635-Speed 10806.27 samples/sec Loss 5.7742 LearningRate 0.0187 Epoch: 22 Global Step: 129030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 03:49:07,567-Speed 10994.71 samples/sec Loss 5.8676 LearningRate 0.0187 Epoch: 22 Global Step: 129040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:08,467-Speed 11388.96 samples/sec Loss 5.9399 LearningRate 0.0187 Epoch: 22 Global Step: 129050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:09,346-Speed 11658.25 samples/sec Loss 5.9913 LearningRate 0.0187 Epoch: 22 Global Step: 129060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:10,226-Speed 11641.90 samples/sec Loss 5.7750 LearningRate 0.0187 Epoch: 22 Global Step: 129070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:49:11,127-Speed 11375.00 samples/sec Loss 5.9842 LearningRate 0.0187 Epoch: 22 Global Step: 129080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:49:12,025-Speed 11407.96 samples/sec Loss 5.9241 LearningRate 0.0187 Epoch: 22 Global Step: 129090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:49:12,968-Speed 10869.96 samples/sec Loss 5.9026 LearningRate 0.0187 Epoch: 22 Global Step: 129100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:49:13,854-Speed 11565.89 samples/sec Loss 5.9102 LearningRate 0.0187 Epoch: 22 Global Step: 129110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:49:14,758-Speed 11341.85 samples/sec Loss 5.9550 LearningRate 0.0187 Epoch: 22 Global Step: 129120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:49:15,664-Speed 11305.41 samples/sec Loss 5.9295 LearningRate 0.0187 Epoch: 22 Global Step: 129130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:49:16,571-Speed 11297.05 samples/sec Loss 5.8830 LearningRate 0.0187 Epoch: 22 Global Step: 129140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:49:17,498-Speed 11051.05 samples/sec Loss 6.0487 LearningRate 0.0187 Epoch: 22 Global Step: 129150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:49:18,402-Speed 
11335.06 samples/sec Loss 6.0886 LearningRate 0.0187 Epoch: 22 Global Step: 129160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:49:19,292-Speed 11515.71 samples/sec Loss 5.9801 LearningRate 0.0187 Epoch: 22 Global Step: 129170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:20,160-Speed 11812.51 samples/sec Loss 5.9370 LearningRate 0.0187 Epoch: 22 Global Step: 129180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:21,049-Speed 11527.30 samples/sec Loss 5.9353 LearningRate 0.0187 Epoch: 22 Global Step: 129190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:21,960-Speed 11248.22 samples/sec Loss 5.9481 LearningRate 0.0187 Epoch: 22 Global Step: 129200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:22,841-Speed 11636.81 samples/sec Loss 6.0913 LearningRate 0.0187 Epoch: 22 Global Step: 129210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:23,715-Speed 11720.40 samples/sec Loss 5.9734 LearningRate 0.0186 Epoch: 22 Global Step: 129220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:24,612-Speed 11420.91 samples/sec Loss 5.8884 LearningRate 0.0186 Epoch: 22 Global Step: 129230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:25,547-Speed 10961.64 samples/sec Loss 5.9419 LearningRate 0.0186 Epoch: 22 Global Step: 129240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:26,494-Speed 10827.25 samples/sec Loss 5.8338 LearningRate 0.0186 Epoch: 22 Global Step: 129250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:27,405-Speed 11249.14 samples/sec Loss 5.8522 LearningRate 0.0186 Epoch: 22 Global Step: 129260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:28,319-Speed 11210.36 samples/sec Loss 5.9325 LearningRate 0.0186 Epoch: 22 Global Step: 129270 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:49:29,227-Speed 11281.10 samples/sec Loss 
5.8620 LearningRate 0.0186 Epoch: 22 Global Step: 129280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:30,114-Speed 11550.18 samples/sec Loss 5.9355 LearningRate 0.0186 Epoch: 22 Global Step: 129290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:31,010-Speed 11442.67 samples/sec Loss 6.0084 LearningRate 0.0186 Epoch: 22 Global Step: 129300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:31,913-Speed 11349.73 samples/sec Loss 6.0275 LearningRate 0.0186 Epoch: 22 Global Step: 129310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:32,797-Speed 11591.41 samples/sec Loss 5.9788 LearningRate 0.0186 Epoch: 22 Global Step: 129320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:33,705-Speed 11278.60 samples/sec Loss 6.0302 LearningRate 0.0186 Epoch: 22 Global Step: 129330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:34,587-Speed 11622.76 samples/sec Loss 5.8687 LearningRate 0.0186 Epoch: 22 Global Step: 129340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:35,474-Speed 11553.33 samples/sec Loss 5.9976 LearningRate 0.0186 Epoch: 22 Global Step: 129350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:36,388-Speed 11217.68 samples/sec Loss 6.0484 LearningRate 0.0186 Epoch: 22 Global Step: 129360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:37,304-Speed 11184.55 samples/sec Loss 5.9179 LearningRate 0.0186 Epoch: 22 Global Step: 129370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:38,182-Speed 11672.28 samples/sec Loss 5.9109 LearningRate 0.0186 Epoch: 22 Global Step: 129380 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:49:39,083-Speed 11371.33 samples/sec Loss 5.9222 LearningRate 0.0186 Epoch: 22 Global Step: 129390 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:49:39,952-Speed 11803.85 samples/sec Loss 5.7968 LearningRate 0.0186 
Epoch: 22 Global Step: 129400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:40,843-Speed 11510.87 samples/sec Loss 5.8953 LearningRate 0.0186 Epoch: 22 Global Step: 129410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:41,714-Speed 11766.10 samples/sec Loss 6.1569 LearningRate 0.0186 Epoch: 22 Global Step: 129420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:42,614-Speed 11384.35 samples/sec Loss 5.8864 LearningRate 0.0186 Epoch: 22 Global Step: 129430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:43,516-Speed 11369.35 samples/sec Loss 5.7907 LearningRate 0.0186 Epoch: 22 Global Step: 129440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:44,416-Speed 11381.16 samples/sec Loss 6.0268 LearningRate 0.0186 Epoch: 22 Global Step: 129450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:45,307-Speed 11502.02 samples/sec Loss 5.7861 LearningRate 0.0186 Epoch: 22 Global Step: 129460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:46,187-Speed 11647.04 samples/sec Loss 5.8824 LearningRate 0.0186 Epoch: 22 Global Step: 129470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:47,078-Speed 11492.07 samples/sec Loss 5.8662 LearningRate 0.0186 Epoch: 22 Global Step: 129480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:47,976-Speed 11420.83 samples/sec Loss 5.9446 LearningRate 0.0185 Epoch: 22 Global Step: 129490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:48,866-Speed 11514.97 samples/sec Loss 5.9710 LearningRate 0.0185 Epoch: 22 Global Step: 129500 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:49:49,734-Speed 11807.91 samples/sec Loss 5.8907 LearningRate 0.0185 Epoch: 22 Global Step: 129510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:50,622-Speed 11542.74 samples/sec Loss 6.0248 LearningRate 0.0185 Epoch: 22 Global Step: 129520 
Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:51,557-Speed 10956.48 samples/sec Loss 5.9431 LearningRate 0.0185 Epoch: 22 Global Step: 129530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:52,441-Speed 11590.48 samples/sec Loss 6.0093 LearningRate 0.0185 Epoch: 22 Global Step: 129540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:53,367-Speed 11076.11 samples/sec Loss 5.9466 LearningRate 0.0185 Epoch: 22 Global Step: 129550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:54,261-Speed 11459.11 samples/sec Loss 6.0928 LearningRate 0.0185 Epoch: 22 Global Step: 129560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:55,150-Speed 11524.16 samples/sec Loss 5.9968 LearningRate 0.0185 Epoch: 22 Global Step: 129570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:56,055-Speed 11326.46 samples/sec Loss 5.9486 LearningRate 0.0185 Epoch: 22 Global Step: 129580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:56,971-Speed 11182.78 samples/sec Loss 5.9564 LearningRate 0.0185 Epoch: 22 Global Step: 129590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:57,861-Speed 11526.98 samples/sec Loss 6.0098 LearningRate 0.0185 Epoch: 22 Global Step: 129600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:58,746-Speed 11571.67 samples/sec Loss 5.9665 LearningRate 0.0185 Epoch: 22 Global Step: 129610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:49:59,625-Speed 11663.52 samples/sec Loss 6.0060 LearningRate 0.0185 Epoch: 22 Global Step: 129620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:50:00,499-Speed 11714.75 samples/sec Loss 5.8710 LearningRate 0.0185 Epoch: 22 Global Step: 129630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:50:01,403-Speed 11341.20 samples/sec Loss 5.9306 LearningRate 0.0185 Epoch: 22 Global Step: 129640 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-04-11 03:50:02,312-Speed 11271.79 samples/sec Loss 5.7590 LearningRate 0.0185 Epoch: 22 Global Step: 129650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:50:03,226-Speed 11223.48 samples/sec Loss 5.8602 LearningRate 0.0185 Epoch: 22 Global Step: 129660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:50:04,108-Speed 11611.14 samples/sec Loss 5.8541 LearningRate 0.0185 Epoch: 22 Global Step: 129670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:50:05,013-Speed 11328.52 samples/sec Loss 5.9899 LearningRate 0.0185 Epoch: 22 Global Step: 129680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:50:05,965-Speed 10766.22 samples/sec Loss 5.8516 LearningRate 0.0185 Epoch: 22 Global Step: 129690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:50:06,876-Speed 11241.93 samples/sec Loss 5.8745 LearningRate 0.0185 Epoch: 22 Global Step: 129700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:50:07,751-Speed 11707.12 samples/sec Loss 6.0010 LearningRate 0.0185 Epoch: 22 Global Step: 129710 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:50:08,655-Speed 11341.71 samples/sec Loss 5.8453 LearningRate 0.0185 Epoch: 22 Global Step: 129720 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:50:09,525-Speed 11773.41 samples/sec Loss 5.8293 LearningRate 0.0185 Epoch: 22 Global Step: 129730 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:50:10,400-Speed 11711.98 samples/sec Loss 5.8920 LearningRate 0.0185 Epoch: 22 Global Step: 129740 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:50:11,272-Speed 11760.28 samples/sec Loss 5.9157 LearningRate 0.0184 Epoch: 22 Global Step: 129750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:50:12,148-Speed 11703.59 samples/sec Loss 5.9966 LearningRate 0.0184 Epoch: 22 Global Step: 129760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 03:50:13,071-Speed 11104.94 samples/sec Loss 5.9122 LearningRate 0.0184 Epoch: 22 Global Step: 129770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:50:13,960-Speed 11537.22 samples/sec Loss 5.8740 LearningRate 0.0184 Epoch: 22 Global Step: 129780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:50:14,915-Speed 10727.44 samples/sec Loss 6.0533 LearningRate 0.0184 Epoch: 22 Global Step: 129790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:50:15,823-Speed 11285.54 samples/sec Loss 5.9855 LearningRate 0.0184 Epoch: 22 Global Step: 129800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:50:16,740-Speed 11170.18 samples/sec Loss 5.9051 LearningRate 0.0184 Epoch: 22 Global Step: 129810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:50:17,624-Speed 11594.18 samples/sec Loss 5.9278 LearningRate 0.0184 Epoch: 22 Global Step: 129820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:50:18,544-Speed 11143.90 samples/sec Loss 5.9104 LearningRate 0.0184 Epoch: 22 Global Step: 129830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:50:19,438-Speed 11466.80 samples/sec Loss 6.0104 LearningRate 0.0184 Epoch: 22 Global Step: 129840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:50:20,306-Speed 11805.96 samples/sec Loss 5.9607 LearningRate 0.0184 Epoch: 22 Global Step: 129850 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:50:21,180-Speed 11716.72 samples/sec Loss 5.8799 LearningRate 0.0184 Epoch: 22 Global Step: 129860 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:50:22,065-Speed 11585.59 samples/sec Loss 6.0486 LearningRate 0.0184 Epoch: 22 Global Step: 129870 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:50:22,934-Speed 11794.46 samples/sec Loss 5.9989 LearningRate 0.0184 Epoch: 22 Global Step: 129880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
03:50:23,803-Speed 11786.87 samples/sec Loss 6.0458 LearningRate 0.0184 Epoch: 22 Global Step: 129890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:50:24,700-Speed 11419.56 samples/sec Loss 6.1009 LearningRate 0.0184 Epoch: 22 Global Step: 129900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:50:25,625-Speed 11077.34 samples/sec Loss 5.8649 LearningRate 0.0184 Epoch: 22 Global Step: 129910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:50:26,513-Speed 11539.32 samples/sec Loss 5.9005 LearningRate 0.0184 Epoch: 22 Global Step: 129920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:50:27,433-Speed 11141.25 samples/sec Loss 5.9654 LearningRate 0.0184 Epoch: 22 Global Step: 129930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:50:28,307-Speed 11727.16 samples/sec Loss 5.9065 LearningRate 0.0184 Epoch: 22 Global Step: 129940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:50:29,215-Speed 11281.98 samples/sec Loss 6.1542 LearningRate 0.0184 Epoch: 22 Global Step: 129950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:50:30,091-Speed 11702.03 samples/sec Loss 6.0196 LearningRate 0.0184 Epoch: 22 Global Step: 129960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:50:30,973-Speed 11609.27 samples/sec Loss 5.8221 LearningRate 0.0184 Epoch: 22 Global Step: 129970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:50:31,887-Speed 11216.51 samples/sec Loss 5.9139 LearningRate 0.0184 Epoch: 22 Global Step: 129980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:50:32,797-Speed 11257.75 samples/sec Loss 5.7990 LearningRate 0.0184 Epoch: 22 Global Step: 129990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:50:33,675-Speed 11672.57 samples/sec Loss 5.8708 LearningRate 0.0184 Epoch: 22 Global Step: 130000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 
03:50:54,973-[lfw][130000]XNorm: 9.775756 Training: 2022-04-11 03:50:54,973-[lfw][130000]Accuracy-Flip: 0.99550+-0.00350 Training: 2022-04-11 03:50:54,974-[lfw][130000]Accuracy-Highest: 0.99650 Training: 2022-04-11 03:51:19,651-[cfp_fp][130000]XNorm: 8.221540 Training: 2022-04-11 03:51:19,652-[cfp_fp][130000]Accuracy-Flip: 0.94429+-0.01188 Training: 2022-04-11 03:51:19,652-[cfp_fp][130000]Accuracy-Highest: 0.94900 Training: 2022-04-11 03:51:40,976-[agedb_30][130000]XNorm: 9.540069 Training: 2022-04-11 03:51:40,976-[agedb_30][130000]Accuracy-Flip: 0.96500+-0.00527 Training: 2022-04-11 03:51:40,977-[agedb_30][130000]Accuracy-Highest: 0.96783 Training: 2022-04-11 03:51:41,867-Speed 150.17 samples/sec Loss 6.0793 LearningRate 0.0184 Epoch: 22 Global Step: 130010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:51:42,751-Speed 11581.93 samples/sec Loss 5.9598 LearningRate 0.0183 Epoch: 22 Global Step: 130020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:51:43,632-Speed 11633.40 samples/sec Loss 5.9568 LearningRate 0.0183 Epoch: 22 Global Step: 130030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:51:44,534-Speed 11365.79 samples/sec Loss 5.9089 LearningRate 0.0183 Epoch: 22 Global Step: 130040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:51:45,426-Speed 11478.01 samples/sec Loss 5.9516 LearningRate 0.0183 Epoch: 22 Global Step: 130050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:51:46,311-Speed 11586.49 samples/sec Loss 5.9028 LearningRate 0.0183 Epoch: 22 Global Step: 130060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:51:47,209-Speed 11409.26 samples/sec Loss 5.9037 LearningRate 0.0183 Epoch: 22 Global Step: 130070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:51:48,106-Speed 11430.34 samples/sec Loss 6.0002 LearningRate 0.0183 Epoch: 22 Global Step: 130080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:51:49,006-Speed 
11378.72 samples/sec Loss 5.9209 LearningRate 0.0183 Epoch: 22 Global Step: 130090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:51:49,885-Speed 11662.65 samples/sec Loss 5.7466 LearningRate 0.0183 Epoch: 22 Global Step: 130100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:51:50,786-Speed 11376.83 samples/sec Loss 6.1463 LearningRate 0.0183 Epoch: 22 Global Step: 130110 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:51:51,709-Speed 11102.48 samples/sec Loss 5.8919 LearningRate 0.0183 Epoch: 22 Global Step: 130120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:51:52,647-Speed 10919.61 samples/sec Loss 5.9396 LearningRate 0.0183 Epoch: 22 Global Step: 130130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:51:53,578-Speed 11017.92 samples/sec Loss 5.9302 LearningRate 0.0183 Epoch: 22 Global Step: 130140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:51:54,493-Speed 11190.00 samples/sec Loss 5.9663 LearningRate 0.0183 Epoch: 22 Global Step: 130150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:51:55,383-Speed 11515.73 samples/sec Loss 5.8822 LearningRate 0.0183 Epoch: 22 Global Step: 130160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:51:56,294-Speed 11256.11 samples/sec Loss 6.0349 LearningRate 0.0183 Epoch: 22 Global Step: 130170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:51:57,195-Speed 11362.53 samples/sec Loss 5.9261 LearningRate 0.0183 Epoch: 22 Global Step: 130180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:51:58,074-Speed 11667.70 samples/sec Loss 5.9806 LearningRate 0.0183 Epoch: 22 Global Step: 130190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:51:58,953-Speed 11651.49 samples/sec Loss 5.9265 LearningRate 0.0183 Epoch: 22 Global Step: 130200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:51:59,831-Speed 11669.54 samples/sec Loss 
6.0533 LearningRate 0.0183 Epoch: 22 Global Step: 130210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:52:00,732-Speed 11381.73 samples/sec Loss 5.9825 LearningRate 0.0183 Epoch: 22 Global Step: 130220 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:52:01,623-Speed 11493.50 samples/sec Loss 5.9115 LearningRate 0.0183 Epoch: 22 Global Step: 130230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:52:02,525-Speed 11362.36 samples/sec Loss 5.9639 LearningRate 0.0183 Epoch: 22 Global Step: 130240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:52:03,440-Speed 11203.18 samples/sec Loss 5.9622 LearningRate 0.0183 Epoch: 22 Global Step: 130250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:52:04,326-Speed 11558.17 samples/sec Loss 5.9048 LearningRate 0.0183 Epoch: 22 Global Step: 130260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:52:05,238-Speed 11234.98 samples/sec Loss 6.0351 LearningRate 0.0183 Epoch: 22 Global Step: 130270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:52:06,159-Speed 11129.77 samples/sec Loss 5.9895 LearningRate 0.0182 Epoch: 22 Global Step: 130280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:52:07,143-Speed 10410.22 samples/sec Loss 5.9957 LearningRate 0.0182 Epoch: 22 Global Step: 130290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:52:08,043-Speed 11391.48 samples/sec Loss 5.9866 LearningRate 0.0182 Epoch: 22 Global Step: 130300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:52:08,959-Speed 11185.89 samples/sec Loss 6.0718 LearningRate 0.0182 Epoch: 22 Global Step: 130310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:52:09,868-Speed 11269.79 samples/sec Loss 6.0723 LearningRate 0.0182 Epoch: 22 Global Step: 130320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:52:10,747-Speed 11654.69 samples/sec Loss 5.8459 LearningRate 0.0182 
Epoch: 22 Global Step: 130330 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:52:11,624-Speed 11692.39 samples/sec Loss 6.0354 LearningRate 0.0182 Epoch: 22 Global Step: 130340 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:52:12,516-Speed 11491.20 samples/sec Loss 6.0241 LearningRate 0.0182 Epoch: 22 Global Step: 130350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:52:13,450-Speed 10969.21 samples/sec Loss 5.9483 LearningRate 0.0182 Epoch: 22 Global Step: 130360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:52:14,392-Speed 10878.88 samples/sec Loss 5.9293 LearningRate 0.0182 Epoch: 22 Global Step: 130370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:52:15,349-Speed 10705.32 samples/sec Loss 5.9597 LearningRate 0.0182 Epoch: 22 Global Step: 130380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:52:16,280-Speed 11012.27 samples/sec Loss 6.0459 LearningRate 0.0182 Epoch: 22 Global Step: 130390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:52:17,177-Speed 11421.67 samples/sec Loss 5.9752 LearningRate 0.0182 Epoch: 22 Global Step: 130400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:52:18,087-Speed 11258.87 samples/sec Loss 5.9226 LearningRate 0.0182 Epoch: 22 Global Step: 130410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:52:19,032-Speed 10843.41 samples/sec Loss 5.9205 LearningRate 0.0182 Epoch: 22 Global Step: 130420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:52:19,928-Speed 11434.85 samples/sec Loss 5.9773 LearningRate 0.0182 Epoch: 22 Global Step: 130430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:52:20,823-Speed 11452.22 samples/sec Loss 5.9673 LearningRate 0.0182 Epoch: 22 Global Step: 130440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:52:21,718-Speed 11440.37 samples/sec Loss 6.0616 LearningRate 0.0182 Epoch: 22 Global Step: 130450 
Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:52:22,597-Speed 11659.55 samples/sec Loss 5.9981 LearningRate 0.0182 Epoch: 22 Global Step: 130460 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:52:23,488-Speed 11508.72 samples/sec Loss 5.8611 LearningRate 0.0182 Epoch: 22 Global Step: 130470 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:52:24,366-Speed 11669.75 samples/sec Loss 5.9334 LearningRate 0.0182 Epoch: 22 Global Step: 130480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:52:25,269-Speed 11348.07 samples/sec Loss 5.9349 LearningRate 0.0182 Epoch: 22 Global Step: 130490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:52:26,246-Speed 10492.46 samples/sec Loss 5.9265 LearningRate 0.0182 Epoch: 22 Global Step: 130500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:52:27,161-Speed 11195.31 samples/sec Loss 6.0059 LearningRate 0.0182 Epoch: 22 Global Step: 130510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:52:28,038-Speed 11685.44 samples/sec Loss 6.0252 LearningRate 0.0182 Epoch: 22 Global Step: 130520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:52:28,941-Speed 11347.35 samples/sec Loss 5.9225 LearningRate 0.0182 Epoch: 22 Global Step: 130530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:52:29,874-Speed 10981.13 samples/sec Loss 5.8300 LearningRate 0.0182 Epoch: 22 Global Step: 130540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:52:30,773-Speed 11401.85 samples/sec Loss 5.9397 LearningRate 0.0181 Epoch: 22 Global Step: 130550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:52:31,668-Speed 11446.70 samples/sec Loss 5.8820 LearningRate 0.0181 Epoch: 22 Global Step: 130560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:52:32,547-Speed 11661.77 samples/sec Loss 5.9680 LearningRate 0.0181 Epoch: 22 Global Step: 130570 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-04-11 03:52:33,429-Speed 11620.35 samples/sec Loss 5.8123 LearningRate 0.0181 Epoch: 22 Global Step: 130580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:52:34,309-Speed 11642.08 samples/sec Loss 5.8723 LearningRate 0.0181 Epoch: 22 Global Step: 130590 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:52:35,195-Speed 11560.97 samples/sec Loss 5.9164 LearningRate 0.0181 Epoch: 22 Global Step: 130600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:52:36,078-Speed 11610.56 samples/sec Loss 5.9605 LearningRate 0.0181 Epoch: 22 Global Step: 130610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:52:36,983-Speed 11324.39 samples/sec Loss 5.8524 LearningRate 0.0181 Epoch: 22 Global Step: 130620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:52:37,915-Speed 10986.10 samples/sec Loss 5.9134 LearningRate 0.0181 Epoch: 22 Global Step: 130630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:52:38,885-Speed 10567.14 samples/sec Loss 5.9097 LearningRate 0.0181 Epoch: 22 Global Step: 130640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:52:39,842-Speed 10706.26 samples/sec Loss 5.9153 LearningRate 0.0181 Epoch: 22 Global Step: 130650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:52:40,736-Speed 11461.14 samples/sec Loss 5.9329 LearningRate 0.0181 Epoch: 22 Global Step: 130660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:52:41,639-Speed 11345.16 samples/sec Loss 5.9790 LearningRate 0.0181 Epoch: 22 Global Step: 130670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:52:42,557-Speed 11172.82 samples/sec Loss 6.1219 LearningRate 0.0181 Epoch: 22 Global Step: 130680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:52:43,485-Speed 11036.94 samples/sec Loss 5.8626 LearningRate 0.0181 Epoch: 22 Global Step: 130690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 03:52:44,393-Speed 11286.82 samples/sec Loss 6.0412 LearningRate 0.0181 Epoch: 22 Global Step: 130700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:52:45,298-Speed 11319.23 samples/sec Loss 6.0554 LearningRate 0.0181 Epoch: 22 Global Step: 130710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:52:46,188-Speed 11513.79 samples/sec Loss 5.8270 LearningRate 0.0181 Epoch: 22 Global Step: 130720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:52:47,126-Speed 10929.74 samples/sec Loss 6.0209 LearningRate 0.0181 Epoch: 22 Global Step: 130730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:52:48,022-Speed 11437.72 samples/sec Loss 5.9092 LearningRate 0.0181 Epoch: 22 Global Step: 130740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:52:48,953-Speed 11005.42 samples/sec Loss 6.0035 LearningRate 0.0181 Epoch: 22 Global Step: 130750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:52:49,875-Speed 11107.50 samples/sec Loss 5.9634 LearningRate 0.0181 Epoch: 22 Global Step: 130760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:52:50,876-Speed 10237.22 samples/sec Loss 5.8836 LearningRate 0.0181 Epoch: 22 Global Step: 130770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:02,066-Speed 915.24 samples/sec Loss 5.9288 LearningRate 0.0181 Epoch: 23 Global Step: 130780 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:53:03,016-Speed 10791.59 samples/sec Loss 5.3009 LearningRate 0.0181 Epoch: 23 Global Step: 130790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:03,974-Speed 10692.82 samples/sec Loss 5.3375 LearningRate 0.0181 Epoch: 23 Global Step: 130800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:04,968-Speed 10313.20 samples/sec Loss 5.2372 LearningRate 0.0181 Epoch: 23 Global Step: 130810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:05,910-Speed 
10871.10 samples/sec Loss 5.2424 LearningRate 0.0180 Epoch: 23 Global Step: 130820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:06,812-Speed 11371.14 samples/sec Loss 5.3086 LearningRate 0.0180 Epoch: 23 Global Step: 130830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:07,763-Speed 10769.39 samples/sec Loss 5.3093 LearningRate 0.0180 Epoch: 23 Global Step: 130840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:08,695-Speed 10989.78 samples/sec Loss 5.3646 LearningRate 0.0180 Epoch: 23 Global Step: 130850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:09,609-Speed 11216.55 samples/sec Loss 5.2678 LearningRate 0.0180 Epoch: 23 Global Step: 130860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:10,517-Speed 11285.90 samples/sec Loss 5.1497 LearningRate 0.0180 Epoch: 23 Global Step: 130870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:11,418-Speed 11374.12 samples/sec Loss 5.2983 LearningRate 0.0180 Epoch: 23 Global Step: 130880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:12,316-Speed 11410.84 samples/sec Loss 5.3182 LearningRate 0.0180 Epoch: 23 Global Step: 130890 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:53:13,241-Speed 11078.37 samples/sec Loss 5.3774 LearningRate 0.0180 Epoch: 23 Global Step: 130900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:14,329-Speed 9412.60 samples/sec Loss 5.3164 LearningRate 0.0180 Epoch: 23 Global Step: 130910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:15,242-Speed 11223.92 samples/sec Loss 5.3050 LearningRate 0.0180 Epoch: 23 Global Step: 130920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:16,174-Speed 10999.37 samples/sec Loss 5.4182 LearningRate 0.0180 Epoch: 23 Global Step: 130930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:17,143-Speed 10577.96 samples/sec Loss 
5.3664 LearningRate 0.0180 Epoch: 23 Global Step: 130940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:18,074-Speed 11002.99 samples/sec Loss 5.2583 LearningRate 0.0180 Epoch: 23 Global Step: 130950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:18,996-Speed 11117.89 samples/sec Loss 5.3870 LearningRate 0.0180 Epoch: 23 Global Step: 130960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:19,979-Speed 10419.58 samples/sec Loss 5.3207 LearningRate 0.0180 Epoch: 23 Global Step: 130970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:20,882-Speed 11349.75 samples/sec Loss 5.4114 LearningRate 0.0180 Epoch: 23 Global Step: 130980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:21,778-Speed 11432.15 samples/sec Loss 5.3870 LearningRate 0.0180 Epoch: 23 Global Step: 130990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:22,650-Speed 11752.23 samples/sec Loss 5.3404 LearningRate 0.0180 Epoch: 23 Global Step: 131000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:23,544-Speed 11473.41 samples/sec Loss 5.2426 LearningRate 0.0180 Epoch: 23 Global Step: 131010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:24,448-Speed 11325.03 samples/sec Loss 5.3988 LearningRate 0.0180 Epoch: 23 Global Step: 131020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:25,396-Speed 10810.39 samples/sec Loss 5.3061 LearningRate 0.0180 Epoch: 23 Global Step: 131030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:26,308-Speed 11233.19 samples/sec Loss 5.4624 LearningRate 0.0180 Epoch: 23 Global Step: 131040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:27,265-Speed 10711.68 samples/sec Loss 5.3381 LearningRate 0.0180 Epoch: 23 Global Step: 131050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:28,207-Speed 10878.66 samples/sec Loss 5.3649 LearningRate 0.0180 
Epoch: 23 Global Step: 131060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:29,127-Speed 11144.59 samples/sec Loss 5.4428 LearningRate 0.0180 Epoch: 23 Global Step: 131070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:30,043-Speed 11180.80 samples/sec Loss 5.3576 LearningRate 0.0179 Epoch: 23 Global Step: 131080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:30,927-Speed 11594.21 samples/sec Loss 5.4719 LearningRate 0.0179 Epoch: 23 Global Step: 131090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:31,847-Speed 11131.35 samples/sec Loss 5.3763 LearningRate 0.0179 Epoch: 23 Global Step: 131100 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:53:32,754-Speed 11299.45 samples/sec Loss 5.3321 LearningRate 0.0179 Epoch: 23 Global Step: 131110 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:53:33,680-Speed 11069.77 samples/sec Loss 5.4612 LearningRate 0.0179 Epoch: 23 Global Step: 131120 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:53:34,581-Speed 11372.82 samples/sec Loss 5.5052 LearningRate 0.0179 Epoch: 23 Global Step: 131130 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:53:35,484-Speed 11344.66 samples/sec Loss 5.4393 LearningRate 0.0179 Epoch: 23 Global Step: 131140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:36,421-Speed 10936.57 samples/sec Loss 5.4555 LearningRate 0.0179 Epoch: 23 Global Step: 131150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:37,320-Speed 11407.78 samples/sec Loss 5.4124 LearningRate 0.0179 Epoch: 23 Global Step: 131160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:38,279-Speed 10677.97 samples/sec Loss 5.5074 LearningRate 0.0179 Epoch: 23 Global Step: 131170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:39,204-Speed 11081.04 samples/sec Loss 5.3668 LearningRate 0.0179 Epoch: 23 Global Step: 
131180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:40,115-Speed 11254.53 samples/sec Loss 5.3510 LearningRate 0.0179 Epoch: 23 Global Step: 131190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:41,036-Speed 11120.05 samples/sec Loss 5.4174 LearningRate 0.0179 Epoch: 23 Global Step: 131200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:53:41,956-Speed 11142.64 samples/sec Loss 5.4672 LearningRate 0.0179 Epoch: 23 Global Step: 131210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:53:42,878-Speed 11107.82 samples/sec Loss 5.4337 LearningRate 0.0179 Epoch: 23 Global Step: 131220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:53:43,846-Speed 10592.02 samples/sec Loss 5.4063 LearningRate 0.0179 Epoch: 23 Global Step: 131230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:53:44,774-Speed 11035.18 samples/sec Loss 5.3179 LearningRate 0.0179 Epoch: 23 Global Step: 131240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:53:45,671-Speed 11421.77 samples/sec Loss 5.3977 LearningRate 0.0179 Epoch: 23 Global Step: 131250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:53:46,566-Speed 11450.93 samples/sec Loss 5.4096 LearningRate 0.0179 Epoch: 23 Global Step: 131260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:53:47,493-Speed 11062.27 samples/sec Loss 5.3691 LearningRate 0.0179 Epoch: 23 Global Step: 131270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:53:48,414-Speed 11124.90 samples/sec Loss 5.4044 LearningRate 0.0179 Epoch: 23 Global Step: 131280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:53:49,356-Speed 10869.88 samples/sec Loss 5.5579 LearningRate 0.0179 Epoch: 23 Global Step: 131290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:53:50,320-Speed 10633.38 samples/sec Loss 5.4025 LearningRate 0.0179 Epoch: 23 Global Step: 131300 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-04-11 03:53:51,221-Speed 11378.52 samples/sec Loss 5.5874 LearningRate 0.0179 Epoch: 23 Global Step: 131310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:52,127-Speed 11306.63 samples/sec Loss 5.5283 LearningRate 0.0179 Epoch: 23 Global Step: 131320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:53,029-Speed 11364.10 samples/sec Loss 5.4288 LearningRate 0.0179 Epoch: 23 Global Step: 131330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:53,989-Speed 10665.85 samples/sec Loss 5.4169 LearningRate 0.0179 Epoch: 23 Global Step: 131340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:54,982-Speed 10317.96 samples/sec Loss 5.3622 LearningRate 0.0178 Epoch: 23 Global Step: 131350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:55,919-Speed 10943.91 samples/sec Loss 5.4987 LearningRate 0.0178 Epoch: 23 Global Step: 131360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:56,815-Speed 11429.54 samples/sec Loss 5.3981 LearningRate 0.0178 Epoch: 23 Global Step: 131370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:57,703-Speed 11563.06 samples/sec Loss 5.4151 LearningRate 0.0178 Epoch: 23 Global Step: 131380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:58,610-Speed 11329.20 samples/sec Loss 5.5871 LearningRate 0.0178 Epoch: 23 Global Step: 131390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:53:59,504-Speed 11466.24 samples/sec Loss 5.5251 LearningRate 0.0178 Epoch: 23 Global Step: 131400 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:54:00,402-Speed 11408.78 samples/sec Loss 5.6256 LearningRate 0.0178 Epoch: 23 Global Step: 131410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:54:01,311-Speed 11266.89 samples/sec Loss 5.3522 LearningRate 0.0178 Epoch: 23 Global Step: 131420 Fp16 Grad Scale: 32768 Required: 3 hours Training: 
2022-04-11 03:54:02,204-Speed 11482.32 samples/sec Loss 5.4170 LearningRate 0.0178 Epoch: 23 Global Step: 131430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:54:03,124-Speed 11135.87 samples/sec Loss 5.3414 LearningRate 0.0178 Epoch: 23 Global Step: 131440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:54:04,029-Speed 11338.10 samples/sec Loss 5.3427 LearningRate 0.0178 Epoch: 23 Global Step: 131450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:54:04,958-Speed 11033.15 samples/sec Loss 5.5596 LearningRate 0.0178 Epoch: 23 Global Step: 131460 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:54:05,859-Speed 11378.37 samples/sec Loss 5.5675 LearningRate 0.0178 Epoch: 23 Global Step: 131470 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:54:06,760-Speed 11363.90 samples/sec Loss 5.4912 LearningRate 0.0178 Epoch: 23 Global Step: 131480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:54:07,671-Speed 11252.45 samples/sec Loss 5.6375 LearningRate 0.0178 Epoch: 23 Global Step: 131490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:54:08,616-Speed 10843.10 samples/sec Loss 5.6089 LearningRate 0.0178 Epoch: 23 Global Step: 131500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:54:09,511-Speed 11448.44 samples/sec Loss 5.4775 LearningRate 0.0178 Epoch: 23 Global Step: 131510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:54:10,428-Speed 11171.54 samples/sec Loss 5.6641 LearningRate 0.0178 Epoch: 23 Global Step: 131520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:54:11,310-Speed 11618.91 samples/sec Loss 5.5800 LearningRate 0.0178 Epoch: 23 Global Step: 131530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:54:12,232-Speed 11117.66 samples/sec Loss 5.6328 LearningRate 0.0178 Epoch: 23 Global Step: 131540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:54:13,174-Speed 
10881.73 samples/sec Loss 5.4212 LearningRate 0.0178 Epoch: 23 Global Step: 131550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:54:14,085-Speed 11241.98 samples/sec Loss 5.3779 LearningRate 0.0178 Epoch: 23 Global Step: 131560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:54:15,051-Speed 10608.93 samples/sec Loss 5.5445 LearningRate 0.0178 Epoch: 23 Global Step: 131570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:54:16,005-Speed 10740.36 samples/sec Loss 5.6280 LearningRate 0.0178 Epoch: 23 Global Step: 131580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:54:16,930-Speed 11078.35 samples/sec Loss 5.4343 LearningRate 0.0178 Epoch: 23 Global Step: 131590 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:54:17,857-Speed 11055.60 samples/sec Loss 5.6022 LearningRate 0.0178 Epoch: 23 Global Step: 131600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:54:18,807-Speed 10783.92 samples/sec Loss 5.5182 LearningRate 0.0178 Epoch: 23 Global Step: 131610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:54:19,737-Speed 11031.01 samples/sec Loss 5.5472 LearningRate 0.0177 Epoch: 23 Global Step: 131620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:54:20,679-Speed 10873.67 samples/sec Loss 5.5889 LearningRate 0.0177 Epoch: 23 Global Step: 131630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:54:21,569-Speed 11513.04 samples/sec Loss 5.3966 LearningRate 0.0177 Epoch: 23 Global Step: 131640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:54:22,461-Speed 11484.66 samples/sec Loss 5.5910 LearningRate 0.0177 Epoch: 23 Global Step: 131650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:54:23,356-Speed 11460.57 samples/sec Loss 5.6650 LearningRate 0.0177 Epoch: 23 Global Step: 131660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:54:24,274-Speed 11163.27 samples/sec Loss 
5.3939 LearningRate 0.0177 Epoch: 23 Global Step: 131670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:54:25,215-Speed 10887.47 samples/sec Loss 5.4956 LearningRate 0.0177 Epoch: 23 Global Step: 131680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:54:26,166-Speed 10774.81 samples/sec Loss 5.5387 LearningRate 0.0177 Epoch: 23 Global Step: 131690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:54:27,101-Speed 10957.10 samples/sec Loss 5.5061 LearningRate 0.0177 Epoch: 23 Global Step: 131700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:54:28,016-Speed 11200.57 samples/sec Loss 5.6061 LearningRate 0.0177 Epoch: 23 Global Step: 131710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:54:28,912-Speed 11430.48 samples/sec Loss 5.5885 LearningRate 0.0177 Epoch: 23 Global Step: 131720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:54:29,803-Speed 11502.57 samples/sec Loss 5.5608 LearningRate 0.0177 Epoch: 23 Global Step: 131730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:54:30,740-Speed 10933.69 samples/sec Loss 5.4533 LearningRate 0.0177 Epoch: 23 Global Step: 131740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:54:31,637-Speed 11422.16 samples/sec Loss 5.4646 LearningRate 0.0177 Epoch: 23 Global Step: 131750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:54:32,542-Speed 11330.16 samples/sec Loss 5.5272 LearningRate 0.0177 Epoch: 23 Global Step: 131760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:54:33,427-Speed 11580.98 samples/sec Loss 5.6924 LearningRate 0.0177 Epoch: 23 Global Step: 131770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:54:34,327-Speed 11381.58 samples/sec Loss 5.6239 LearningRate 0.0177 Epoch: 23 Global Step: 131780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:54:35,232-Speed 11319.88 samples/sec Loss 5.4659 LearningRate 0.0177 
Epoch: 23 Global Step: 131790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:54:36,135-Speed 11344.50 samples/sec Loss 5.4580 LearningRate 0.0177 Epoch: 23 Global Step: 131800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:54:37,036-Speed 11374.80 samples/sec Loss 5.6152 LearningRate 0.0177 Epoch: 23 Global Step: 131810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:54:37,961-Speed 11074.40 samples/sec Loss 5.3992 LearningRate 0.0177 Epoch: 23 Global Step: 131820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:54:38,892-Speed 11013.62 samples/sec Loss 5.7305 LearningRate 0.0177 Epoch: 23 Global Step: 131830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:54:39,844-Speed 10757.92 samples/sec Loss 5.6469 LearningRate 0.0177 Epoch: 23 Global Step: 131840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:54:40,750-Speed 11318.83 samples/sec Loss 5.5799 LearningRate 0.0177 Epoch: 23 Global Step: 131850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:54:41,667-Speed 11170.73 samples/sec Loss 5.7133 LearningRate 0.0177 Epoch: 23 Global Step: 131860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:54:42,623-Speed 10719.97 samples/sec Loss 5.5710 LearningRate 0.0177 Epoch: 23 Global Step: 131870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:54:43,541-Speed 11164.09 samples/sec Loss 5.5343 LearningRate 0.0177 Epoch: 23 Global Step: 131880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:54:44,478-Speed 10940.90 samples/sec Loss 5.6675 LearningRate 0.0176 Epoch: 23 Global Step: 131890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:54:45,360-Speed 11618.81 samples/sec Loss 5.6281 LearningRate 0.0176 Epoch: 23 Global Step: 131900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:54:46,263-Speed 11347.20 samples/sec Loss 5.6709 LearningRate 0.0176 Epoch: 23 Global Step: 131910 
Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:54:47,159-Speed 11433.99 samples/sec Loss 5.7365 LearningRate 0.0176 Epoch: 23 Global Step: 131920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:54:48,054-Speed 11444.47 samples/sec Loss 5.5499 LearningRate 0.0176 Epoch: 23 Global Step: 131930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:54:48,971-Speed 11188.88 samples/sec Loss 5.6084 LearningRate 0.0176 Epoch: 23 Global Step: 131940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:54:49,909-Speed 10920.79 samples/sec Loss 5.6791 LearningRate 0.0176 Epoch: 23 Global Step: 131950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:54:50,856-Speed 10816.07 samples/sec Loss 5.5038 LearningRate 0.0176 Epoch: 23 Global Step: 131960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:54:51,759-Speed 11348.29 samples/sec Loss 5.5490 LearningRate 0.0176 Epoch: 23 Global Step: 131970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:54:52,676-Speed 11178.92 samples/sec Loss 5.6270 LearningRate 0.0176 Epoch: 23 Global Step: 131980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:54:53,555-Speed 11656.43 samples/sec Loss 5.5997 LearningRate 0.0176 Epoch: 23 Global Step: 131990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:54:54,447-Speed 11483.45 samples/sec Loss 5.6565 LearningRate 0.0176 Epoch: 23 Global Step: 132000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:55:15,834-[lfw][132000]XNorm: 9.625231 Training: 2022-04-11 03:55:15,835-[lfw][132000]Accuracy-Flip: 0.99600+-0.00291 Training: 2022-04-11 03:55:15,835-[lfw][132000]Accuracy-Highest: 0.99650 Training: 2022-04-11 03:55:40,507-[cfp_fp][132000]XNorm: 8.170966 Training: 2022-04-11 03:55:40,508-[cfp_fp][132000]Accuracy-Flip: 0.95171+-0.01055 Training: 2022-04-11 03:55:40,508-[cfp_fp][132000]Accuracy-Highest: 0.95171 Training: 2022-04-11 
03:56:01,826-[agedb_30][132000]XNorm: 9.431796 Training: 2022-04-11 03:56:01,827-[agedb_30][132000]Accuracy-Flip: 0.96550+-0.00578 Training: 2022-04-11 03:56:01,827-[agedb_30][132000]Accuracy-Highest: 0.96783 Training: 2022-04-11 03:56:02,726-Speed 149.98 samples/sec Loss 5.6517 LearningRate 0.0176 Epoch: 23 Global Step: 132010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:03,645-Speed 11152.33 samples/sec Loss 5.5664 LearningRate 0.0176 Epoch: 23 Global Step: 132020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:04,550-Speed 11320.58 samples/sec Loss 5.5604 LearningRate 0.0176 Epoch: 23 Global Step: 132030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:05,454-Speed 11336.37 samples/sec Loss 5.5443 LearningRate 0.0176 Epoch: 23 Global Step: 132040 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:56:06,403-Speed 10790.37 samples/sec Loss 5.5992 LearningRate 0.0176 Epoch: 23 Global Step: 132050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:07,303-Speed 11384.82 samples/sec Loss 5.6252 LearningRate 0.0176 Epoch: 23 Global Step: 132060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:08,223-Speed 11135.91 samples/sec Loss 5.6159 LearningRate 0.0176 Epoch: 23 Global Step: 132070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:09,108-Speed 11576.12 samples/sec Loss 5.5630 LearningRate 0.0176 Epoch: 23 Global Step: 132080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:10,010-Speed 11369.06 samples/sec Loss 5.6329 LearningRate 0.0176 Epoch: 23 Global Step: 132090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:10,890-Speed 11636.53 samples/sec Loss 5.5673 LearningRate 0.0176 Epoch: 23 Global Step: 132100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:11,820-Speed 11022.10 samples/sec Loss 5.7151 LearningRate 0.0176 Epoch: 23 Global Step: 132110 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-04-11 03:56:12,721-Speed 11370.81 samples/sec Loss 5.5119 LearningRate 0.0176 Epoch: 23 Global Step: 132120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:13,657-Speed 10950.96 samples/sec Loss 5.6279 LearningRate 0.0176 Epoch: 23 Global Step: 132130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:14,573-Speed 11185.44 samples/sec Loss 5.6783 LearningRate 0.0176 Epoch: 23 Global Step: 132140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:15,464-Speed 11500.56 samples/sec Loss 5.6041 LearningRate 0.0176 Epoch: 23 Global Step: 132150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:16,359-Speed 11447.29 samples/sec Loss 5.6895 LearningRate 0.0175 Epoch: 23 Global Step: 132160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:17,284-Speed 11077.68 samples/sec Loss 5.4751 LearningRate 0.0175 Epoch: 23 Global Step: 132170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:18,236-Speed 10767.38 samples/sec Loss 5.6020 LearningRate 0.0175 Epoch: 23 Global Step: 132180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:19,164-Speed 11044.08 samples/sec Loss 5.6085 LearningRate 0.0175 Epoch: 23 Global Step: 132190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:20,139-Speed 10504.32 samples/sec Loss 5.6435 LearningRate 0.0175 Epoch: 23 Global Step: 132200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:21,026-Speed 11549.22 samples/sec Loss 5.7787 LearningRate 0.0175 Epoch: 23 Global Step: 132210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:21,921-Speed 11455.62 samples/sec Loss 5.6006 LearningRate 0.0175 Epoch: 23 Global Step: 132220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:22,800-Speed 11650.26 samples/sec Loss 5.6787 LearningRate 0.0175 Epoch: 23 Global Step: 132230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 03:56:23,709-Speed 11280.87 samples/sec Loss 5.8065 LearningRate 0.0175 Epoch: 23 Global Step: 132240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:24,650-Speed 10891.36 samples/sec Loss 5.6311 LearningRate 0.0175 Epoch: 23 Global Step: 132250 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:56:25,625-Speed 10507.02 samples/sec Loss 5.6407 LearningRate 0.0175 Epoch: 23 Global Step: 132260 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:56:26,538-Speed 11229.92 samples/sec Loss 5.6038 LearningRate 0.0175 Epoch: 23 Global Step: 132270 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:56:27,466-Speed 11042.42 samples/sec Loss 5.6478 LearningRate 0.0175 Epoch: 23 Global Step: 132280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:28,451-Speed 10398.31 samples/sec Loss 5.7189 LearningRate 0.0175 Epoch: 23 Global Step: 132290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:29,343-Speed 11495.14 samples/sec Loss 5.5241 LearningRate 0.0175 Epoch: 23 Global Step: 132300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:30,252-Speed 11272.38 samples/sec Loss 5.6325 LearningRate 0.0175 Epoch: 23 Global Step: 132310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:31,196-Speed 10852.40 samples/sec Loss 5.7440 LearningRate 0.0175 Epoch: 23 Global Step: 132320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:32,124-Speed 11039.31 samples/sec Loss 5.6017 LearningRate 0.0175 Epoch: 23 Global Step: 132330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:33,024-Speed 11383.26 samples/sec Loss 5.5719 LearningRate 0.0175 Epoch: 23 Global Step: 132340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:33,912-Speed 11543.91 samples/sec Loss 5.6698 LearningRate 0.0175 Epoch: 23 Global Step: 132350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
03:56:34,818-Speed 11314.46 samples/sec Loss 5.6944 LearningRate 0.0175 Epoch: 23 Global Step: 132360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:35,704-Speed 11558.50 samples/sec Loss 5.7385 LearningRate 0.0175 Epoch: 23 Global Step: 132370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:36,631-Speed 11056.33 samples/sec Loss 5.6755 LearningRate 0.0175 Epoch: 23 Global Step: 132380 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:56:37,560-Speed 11025.80 samples/sec Loss 5.6913 LearningRate 0.0175 Epoch: 23 Global Step: 132390 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:56:38,523-Speed 10644.75 samples/sec Loss 5.8331 LearningRate 0.0175 Epoch: 23 Global Step: 132400 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:56:39,479-Speed 10719.35 samples/sec Loss 5.6918 LearningRate 0.0175 Epoch: 23 Global Step: 132410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:40,373-Speed 11454.31 samples/sec Loss 5.6793 LearningRate 0.0175 Epoch: 23 Global Step: 132420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:41,315-Speed 10885.02 samples/sec Loss 5.5531 LearningRate 0.0175 Epoch: 23 Global Step: 132430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:42,258-Speed 10868.10 samples/sec Loss 5.6835 LearningRate 0.0174 Epoch: 23 Global Step: 132440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:43,219-Speed 10660.28 samples/sec Loss 5.7647 LearningRate 0.0174 Epoch: 23 Global Step: 132450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:44,183-Speed 10631.36 samples/sec Loss 5.6816 LearningRate 0.0174 Epoch: 23 Global Step: 132460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:45,107-Speed 11090.71 samples/sec Loss 5.7772 LearningRate 0.0174 Epoch: 23 Global Step: 132470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:45,996-Speed 11527.78 
samples/sec Loss 5.7957 LearningRate 0.0174 Epoch: 23 Global Step: 132480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:46,879-Speed 11606.99 samples/sec Loss 5.6860 LearningRate 0.0174 Epoch: 23 Global Step: 132490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:47,759-Speed 11637.76 samples/sec Loss 5.7131 LearningRate 0.0174 Epoch: 23 Global Step: 132500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:48,673-Speed 11216.05 samples/sec Loss 5.6287 LearningRate 0.0174 Epoch: 23 Global Step: 132510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:49,578-Speed 11325.83 samples/sec Loss 5.7318 LearningRate 0.0174 Epoch: 23 Global Step: 132520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:50,485-Speed 11292.01 samples/sec Loss 5.6329 LearningRate 0.0174 Epoch: 23 Global Step: 132530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:51,376-Speed 11501.77 samples/sec Loss 5.6685 LearningRate 0.0174 Epoch: 23 Global Step: 132540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:52,268-Speed 11490.78 samples/sec Loss 5.8329 LearningRate 0.0174 Epoch: 23 Global Step: 132550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:53,197-Speed 11031.59 samples/sec Loss 5.6767 LearningRate 0.0174 Epoch: 23 Global Step: 132560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:54,151-Speed 10742.86 samples/sec Loss 5.7839 LearningRate 0.0174 Epoch: 23 Global Step: 132570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:55,037-Speed 11553.88 samples/sec Loss 5.5664 LearningRate 0.0174 Epoch: 23 Global Step: 132580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:55,933-Speed 11437.41 samples/sec Loss 5.6956 LearningRate 0.0174 Epoch: 23 Global Step: 132590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:56,815-Speed 11625.33 samples/sec Loss 5.6618 
LearningRate 0.0174 Epoch: 23 Global Step: 132600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:56:57,695-Speed 11642.80 samples/sec Loss 5.6742 LearningRate 0.0174 Epoch: 23 Global Step: 132610 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:56:58,579-Speed 11591.96 samples/sec Loss 5.8344 LearningRate 0.0174 Epoch: 23 Global Step: 132620 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:56:59,465-Speed 11558.50 samples/sec Loss 5.7011 LearningRate 0.0174 Epoch: 23 Global Step: 132630 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:57:00,361-Speed 11441.53 samples/sec Loss 5.7496 LearningRate 0.0174 Epoch: 23 Global Step: 132640 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:57:01,262-Speed 11385.10 samples/sec Loss 5.6030 LearningRate 0.0174 Epoch: 23 Global Step: 132650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:02,217-Speed 10722.77 samples/sec Loss 5.7599 LearningRate 0.0174 Epoch: 23 Global Step: 132660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:03,172-Speed 10731.76 samples/sec Loss 5.7569 LearningRate 0.0174 Epoch: 23 Global Step: 132670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:04,121-Speed 10795.57 samples/sec Loss 5.6621 LearningRate 0.0174 Epoch: 23 Global Step: 132680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:05,033-Speed 11238.57 samples/sec Loss 5.6779 LearningRate 0.0174 Epoch: 23 Global Step: 132690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:05,976-Speed 10867.16 samples/sec Loss 5.7261 LearningRate 0.0174 Epoch: 23 Global Step: 132700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:06,899-Speed 11105.17 samples/sec Loss 5.7376 LearningRate 0.0173 Epoch: 23 Global Step: 132710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:07,809-Speed 11271.06 samples/sec Loss 5.6111 LearningRate 0.0173 Epoch: 
23 Global Step: 132720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:08,726-Speed 11170.45 samples/sec Loss 5.6906 LearningRate 0.0173 Epoch: 23 Global Step: 132730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:09,637-Speed 11244.26 samples/sec Loss 5.7020 LearningRate 0.0173 Epoch: 23 Global Step: 132740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:10,525-Speed 11541.76 samples/sec Loss 5.8345 LearningRate 0.0173 Epoch: 23 Global Step: 132750 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:57:11,421-Speed 11443.93 samples/sec Loss 5.8380 LearningRate 0.0173 Epoch: 23 Global Step: 132760 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:57:12,315-Speed 11458.70 samples/sec Loss 5.7458 LearningRate 0.0173 Epoch: 23 Global Step: 132770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:13,214-Speed 11390.50 samples/sec Loss 5.7143 LearningRate 0.0173 Epoch: 23 Global Step: 132780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:14,103-Speed 11526.00 samples/sec Loss 5.6952 LearningRate 0.0173 Epoch: 23 Global Step: 132790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:15,047-Speed 10862.79 samples/sec Loss 5.6027 LearningRate 0.0173 Epoch: 23 Global Step: 132800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:15,995-Speed 10812.59 samples/sec Loss 5.7550 LearningRate 0.0173 Epoch: 23 Global Step: 132810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:16,892-Speed 11417.31 samples/sec Loss 5.8556 LearningRate 0.0173 Epoch: 23 Global Step: 132820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:17,841-Speed 10797.79 samples/sec Loss 5.9119 LearningRate 0.0173 Epoch: 23 Global Step: 132830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:18,775-Speed 10975.92 samples/sec Loss 5.6446 LearningRate 0.0173 Epoch: 23 Global Step: 132840 Fp16 
Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:19,689-Speed 11208.66 samples/sec Loss 5.7415 LearningRate 0.0173 Epoch: 23 Global Step: 132850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:20,620-Speed 10999.02 samples/sec Loss 5.7305 LearningRate 0.0173 Epoch: 23 Global Step: 132860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:21,509-Speed 11535.10 samples/sec Loss 5.7434 LearningRate 0.0173 Epoch: 23 Global Step: 132870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:22,412-Speed 11344.48 samples/sec Loss 5.6268 LearningRate 0.0173 Epoch: 23 Global Step: 132880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:23,319-Speed 11295.59 samples/sec Loss 5.6329 LearningRate 0.0173 Epoch: 23 Global Step: 132890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:24,236-Speed 11181.88 samples/sec Loss 5.7242 LearningRate 0.0173 Epoch: 23 Global Step: 132900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:25,152-Speed 11175.83 samples/sec Loss 5.7129 LearningRate 0.0173 Epoch: 23 Global Step: 132910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:26,056-Speed 11349.54 samples/sec Loss 5.7979 LearningRate 0.0173 Epoch: 23 Global Step: 132920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:27,021-Speed 10612.29 samples/sec Loss 5.6424 LearningRate 0.0173 Epoch: 23 Global Step: 132930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:27,915-Speed 11466.96 samples/sec Loss 5.7415 LearningRate 0.0173 Epoch: 23 Global Step: 132940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:28,835-Speed 11140.71 samples/sec Loss 5.7922 LearningRate 0.0173 Epoch: 23 Global Step: 132950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:29,735-Speed 11375.25 samples/sec Loss 5.9089 LearningRate 0.0173 Epoch: 23 Global Step: 132960 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-04-11 03:57:30,658-Speed 11104.84 samples/sec Loss 5.6236 LearningRate 0.0173 Epoch: 23 Global Step: 132970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:31,594-Speed 10954.46 samples/sec Loss 5.6591 LearningRate 0.0172 Epoch: 23 Global Step: 132980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:32,496-Speed 11357.83 samples/sec Loss 5.8250 LearningRate 0.0172 Epoch: 23 Global Step: 132990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:33,390-Speed 11470.98 samples/sec Loss 5.6426 LearningRate 0.0172 Epoch: 23 Global Step: 133000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:34,282-Speed 11479.20 samples/sec Loss 5.8264 LearningRate 0.0172 Epoch: 23 Global Step: 133010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:35,194-Speed 11237.24 samples/sec Loss 5.7291 LearningRate 0.0172 Epoch: 23 Global Step: 133020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:36,085-Speed 11505.43 samples/sec Loss 5.7762 LearningRate 0.0172 Epoch: 23 Global Step: 133030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:36,971-Speed 11563.68 samples/sec Loss 5.7117 LearningRate 0.0172 Epoch: 23 Global Step: 133040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:37,861-Speed 11511.62 samples/sec Loss 5.8287 LearningRate 0.0172 Epoch: 23 Global Step: 133050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:38,766-Speed 11317.22 samples/sec Loss 5.7112 LearningRate 0.0172 Epoch: 23 Global Step: 133060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:39,663-Speed 11434.95 samples/sec Loss 5.6958 LearningRate 0.0172 Epoch: 23 Global Step: 133070 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:57:40,551-Speed 11530.44 samples/sec Loss 5.5787 LearningRate 0.0172 Epoch: 23 Global Step: 133080 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 
03:57:41,482-Speed 11014.31 samples/sec Loss 5.7425 LearningRate 0.0172 Epoch: 23 Global Step: 133090 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:57:42,429-Speed 10818.93 samples/sec Loss 5.6228 LearningRate 0.0172 Epoch: 23 Global Step: 133100 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:57:43,371-Speed 10869.34 samples/sec Loss 5.7099 LearningRate 0.0172 Epoch: 23 Global Step: 133110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:44,290-Speed 11160.50 samples/sec Loss 5.8924 LearningRate 0.0172 Epoch: 23 Global Step: 133120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:45,190-Speed 11384.81 samples/sec Loss 5.8447 LearningRate 0.0172 Epoch: 23 Global Step: 133130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:46,090-Speed 11382.94 samples/sec Loss 5.7373 LearningRate 0.0172 Epoch: 23 Global Step: 133140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:46,987-Speed 11428.48 samples/sec Loss 5.7780 LearningRate 0.0172 Epoch: 23 Global Step: 133150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:47,886-Speed 11398.74 samples/sec Loss 5.7292 LearningRate 0.0172 Epoch: 23 Global Step: 133160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:48,818-Speed 10987.48 samples/sec Loss 5.7065 LearningRate 0.0172 Epoch: 23 Global Step: 133170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:49,722-Speed 11331.29 samples/sec Loss 5.7357 LearningRate 0.0172 Epoch: 23 Global Step: 133180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:50,647-Speed 11076.02 samples/sec Loss 5.6968 LearningRate 0.0172 Epoch: 23 Global Step: 133190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:51,610-Speed 10652.38 samples/sec Loss 5.6790 LearningRate 0.0172 Epoch: 23 Global Step: 133200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:52,544-Speed 10965.99 
samples/sec Loss 5.7924 LearningRate 0.0172 Epoch: 23 Global Step: 133210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:53,488-Speed 10856.83 samples/sec Loss 5.6482 LearningRate 0.0172 Epoch: 23 Global Step: 133220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:54,419-Speed 11006.42 samples/sec Loss 5.7407 LearningRate 0.0172 Epoch: 23 Global Step: 133230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:55,366-Speed 10827.07 samples/sec Loss 5.6826 LearningRate 0.0172 Epoch: 23 Global Step: 133240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:57:56,262-Speed 11435.17 samples/sec Loss 5.7923 LearningRate 0.0172 Epoch: 23 Global Step: 133250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:57:57,150-Speed 11544.07 samples/sec Loss 5.6348 LearningRate 0.0171 Epoch: 23 Global Step: 133260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:57:58,058-Speed 11284.02 samples/sec Loss 5.7843 LearningRate 0.0171 Epoch: 23 Global Step: 133270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:57:58,951-Speed 11476.95 samples/sec Loss 5.6892 LearningRate 0.0171 Epoch: 23 Global Step: 133280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:57:59,869-Speed 11163.07 samples/sec Loss 5.7766 LearningRate 0.0171 Epoch: 23 Global Step: 133290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:58:00,761-Speed 11487.41 samples/sec Loss 5.9720 LearningRate 0.0171 Epoch: 23 Global Step: 133300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:58:01,663-Speed 11351.82 samples/sec Loss 5.7183 LearningRate 0.0171 Epoch: 23 Global Step: 133310 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:58:02,566-Speed 11346.05 samples/sec Loss 5.7219 LearningRate 0.0171 Epoch: 23 Global Step: 133320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:58:03,515-Speed 10797.45 samples/sec Loss 5.6983 
LearningRate 0.0171 Epoch: 23 Global Step: 133330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:58:04,450-Speed 10967.71 samples/sec Loss 5.7590 LearningRate 0.0171 Epoch: 23 Global Step: 133340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:58:05,379-Speed 11027.94 samples/sec Loss 5.7218 LearningRate 0.0171 Epoch: 23 Global Step: 133350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:58:06,279-Speed 11386.03 samples/sec Loss 5.7649 LearningRate 0.0171 Epoch: 23 Global Step: 133360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:58:07,220-Speed 10887.27 samples/sec Loss 5.7806 LearningRate 0.0171 Epoch: 23 Global Step: 133370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:58:08,133-Speed 11223.57 samples/sec Loss 6.0067 LearningRate 0.0171 Epoch: 23 Global Step: 133380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:58:09,038-Speed 11318.08 samples/sec Loss 5.7191 LearningRate 0.0171 Epoch: 23 Global Step: 133390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:58:09,941-Speed 11362.71 samples/sec Loss 5.6182 LearningRate 0.0171 Epoch: 23 Global Step: 133400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:58:10,829-Speed 11542.93 samples/sec Loss 5.7417 LearningRate 0.0171 Epoch: 23 Global Step: 133410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:58:11,716-Speed 11555.19 samples/sec Loss 5.8186 LearningRate 0.0171 Epoch: 23 Global Step: 133420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:58:12,666-Speed 10785.16 samples/sec Loss 5.7713 LearningRate 0.0171 Epoch: 23 Global Step: 133430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:58:13,627-Speed 10657.19 samples/sec Loss 5.7118 LearningRate 0.0171 Epoch: 23 Global Step: 133440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:58:14,569-Speed 10879.00 samples/sec Loss 5.7005 LearningRate 0.0171 Epoch: 23 
Global Step: 133450 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:58:15,519-Speed 10786.84 samples/sec Loss 5.6577 LearningRate 0.0171 Epoch: 23 Global Step: 133460 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:58:16,461-Speed 10889.32 samples/sec Loss 5.7432 LearningRate 0.0171 Epoch: 23 Global Step: 133470 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:58:17,389-Speed 11038.44 samples/sec Loss 5.7533 LearningRate 0.0171 Epoch: 23 Global Step: 133480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:58:18,328-Speed 10918.12 samples/sec Loss 5.6892 LearningRate 0.0171 Epoch: 23 Global Step: 133490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:58:19,277-Speed 10796.60 samples/sec Loss 5.6449 LearningRate 0.0171 Epoch: 23 Global Step: 133500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:58:20,209-Speed 10992.37 samples/sec Loss 5.5650 LearningRate 0.0171 Epoch: 23 Global Step: 133510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:58:21,118-Speed 11272.67 samples/sec Loss 5.8055 LearningRate 0.0171 Epoch: 23 Global Step: 133520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:58:22,012-Speed 11472.17 samples/sec Loss 5.7877 LearningRate 0.0170 Epoch: 23 Global Step: 133530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:58:22,910-Speed 11401.28 samples/sec Loss 5.8676 LearningRate 0.0170 Epoch: 23 Global Step: 133540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:58:23,855-Speed 10847.22 samples/sec Loss 5.7082 LearningRate 0.0170 Epoch: 23 Global Step: 133550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:58:24,795-Speed 10899.76 samples/sec Loss 5.7925 LearningRate 0.0170 Epoch: 23 Global Step: 133560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:58:25,745-Speed 10786.57 samples/sec Loss 5.5947 LearningRate 0.0170 Epoch: 23 Global Step: 133570 Fp16 
Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:58:26,663-Speed 11166.58 samples/sec Loss 5.6946 LearningRate 0.0170 Epoch: 23 Global Step: 133580 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:58:27,571-Speed 11285.56 samples/sec Loss 5.7530 LearningRate 0.0170 Epoch: 23 Global Step: 133590 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:58:28,464-Speed 11475.69 samples/sec Loss 5.7552 LearningRate 0.0170 Epoch: 23 Global Step: 133600 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:58:29,374-Speed 11251.93 samples/sec Loss 5.7904 LearningRate 0.0170 Epoch: 23 Global Step: 133610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:58:30,256-Speed 11614.47 samples/sec Loss 5.8718 LearningRate 0.0170 Epoch: 23 Global Step: 133620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:58:31,158-Speed 11368.80 samples/sec Loss 5.7646 LearningRate 0.0170 Epoch: 23 Global Step: 133630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:58:32,059-Speed 11374.25 samples/sec Loss 5.6852 LearningRate 0.0170 Epoch: 23 Global Step: 133640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:58:32,951-Speed 11482.71 samples/sec Loss 5.7533 LearningRate 0.0170 Epoch: 23 Global Step: 133650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:58:33,835-Speed 11598.20 samples/sec Loss 5.8148 LearningRate 0.0170 Epoch: 23 Global Step: 133660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:58:34,714-Speed 11663.39 samples/sec Loss 5.9863 LearningRate 0.0170 Epoch: 23 Global Step: 133670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:58:35,613-Speed 11387.30 samples/sec Loss 5.7679 LearningRate 0.0170 Epoch: 23 Global Step: 133680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:58:36,554-Speed 10897.83 samples/sec Loss 5.8269 LearningRate 0.0170 Epoch: 23 Global Step: 133690 Fp16 Grad Scale: 32768 Required: 
3 hours Training: 2022-04-11 03:58:37,479-Speed 11071.75 samples/sec Loss 5.8513 LearningRate 0.0170 Epoch: 23 Global Step: 133700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:58:38,401-Speed 11123.67 samples/sec Loss 5.7370 LearningRate 0.0170 Epoch: 23 Global Step: 133710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:58:39,356-Speed 10729.86 samples/sec Loss 5.8210 LearningRate 0.0170 Epoch: 23 Global Step: 133720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:58:40,283-Speed 11054.78 samples/sec Loss 5.6953 LearningRate 0.0170 Epoch: 23 Global Step: 133730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:58:41,201-Speed 11161.70 samples/sec Loss 5.8531 LearningRate 0.0170 Epoch: 23 Global Step: 133740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:58:42,121-Speed 11137.98 samples/sec Loss 5.7035 LearningRate 0.0170 Epoch: 23 Global Step: 133750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:58:43,069-Speed 10805.88 samples/sec Loss 5.8174 LearningRate 0.0170 Epoch: 23 Global Step: 133760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:58:44,003-Speed 10974.22 samples/sec Loss 5.8435 LearningRate 0.0170 Epoch: 23 Global Step: 133770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:58:44,894-Speed 11498.47 samples/sec Loss 5.7718 LearningRate 0.0170 Epoch: 23 Global Step: 133780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:58:45,792-Speed 11414.57 samples/sec Loss 5.7422 LearningRate 0.0170 Epoch: 23 Global Step: 133790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:58:46,681-Speed 11532.43 samples/sec Loss 5.6687 LearningRate 0.0170 Epoch: 23 Global Step: 133800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:58:47,609-Speed 11040.33 samples/sec Loss 5.7640 LearningRate 0.0169 Epoch: 23 Global Step: 133810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
03:58:48,549-Speed 10896.93 samples/sec Loss 5.6711 LearningRate 0.0169 Epoch: 23 Global Step: 133820 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:58:49,470-Speed 11119.73 samples/sec Loss 5.6687 LearningRate 0.0169 Epoch: 23 Global Step: 133830 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:58:50,442-Speed 10548.38 samples/sec Loss 5.7810 LearningRate 0.0169 Epoch: 23 Global Step: 133840 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:58:51,426-Speed 10412.72 samples/sec Loss 5.7877 LearningRate 0.0169 Epoch: 23 Global Step: 133850 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:58:52,333-Speed 11301.37 samples/sec Loss 5.8652 LearningRate 0.0169 Epoch: 23 Global Step: 133860 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:58:53,222-Speed 11515.67 samples/sec Loss 5.8961 LearningRate 0.0169 Epoch: 23 Global Step: 133870 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:58:54,123-Speed 11373.29 samples/sec Loss 5.8210 LearningRate 0.0169 Epoch: 23 Global Step: 133880 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:58:55,056-Speed 11138.43 samples/sec Loss 5.8072 LearningRate 0.0169 Epoch: 23 Global Step: 133890 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 03:58:55,944-Speed 11533.14 samples/sec Loss 5.7412 LearningRate 0.0169 Epoch: 23 Global Step: 133900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:58:56,861-Speed 11183.84 samples/sec Loss 5.8896 LearningRate 0.0169 Epoch: 23 Global Step: 133910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:58:57,752-Speed 11491.03 samples/sec Loss 5.7913 LearningRate 0.0169 Epoch: 23 Global Step: 133920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 03:58:58,662-Speed 11262.21 samples/sec Loss 5.6233 LearningRate 0.0169 Epoch: 23 Global Step: 133930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:58:59,568-Speed 
11310.58 samples/sec Loss 5.7368 LearningRate 0.0169 Epoch: 23 Global Step: 133940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:59:00,479-Speed 11251.95 samples/sec Loss 5.6761 LearningRate 0.0169 Epoch: 23 Global Step: 133950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:59:01,367-Speed 11550.18 samples/sec Loss 5.6027 LearningRate 0.0169 Epoch: 23 Global Step: 133960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:59:02,300-Speed 10977.54 samples/sec Loss 5.7389 LearningRate 0.0169 Epoch: 23 Global Step: 133970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:59:03,214-Speed 11214.39 samples/sec Loss 5.7376 LearningRate 0.0169 Epoch: 23 Global Step: 133980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:59:04,161-Speed 10819.82 samples/sec Loss 5.7359 LearningRate 0.0169 Epoch: 23 Global Step: 133990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:59:05,061-Speed 11387.38 samples/sec Loss 5.8906 LearningRate 0.0169 Epoch: 23 Global Step: 134000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 03:59:26,379-[lfw][134000]XNorm: 9.658340 Training: 2022-04-11 03:59:26,380-[lfw][134000]Accuracy-Flip: 0.99600+-0.00335 Training: 2022-04-11 03:59:26,380-[lfw][134000]Accuracy-Highest: 0.99650 Training: 2022-04-11 03:59:51,016-[cfp_fp][134000]XNorm: 8.191262 Training: 2022-04-11 03:59:51,016-[cfp_fp][134000]Accuracy-Flip: 0.94686+-0.01287 Training: 2022-04-11 03:59:51,016-[cfp_fp][134000]Accuracy-Highest: 0.95171 Training: 2022-04-11 04:00:12,263-[agedb_30][134000]XNorm: 9.479216 Training: 2022-04-11 04:00:12,264-[agedb_30][134000]Accuracy-Flip: 0.96767+-0.00803 Training: 2022-04-11 04:00:12,264-[agedb_30][134000]Accuracy-Highest: 0.96783 Training: 2022-04-11 04:00:13,169-Speed 150.35 samples/sec Loss 5.8362 LearningRate 0.0169 Epoch: 23 Global Step: 134010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:00:14,076-Speed 11292.83 samples/sec 
Loss 5.6973 LearningRate 0.0169 Epoch: 23 Global Step: 134020 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:00:15,012-Speed 10952.10 samples/sec Loss 5.8038 LearningRate 0.0169 Epoch: 23 Global Step: 134030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:00:15,950-Speed 10917.87 samples/sec Loss 5.7199 LearningRate 0.0169 Epoch: 23 Global Step: 134040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:00:16,890-Speed 10905.64 samples/sec Loss 5.8791 LearningRate 0.0169 Epoch: 23 Global Step: 134050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:00:17,816-Speed 11058.30 samples/sec Loss 5.8817 LearningRate 0.0169 Epoch: 23 Global Step: 134060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:00:18,764-Speed 10816.60 samples/sec Loss 5.8564 LearningRate 0.0169 Epoch: 23 Global Step: 134070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:00:19,706-Speed 10872.94 samples/sec Loss 5.7673 LearningRate 0.0168 Epoch: 23 Global Step: 134080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:00:20,583-Speed 11691.07 samples/sec Loss 5.9232 LearningRate 0.0168 Epoch: 23 Global Step: 134090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:00:21,458-Speed 11707.91 samples/sec Loss 5.7226 LearningRate 0.0168 Epoch: 23 Global Step: 134100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:00:22,370-Speed 11237.15 samples/sec Loss 5.8394 LearningRate 0.0168 Epoch: 23 Global Step: 134110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:00:23,282-Speed 11233.98 samples/sec Loss 5.7837 LearningRate 0.0168 Epoch: 23 Global Step: 134120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:00:24,189-Speed 11299.49 samples/sec Loss 5.7901 LearningRate 0.0168 Epoch: 23 Global Step: 134130 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:00:25,114-Speed 11079.50 samples/sec Loss 5.7666 LearningRate 
0.0168 Epoch: 23 Global Step: 134140 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:00:26,039-Speed 11070.49 samples/sec Loss 5.8513 LearningRate 0.0168 Epoch: 23 Global Step: 134150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:00:26,951-Speed 11238.96 samples/sec Loss 5.9062 LearningRate 0.0168 Epoch: 23 Global Step: 134160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:00:27,852-Speed 11375.25 samples/sec Loss 5.6760 LearningRate 0.0168 Epoch: 23 Global Step: 134170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:00:28,762-Speed 11265.50 samples/sec Loss 5.7845 LearningRate 0.0168 Epoch: 23 Global Step: 134180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:00:29,697-Speed 10951.70 samples/sec Loss 5.6566 LearningRate 0.0168 Epoch: 23 Global Step: 134190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:00:30,633-Speed 10954.80 samples/sec Loss 5.6431 LearningRate 0.0168 Epoch: 23 Global Step: 134200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:00:31,565-Speed 10998.00 samples/sec Loss 5.7579 LearningRate 0.0168 Epoch: 23 Global Step: 134210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:00:32,470-Speed 11319.17 samples/sec Loss 5.8294 LearningRate 0.0168 Epoch: 23 Global Step: 134220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:00:33,371-Speed 11377.31 samples/sec Loss 5.8958 LearningRate 0.0168 Epoch: 23 Global Step: 134230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:00:34,271-Speed 11379.43 samples/sec Loss 5.8200 LearningRate 0.0168 Epoch: 23 Global Step: 134240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:00:35,170-Speed 11396.30 samples/sec Loss 5.7047 LearningRate 0.0168 Epoch: 23 Global Step: 134250 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:00:36,063-Speed 11472.62 samples/sec Loss 5.9070 LearningRate 0.0168 Epoch: 23 Global Step: 
134260 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:00:36,958-Speed 11455.36 samples/sec Loss 5.7855 LearningRate 0.0168 Epoch: 23 Global Step: 134270 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:00:37,907-Speed 10797.95 samples/sec Loss 5.6871 LearningRate 0.0168 Epoch: 23 Global Step: 134280 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:00:38,813-Speed 11310.82 samples/sec Loss 5.9586 LearningRate 0.0168 Epoch: 23 Global Step: 134290 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:00:39,752-Speed 10905.88 samples/sec Loss 5.6556 LearningRate 0.0168 Epoch: 23 Global Step: 134300 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:00:40,642-Speed 11512.21 samples/sec Loss 5.8466 LearningRate 0.0168 Epoch: 23 Global Step: 134310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:00:41,549-Speed 11301.58 samples/sec Loss 5.9587 LearningRate 0.0168 Epoch: 23 Global Step: 134320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:00:42,455-Speed 11310.64 samples/sec Loss 5.7459 LearningRate 0.0168 Epoch: 23 Global Step: 134330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:00:43,360-Speed 11332.11 samples/sec Loss 5.7735 LearningRate 0.0168 Epoch: 23 Global Step: 134340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:00:44,321-Speed 10662.34 samples/sec Loss 5.8172 LearningRate 0.0168 Epoch: 23 Global Step: 134350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:00:45,197-Speed 11692.90 samples/sec Loss 5.9265 LearningRate 0.0167 Epoch: 23 Global Step: 134360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:00:46,099-Speed 11358.69 samples/sec Loss 5.7616 LearningRate 0.0167 Epoch: 23 Global Step: 134370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:00:47,014-Speed 11211.03 samples/sec Loss 5.7539 LearningRate 0.0167 Epoch: 23 Global Step: 134380 Fp16 Grad Scale: 
65536 Required: 3 hours Training: 2022-04-11 04:00:47,949-Speed 10959.00 samples/sec Loss 5.8389 LearningRate 0.0167 Epoch: 23 Global Step: 134390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:00:48,856-Speed 11296.97 samples/sec Loss 5.8718 LearningRate 0.0167 Epoch: 23 Global Step: 134400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:00:49,767-Speed 11240.95 samples/sec Loss 5.7808 LearningRate 0.0167 Epoch: 23 Global Step: 134410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:00:50,702-Speed 10955.64 samples/sec Loss 5.7753 LearningRate 0.0167 Epoch: 23 Global Step: 134420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:00:51,629-Speed 11063.40 samples/sec Loss 5.6952 LearningRate 0.0167 Epoch: 23 Global Step: 134430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:00:52,589-Speed 10671.16 samples/sec Loss 5.7842 LearningRate 0.0167 Epoch: 23 Global Step: 134440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:00:53,497-Speed 11290.36 samples/sec Loss 5.9402 LearningRate 0.0167 Epoch: 23 Global Step: 134450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:00:54,432-Speed 10951.07 samples/sec Loss 5.7484 LearningRate 0.0167 Epoch: 23 Global Step: 134460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:00:55,338-Speed 11310.04 samples/sec Loss 5.8760 LearningRate 0.0167 Epoch: 23 Global Step: 134470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:00:56,254-Speed 11184.39 samples/sec Loss 5.8565 LearningRate 0.0167 Epoch: 23 Global Step: 134480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:00:57,156-Speed 11363.19 samples/sec Loss 5.8506 LearningRate 0.0167 Epoch: 23 Global Step: 134490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:00:58,056-Speed 11386.55 samples/sec Loss 5.7585 LearningRate 0.0167 Epoch: 23 Global Step: 134500 Fp16 Grad Scale: 32768 Required: 3 hours 
Training: 2022-04-11 04:00:58,942-Speed 11568.05 samples/sec Loss 5.7403 LearningRate 0.0167 Epoch: 23 Global Step: 134510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:00:59,867-Speed 11078.35 samples/sec Loss 5.7569 LearningRate 0.0167 Epoch: 23 Global Step: 134520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:01:00,754-Speed 11550.92 samples/sec Loss 5.7197 LearningRate 0.0167 Epoch: 23 Global Step: 134530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:01:01,663-Speed 11275.05 samples/sec Loss 5.7815 LearningRate 0.0167 Epoch: 23 Global Step: 134540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:01:02,559-Speed 11436.54 samples/sec Loss 5.7652 LearningRate 0.0167 Epoch: 23 Global Step: 134550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:01:03,454-Speed 11443.37 samples/sec Loss 5.8354 LearningRate 0.0167 Epoch: 23 Global Step: 134560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:01:04,374-Speed 11146.27 samples/sec Loss 5.7784 LearningRate 0.0167 Epoch: 23 Global Step: 134570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:01:05,296-Speed 11108.43 samples/sec Loss 5.7486 LearningRate 0.0167 Epoch: 23 Global Step: 134580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:06,202-Speed 11333.53 samples/sec Loss 5.8639 LearningRate 0.0167 Epoch: 23 Global Step: 134590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:07,123-Speed 11122.54 samples/sec Loss 5.7662 LearningRate 0.0167 Epoch: 23 Global Step: 134600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:08,051-Speed 11043.02 samples/sec Loss 5.8182 LearningRate 0.0167 Epoch: 23 Global Step: 134610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:08,980-Speed 11028.73 samples/sec Loss 5.7073 LearningRate 0.0167 Epoch: 23 Global Step: 134620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
04:01:09,873-Speed 11475.53 samples/sec Loss 5.8559 LearningRate 0.0167 Epoch: 23 Global Step: 134630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:10,781-Speed 11294.24 samples/sec Loss 5.7702 LearningRate 0.0166 Epoch: 23 Global Step: 134640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:11,702-Speed 11120.88 samples/sec Loss 5.7936 LearningRate 0.0166 Epoch: 23 Global Step: 134650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:12,637-Speed 10964.06 samples/sec Loss 5.8157 LearningRate 0.0166 Epoch: 23 Global Step: 134660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:13,572-Speed 10960.33 samples/sec Loss 5.6907 LearningRate 0.0166 Epoch: 23 Global Step: 134670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:14,478-Speed 11305.40 samples/sec Loss 5.9793 LearningRate 0.0166 Epoch: 23 Global Step: 134680 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:01:15,405-Speed 11048.12 samples/sec Loss 5.7123 LearningRate 0.0166 Epoch: 23 Global Step: 134690 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:01:16,298-Speed 11483.00 samples/sec Loss 5.8166 LearningRate 0.0166 Epoch: 23 Global Step: 134700 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:01:17,211-Speed 11222.62 samples/sec Loss 5.7738 LearningRate 0.0166 Epoch: 23 Global Step: 134710 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:01:18,140-Speed 11033.50 samples/sec Loss 5.8149 LearningRate 0.0166 Epoch: 23 Global Step: 134720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:19,035-Speed 11442.62 samples/sec Loss 6.0001 LearningRate 0.0166 Epoch: 23 Global Step: 134730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:19,927-Speed 11487.29 samples/sec Loss 5.7819 LearningRate 0.0166 Epoch: 23 Global Step: 134740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:20,843-Speed 11182.69 
samples/sec Loss 5.8564 LearningRate 0.0166 Epoch: 23 Global Step: 134750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:21,723-Speed 11649.44 samples/sec Loss 5.7079 LearningRate 0.0166 Epoch: 23 Global Step: 134760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:22,621-Speed 11408.05 samples/sec Loss 5.8101 LearningRate 0.0166 Epoch: 23 Global Step: 134770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:23,499-Speed 11681.32 samples/sec Loss 5.6869 LearningRate 0.0166 Epoch: 23 Global Step: 134780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:24,421-Speed 11110.13 samples/sec Loss 5.7540 LearningRate 0.0166 Epoch: 23 Global Step: 134790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:25,372-Speed 10770.14 samples/sec Loss 5.7563 LearningRate 0.0166 Epoch: 23 Global Step: 134800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:26,291-Speed 11151.65 samples/sec Loss 5.7688 LearningRate 0.0166 Epoch: 23 Global Step: 134810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:27,223-Speed 11011.95 samples/sec Loss 5.8301 LearningRate 0.0166 Epoch: 23 Global Step: 134820 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:01:28,160-Speed 10934.10 samples/sec Loss 5.8908 LearningRate 0.0166 Epoch: 23 Global Step: 134830 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:01:29,070-Speed 11258.55 samples/sec Loss 5.7389 LearningRate 0.0166 Epoch: 23 Global Step: 134840 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:01:29,971-Speed 11379.68 samples/sec Loss 5.8054 LearningRate 0.0166 Epoch: 23 Global Step: 134850 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:01:30,897-Speed 11067.20 samples/sec Loss 5.9146 LearningRate 0.0166 Epoch: 23 Global Step: 134860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:31,817-Speed 11134.14 samples/sec Loss 5.8481 
LearningRate 0.0166 Epoch: 23 Global Step: 134870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:32,700-Speed 11604.59 samples/sec Loss 5.8361 LearningRate 0.0166 Epoch: 23 Global Step: 134880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:33,594-Speed 11462.81 samples/sec Loss 5.8834 LearningRate 0.0166 Epoch: 23 Global Step: 134890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:34,507-Speed 11225.10 samples/sec Loss 5.8792 LearningRate 0.0166 Epoch: 23 Global Step: 134900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:35,400-Speed 11474.53 samples/sec Loss 5.8358 LearningRate 0.0166 Epoch: 23 Global Step: 134910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:36,293-Speed 11477.87 samples/sec Loss 5.9095 LearningRate 0.0165 Epoch: 23 Global Step: 134920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:37,203-Speed 11253.40 samples/sec Loss 5.8334 LearningRate 0.0165 Epoch: 23 Global Step: 134930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:38,131-Speed 11049.25 samples/sec Loss 5.7745 LearningRate 0.0165 Epoch: 23 Global Step: 134940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:39,085-Speed 10742.80 samples/sec Loss 5.7808 LearningRate 0.0165 Epoch: 23 Global Step: 134950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:39,996-Speed 11242.19 samples/sec Loss 5.7426 LearningRate 0.0165 Epoch: 23 Global Step: 134960 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:01:40,939-Speed 10863.57 samples/sec Loss 5.7142 LearningRate 0.0165 Epoch: 23 Global Step: 134970 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:01:41,829-Speed 11519.43 samples/sec Loss 5.9283 LearningRate 0.0165 Epoch: 23 Global Step: 134980 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:01:42,721-Speed 11483.25 samples/sec Loss 5.8079 LearningRate 0.0165 Epoch: 
23 Global Step: 134990 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:01:43,639-Speed 11160.52 samples/sec Loss 5.8237 LearningRate 0.0165 Epoch: 23 Global Step: 135000 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:01:44,551-Speed 11235.26 samples/sec Loss 5.7737 LearningRate 0.0165 Epoch: 23 Global Step: 135010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:45,437-Speed 11574.12 samples/sec Loss 5.9048 LearningRate 0.0165 Epoch: 23 Global Step: 135020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:46,338-Speed 11368.26 samples/sec Loss 5.7451 LearningRate 0.0165 Epoch: 23 Global Step: 135030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:47,223-Speed 11581.21 samples/sec Loss 5.8554 LearningRate 0.0165 Epoch: 23 Global Step: 135040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:48,140-Speed 11181.43 samples/sec Loss 5.8509 LearningRate 0.0165 Epoch: 23 Global Step: 135050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:49,056-Speed 11189.22 samples/sec Loss 5.7597 LearningRate 0.0165 Epoch: 23 Global Step: 135060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:50,002-Speed 10825.72 samples/sec Loss 5.7521 LearningRate 0.0165 Epoch: 23 Global Step: 135070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:50,915-Speed 11227.73 samples/sec Loss 5.7070 LearningRate 0.0165 Epoch: 23 Global Step: 135080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:51,888-Speed 10531.58 samples/sec Loss 5.7701 LearningRate 0.0165 Epoch: 23 Global Step: 135090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:52,865-Speed 10492.38 samples/sec Loss 5.8265 LearningRate 0.0165 Epoch: 23 Global Step: 135100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:53,796-Speed 11001.88 samples/sec Loss 5.6935 LearningRate 0.0165 Epoch: 23 Global Step: 135110 Fp16 
Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:54,688-Speed 11490.89 samples/sec Loss 5.8711 LearningRate 0.0165 Epoch: 23 Global Step: 135120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:55,639-Speed 10776.13 samples/sec Loss 5.9845 LearningRate 0.0165 Epoch: 23 Global Step: 135130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:56,536-Speed 11450.75 samples/sec Loss 5.7509 LearningRate 0.0165 Epoch: 23 Global Step: 135140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:57,429-Speed 11470.00 samples/sec Loss 5.8741 LearningRate 0.0165 Epoch: 23 Global Step: 135150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:01:58,302-Speed 11736.59 samples/sec Loss 5.7434 LearningRate 0.0165 Epoch: 23 Global Step: 135160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:01:59,197-Speed 11445.66 samples/sec Loss 5.7593 LearningRate 0.0165 Epoch: 23 Global Step: 135170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:02:00,096-Speed 11422.25 samples/sec Loss 5.9419 LearningRate 0.0165 Epoch: 23 Global Step: 135180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:02:01,020-Speed 11085.08 samples/sec Loss 5.9068 LearningRate 0.0165 Epoch: 23 Global Step: 135190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:02:01,907-Speed 11552.16 samples/sec Loss 5.8340 LearningRate 0.0164 Epoch: 23 Global Step: 135200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:02:02,820-Speed 11230.99 samples/sec Loss 5.7029 LearningRate 0.0164 Epoch: 23 Global Step: 135210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:02:03,760-Speed 10893.80 samples/sec Loss 5.9542 LearningRate 0.0164 Epoch: 23 Global Step: 135220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:02:04,662-Speed 11365.15 samples/sec Loss 5.8031 LearningRate 0.0164 Epoch: 23 Global Step: 135230 Fp16 Grad Scale: 32768 Required: 3 
hours Training: 2022-04-11 04:02:05,543-Speed 11634.27 samples/sec Loss 5.9085 LearningRate 0.0164 Epoch: 23 Global Step: 135240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:02:06,464-Speed 11119.06 samples/sec Loss 5.8120 LearningRate 0.0164 Epoch: 23 Global Step: 135250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:02:07,371-Speed 11307.92 samples/sec Loss 5.8926 LearningRate 0.0164 Epoch: 23 Global Step: 135260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:02:08,271-Speed 11382.23 samples/sec Loss 5.8254 LearningRate 0.0164 Epoch: 23 Global Step: 135270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:02:09,168-Speed 11419.84 samples/sec Loss 5.7802 LearningRate 0.0164 Epoch: 23 Global Step: 135280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:02:10,055-Speed 11556.05 samples/sec Loss 5.8106 LearningRate 0.0164 Epoch: 23 Global Step: 135290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:02:10,955-Speed 11389.37 samples/sec Loss 5.8584 LearningRate 0.0164 Epoch: 23 Global Step: 135300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:02:11,884-Speed 11025.55 samples/sec Loss 5.8762 LearningRate 0.0164 Epoch: 23 Global Step: 135310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:02:12,809-Speed 11082.14 samples/sec Loss 5.8249 LearningRate 0.0164 Epoch: 23 Global Step: 135320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:02:13,735-Speed 11063.67 samples/sec Loss 5.7521 LearningRate 0.0164 Epoch: 23 Global Step: 135330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:02:14,693-Speed 10697.40 samples/sec Loss 5.8999 LearningRate 0.0164 Epoch: 23 Global Step: 135340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:02:15,619-Speed 11069.65 samples/sec Loss 5.7871 LearningRate 0.0164 Epoch: 23 Global Step: 135350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
04:02:16,537-Speed 11168.04 samples/sec Loss 5.9138 LearningRate 0.0164 Epoch: 23 Global Step: 135360 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:02:17,459-Speed 11103.23 samples/sec Loss 5.8057 LearningRate 0.0164 Epoch: 23 Global Step: 135370 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:02:18,376-Speed 11181.99 samples/sec Loss 5.8320 LearningRate 0.0164 Epoch: 23 Global Step: 135380 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:02:19,327-Speed 10768.94 samples/sec Loss 5.8167 LearningRate 0.0164 Epoch: 23 Global Step: 135390 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:02:20,271-Speed 10852.98 samples/sec Loss 5.7944 LearningRate 0.0164 Epoch: 23 Global Step: 135400 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:02:21,186-Speed 11200.87 samples/sec Loss 5.8384 LearningRate 0.0164 Epoch: 23 Global Step: 135410 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:02:22,087-Speed 11377.53 samples/sec Loss 5.7793 LearningRate 0.0164 Epoch: 23 Global Step: 135420 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:02:23,001-Speed 11205.24 samples/sec Loss 5.9260 LearningRate 0.0164 Epoch: 23 Global Step: 135430 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:02:23,909-Speed 11293.78 samples/sec Loss 5.7025 LearningRate 0.0164 Epoch: 23 Global Step: 135440 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:02:24,850-Speed 10883.99 samples/sec Loss 5.9582 LearningRate 0.0164 Epoch: 23 Global Step: 135450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:02:25,795-Speed 10849.70 samples/sec Loss 5.9612 LearningRate 0.0164 Epoch: 23 Global Step: 135460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:02:26,721-Speed 11071.19 samples/sec Loss 5.8078 LearningRate 0.0164 Epoch: 23 Global Step: 135470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:02:27,627-Speed 
11307.85 samples/sec Loss 5.8498 LearningRate 0.0163 Epoch: 23 Global Step: 135480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:02:28,550-Speed 11104.40 samples/sec Loss 5.6931 LearningRate 0.0163 Epoch: 23 Global Step: 135490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:02:29,454-Speed 11330.03 samples/sec Loss 5.8252 LearningRate 0.0163 Epoch: 23 Global Step: 135500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:02:30,381-Speed 11054.13 samples/sec Loss 5.8538 LearningRate 0.0163 Epoch: 23 Global Step: 135510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:02:31,301-Speed 11141.86 samples/sec Loss 5.8329 LearningRate 0.0163 Epoch: 23 Global Step: 135520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:02:32,205-Speed 11332.98 samples/sec Loss 5.7986 LearningRate 0.0163 Epoch: 23 Global Step: 135530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:02:33,093-Speed 11543.38 samples/sec Loss 5.8053 LearningRate 0.0163 Epoch: 23 Global Step: 135540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:02:33,970-Speed 11674.85 samples/sec Loss 5.8023 LearningRate 0.0163 Epoch: 23 Global Step: 135550 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:02:34,871-Speed 11372.55 samples/sec Loss 5.6509 LearningRate 0.0163 Epoch: 23 Global Step: 135560 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:02:35,762-Speed 11504.14 samples/sec Loss 5.9409 LearningRate 0.0163 Epoch: 23 Global Step: 135570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:02:36,692-Speed 11018.50 samples/sec Loss 5.6984 LearningRate 0.0163 Epoch: 23 Global Step: 135580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:02:37,633-Speed 10888.31 samples/sec Loss 5.8921 LearningRate 0.0163 Epoch: 23 Global Step: 135590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:02:38,571-Speed 10927.69 samples/sec Loss 
5.5959 LearningRate 0.0163 Epoch: 23 Global Step: 135600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:02:39,510-Speed 10913.35 samples/sec Loss 5.7932 LearningRate 0.0163 Epoch: 23 Global Step: 135610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:02:40,490-Speed 10449.85 samples/sec Loss 5.7981 LearningRate 0.0163 Epoch: 23 Global Step: 135620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:02:41,440-Speed 10796.66 samples/sec Loss 5.8078 LearningRate 0.0163 Epoch: 23 Global Step: 135630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:02:42,403-Speed 10640.76 samples/sec Loss 5.9446 LearningRate 0.0163 Epoch: 23 Global Step: 135640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:02:43,343-Speed 10905.13 samples/sec Loss 5.8013 LearningRate 0.0163 Epoch: 23 Global Step: 135650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:02:44,219-Speed 11699.58 samples/sec Loss 5.8741 LearningRate 0.0163 Epoch: 23 Global Step: 135660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:02:45,119-Speed 11379.73 samples/sec Loss 5.6759 LearningRate 0.0163 Epoch: 23 Global Step: 135670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:02:46,032-Speed 11219.47 samples/sec Loss 5.8919 LearningRate 0.0163 Epoch: 23 Global Step: 135680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:02:46,922-Speed 11520.33 samples/sec Loss 5.8233 LearningRate 0.0163 Epoch: 23 Global Step: 135690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:02:47,850-Speed 11057.45 samples/sec Loss 5.8801 LearningRate 0.0163 Epoch: 23 Global Step: 135700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:02:48,790-Speed 10903.81 samples/sec Loss 5.7729 LearningRate 0.0163 Epoch: 23 Global Step: 135710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:02:49,689-Speed 11402.82 samples/sec Loss 5.8559 LearningRate 0.0163 
Epoch: 23 Global Step: 135720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:02:50,614-Speed 11078.74 samples/sec Loss 5.8741 LearningRate 0.0163 Epoch: 23 Global Step: 135730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:02:51,530-Speed 11183.06 samples/sec Loss 5.9465 LearningRate 0.0163 Epoch: 23 Global Step: 135740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:02:52,473-Speed 10863.32 samples/sec Loss 5.8069 LearningRate 0.0163 Epoch: 23 Global Step: 135750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:02:53,403-Speed 11020.76 samples/sec Loss 5.7288 LearningRate 0.0162 Epoch: 23 Global Step: 135760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:02:54,298-Speed 11452.59 samples/sec Loss 5.9131 LearningRate 0.0162 Epoch: 23 Global Step: 135770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:02:55,253-Speed 10730.21 samples/sec Loss 5.8897 LearningRate 0.0162 Epoch: 23 Global Step: 135780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:02:56,156-Speed 11339.47 samples/sec Loss 5.8733 LearningRate 0.0162 Epoch: 23 Global Step: 135790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:02:57,057-Speed 11372.32 samples/sec Loss 5.7921 LearningRate 0.0162 Epoch: 23 Global Step: 135800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:02:57,970-Speed 11228.25 samples/sec Loss 5.7705 LearningRate 0.0162 Epoch: 23 Global Step: 135810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:02:58,891-Speed 11130.73 samples/sec Loss 5.7103 LearningRate 0.0162 Epoch: 23 Global Step: 135820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:02:59,816-Speed 11081.03 samples/sec Loss 5.7395 LearningRate 0.0162 Epoch: 23 Global Step: 135830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:03:00,701-Speed 11572.38 samples/sec Loss 5.7567 LearningRate 0.0162 Epoch: 23 Global Step: 135840 
Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:03:01,685-Speed 10422.82 samples/sec Loss 5.8568 LearningRate 0.0162 Epoch: 23 Global Step: 135850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:03:02,609-Speed 11086.83 samples/sec Loss 5.7103 LearningRate 0.0162 Epoch: 23 Global Step: 135860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:03:03,513-Speed 11334.84 samples/sec Loss 5.8547 LearningRate 0.0162 Epoch: 23 Global Step: 135870 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:03:04,442-Speed 11029.40 samples/sec Loss 5.7989 LearningRate 0.0162 Epoch: 23 Global Step: 135880 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:03:05,367-Speed 11077.94 samples/sec Loss 5.8085 LearningRate 0.0162 Epoch: 23 Global Step: 135890 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:03:06,248-Speed 11633.45 samples/sec Loss 5.8736 LearningRate 0.0162 Epoch: 23 Global Step: 135900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:03:07,213-Speed 10615.81 samples/sec Loss 5.8790 LearningRate 0.0162 Epoch: 23 Global Step: 135910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:03:08,150-Speed 10934.84 samples/sec Loss 5.8628 LearningRate 0.0162 Epoch: 23 Global Step: 135920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:03:09,062-Speed 11233.54 samples/sec Loss 5.8236 LearningRate 0.0162 Epoch: 23 Global Step: 135930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:03:09,948-Speed 11565.93 samples/sec Loss 5.8033 LearningRate 0.0162 Epoch: 23 Global Step: 135940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:03:10,856-Speed 11290.09 samples/sec Loss 5.8488 LearningRate 0.0162 Epoch: 23 Global Step: 135950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:03:11,769-Speed 11220.83 samples/sec Loss 5.7768 LearningRate 0.0162 Epoch: 23 Global Step: 135960 Fp16 Grad Scale: 65536 
Required: 3 hours
Training: 2022-04-11 04:03:12,699-Speed 11016.29 samples/sec Loss 5.9793 LearningRate 0.0162 Epoch: 23 Global Step: 135970 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-11 04:03:13,618-Speed 11152.24 samples/sec Loss 5.8904 LearningRate 0.0162 Epoch: 23 Global Step: 135980 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-11 04:03:14,533-Speed 11198.67 samples/sec Loss 5.9042 LearningRate 0.0162 Epoch: 23 Global Step: 135990 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-11 04:03:15,447-Speed 11209.67 samples/sec Loss 5.8947 LearningRate 0.0162 Epoch: 23 Global Step: 136000 Fp16 Grad Scale: 131072 Required: 3 hours
Training: 2022-04-11 04:03:36,724-[lfw][136000]XNorm: 9.361742
Training: 2022-04-11 04:03:36,724-[lfw][136000]Accuracy-Flip: 0.99667+-0.00307
Training: 2022-04-11 04:03:36,725-[lfw][136000]Accuracy-Highest: 0.99667
Training: 2022-04-11 04:04:01,279-[cfp_fp][136000]XNorm: 7.932015
Training: 2022-04-11 04:04:01,280-[cfp_fp][136000]Accuracy-Flip: 0.94729+-0.01174
Training: 2022-04-11 04:04:01,280-[cfp_fp][136000]Accuracy-Highest: 0.95171
Training: 2022-04-11 04:04:22,497-[agedb_30][136000]XNorm: 9.178503
Training: 2022-04-11 04:04:22,498-[agedb_30][136000]Accuracy-Flip: 0.96467+-0.00963
Training: 2022-04-11 04:04:22,498-[agedb_30][136000]Accuracy-Highest: 0.96783
Training: 2022-04-11 04:04:23,408-Speed 150.68 samples/sec Loss 5.8353 LearningRate 0.0162 Epoch: 23 Global Step: 136010 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-11 04:04:24,334-Speed 11091.76 samples/sec Loss 5.7832 LearningRate 0.0162 Epoch: 23 Global Step: 136020 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-11 04:04:25,220-Speed 11577.08 samples/sec Loss 5.8698 LearningRate 0.0162 Epoch: 23 Global Step: 136030 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-11 04:04:26,111-Speed 11504.11 samples/sec Loss 5.8612 LearningRate 0.0161 Epoch: 23 Global Step: 136040 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-11 04:04:27,008-Speed 11415.98 samples/sec Loss 5.8551 LearningRate 0.0161 Epoch: 23 Global Step: 136050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:04:27,918-Speed 11258.69 samples/sec Loss 5.7795 LearningRate 0.0161 Epoch: 23 Global Step: 136060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:04:28,824-Speed 11311.30 samples/sec Loss 5.6810 LearningRate 0.0161 Epoch: 23 Global Step: 136070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:04:29,734-Speed 11264.77 samples/sec Loss 5.9518 LearningRate 0.0161 Epoch: 23 Global Step: 136080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:04:30,621-Speed 11557.03 samples/sec Loss 5.7602 LearningRate 0.0161 Epoch: 23 Global Step: 136090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:04:31,538-Speed 11183.13 samples/sec Loss 5.8431 LearningRate 0.0161 Epoch: 23 Global Step: 136100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:04:32,424-Speed 11565.77 samples/sec Loss 5.8963 LearningRate 0.0161 Epoch: 23 Global Step: 136110 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:04:33,314-Speed 11511.86 samples/sec Loss 5.7587 LearningRate 0.0161 Epoch: 23 Global Step: 136120 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:04:34,187-Speed 11747.49 samples/sec Loss 5.7383 LearningRate 0.0161 Epoch: 23 Global Step: 136130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:04:35,078-Speed 11491.66 samples/sec Loss 5.7656 LearningRate 0.0161 Epoch: 23 Global Step: 136140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:04:35,986-Speed 11290.43 samples/sec Loss 5.7117 LearningRate 0.0161 Epoch: 23 Global Step: 136150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:04:36,901-Speed 11199.05 samples/sec Loss 5.7513 LearningRate 0.0161 Epoch: 23 Global Step: 136160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 
04:04:37,826-Speed 11074.88 samples/sec Loss 5.6910 LearningRate 0.0161 Epoch: 23 Global Step: 136170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:04:38,740-Speed 11215.66 samples/sec Loss 5.7771 LearningRate 0.0161 Epoch: 23 Global Step: 136180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:04:39,634-Speed 11450.94 samples/sec Loss 5.7223 LearningRate 0.0161 Epoch: 23 Global Step: 136190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:04:40,538-Speed 11345.38 samples/sec Loss 5.8782 LearningRate 0.0161 Epoch: 23 Global Step: 136200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:04:41,463-Speed 11073.62 samples/sec Loss 5.7633 LearningRate 0.0161 Epoch: 23 Global Step: 136210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:04:42,375-Speed 11245.70 samples/sec Loss 5.8471 LearningRate 0.0161 Epoch: 23 Global Step: 136220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:04:43,313-Speed 10919.29 samples/sec Loss 5.8025 LearningRate 0.0161 Epoch: 23 Global Step: 136230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:04:44,224-Speed 11253.01 samples/sec Loss 5.8121 LearningRate 0.0161 Epoch: 23 Global Step: 136240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:04:45,109-Speed 11581.63 samples/sec Loss 5.7684 LearningRate 0.0161 Epoch: 23 Global Step: 136250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:04:45,996-Speed 11543.84 samples/sec Loss 5.8126 LearningRate 0.0161 Epoch: 23 Global Step: 136260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:04:46,939-Speed 10872.25 samples/sec Loss 5.8504 LearningRate 0.0161 Epoch: 23 Global Step: 136270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:04:47,842-Speed 11342.41 samples/sec Loss 5.7372 LearningRate 0.0161 Epoch: 23 Global Step: 136280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:04:48,748-Speed 11320.10 
samples/sec Loss 5.6853 LearningRate 0.0161 Epoch: 23 Global Step: 136290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:04:49,662-Speed 11200.89 samples/sec Loss 5.7158 LearningRate 0.0161 Epoch: 23 Global Step: 136300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:04:50,577-Speed 11198.98 samples/sec Loss 5.8188 LearningRate 0.0161 Epoch: 23 Global Step: 136310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:04:51,496-Speed 11157.23 samples/sec Loss 5.7840 LearningRate 0.0161 Epoch: 23 Global Step: 136320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:04:52,401-Speed 11318.38 samples/sec Loss 5.8020 LearningRate 0.0160 Epoch: 23 Global Step: 136330 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:04:53,338-Speed 10942.53 samples/sec Loss 5.9575 LearningRate 0.0160 Epoch: 23 Global Step: 136340 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:04:54,262-Speed 11089.81 samples/sec Loss 5.8828 LearningRate 0.0160 Epoch: 23 Global Step: 136350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:04:55,171-Speed 11276.30 samples/sec Loss 5.8074 LearningRate 0.0160 Epoch: 23 Global Step: 136360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:04:56,105-Speed 10961.77 samples/sec Loss 5.8354 LearningRate 0.0160 Epoch: 23 Global Step: 136370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:04:57,002-Speed 11433.42 samples/sec Loss 5.8840 LearningRate 0.0160 Epoch: 23 Global Step: 136380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:04:57,898-Speed 11432.85 samples/sec Loss 5.7884 LearningRate 0.0160 Epoch: 23 Global Step: 136390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:04:58,799-Speed 11373.24 samples/sec Loss 5.7489 LearningRate 0.0160 Epoch: 23 Global Step: 136400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:04:59,715-Speed 11182.77 samples/sec Loss 5.8083 
LearningRate 0.0160 Epoch: 23 Global Step: 136410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:05:00,639-Speed 11092.74 samples/sec Loss 5.8488 LearningRate 0.0160 Epoch: 23 Global Step: 136420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:05:01,566-Speed 11055.20 samples/sec Loss 5.7373 LearningRate 0.0160 Epoch: 23 Global Step: 136430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:05:02,480-Speed 11212.68 samples/sec Loss 5.9964 LearningRate 0.0160 Epoch: 23 Global Step: 136440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:05:03,434-Speed 10743.38 samples/sec Loss 5.8237 LearningRate 0.0160 Epoch: 23 Global Step: 136450 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:05:04,473-Speed 9861.28 samples/sec Loss 5.9154 LearningRate 0.0160 Epoch: 23 Global Step: 136460 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:05:15,068-Speed 966.52 samples/sec Loss 5.3262 LearningRate 0.0160 Epoch: 24 Global Step: 136470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:05:16,133-Speed 9623.95 samples/sec Loss 5.2039 LearningRate 0.0160 Epoch: 24 Global Step: 136480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:05:17,201-Speed 9600.97 samples/sec Loss 5.2761 LearningRate 0.0160 Epoch: 24 Global Step: 136490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:05:18,160-Speed 10678.06 samples/sec Loss 5.1578 LearningRate 0.0160 Epoch: 24 Global Step: 136500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:05:19,118-Speed 10695.58 samples/sec Loss 5.1010 LearningRate 0.0160 Epoch: 24 Global Step: 136510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:05:20,099-Speed 10452.20 samples/sec Loss 5.0863 LearningRate 0.0160 Epoch: 24 Global Step: 136520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:05:21,008-Speed 11262.59 samples/sec Loss 5.1844 LearningRate 0.0160 Epoch: 24 
Global Step: 136530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:05:21,926-Speed 11164.82 samples/sec Loss 5.1069 LearningRate 0.0160 Epoch: 24 Global Step: 136540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:05:22,829-Speed 11348.77 samples/sec Loss 5.1202 LearningRate 0.0160 Epoch: 24 Global Step: 136550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:05:23,724-Speed 11452.19 samples/sec Loss 5.1591 LearningRate 0.0160 Epoch: 24 Global Step: 136560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:05:24,627-Speed 11348.33 samples/sec Loss 5.0935 LearningRate 0.0160 Epoch: 24 Global Step: 136570 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:05:25,556-Speed 11024.99 samples/sec Loss 5.2130 LearningRate 0.0160 Epoch: 24 Global Step: 136580 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:05:26,483-Speed 11060.67 samples/sec Loss 5.0712 LearningRate 0.0160 Epoch: 24 Global Step: 136590 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:05:27,410-Speed 11056.19 samples/sec Loss 5.2002 LearningRate 0.0160 Epoch: 24 Global Step: 136600 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:05:28,362-Speed 10758.00 samples/sec Loss 5.2001 LearningRate 0.0159 Epoch: 24 Global Step: 136610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:05:29,334-Speed 10542.78 samples/sec Loss 5.2471 LearningRate 0.0159 Epoch: 24 Global Step: 136620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:05:30,258-Speed 11086.34 samples/sec Loss 5.1767 LearningRate 0.0159 Epoch: 24 Global Step: 136630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:05:31,175-Speed 11181.49 samples/sec Loss 5.1501 LearningRate 0.0159 Epoch: 24 Global Step: 136640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:05:32,145-Speed 10557.91 samples/sec Loss 5.2373 LearningRate 0.0159 Epoch: 24 Global Step: 136650 Fp16 
Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:05:33,045-Speed 11390.77 samples/sec Loss 5.2239 LearningRate 0.0159 Epoch: 24 Global Step: 136660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:05:33,947-Speed 11355.37 samples/sec Loss 5.2324 LearningRate 0.0159 Epoch: 24 Global Step: 136670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:05:34,837-Speed 11516.21 samples/sec Loss 5.2034 LearningRate 0.0159 Epoch: 24 Global Step: 136680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:05:35,750-Speed 11223.61 samples/sec Loss 5.1828 LearningRate 0.0159 Epoch: 24 Global Step: 136690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:05:36,682-Speed 10998.21 samples/sec Loss 5.1278 LearningRate 0.0159 Epoch: 24 Global Step: 136700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:05:37,627-Speed 10841.24 samples/sec Loss 5.1916 LearningRate 0.0159 Epoch: 24 Global Step: 136710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:05:38,526-Speed 11406.75 samples/sec Loss 5.1978 LearningRate 0.0159 Epoch: 24 Global Step: 136720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:05:39,426-Speed 11383.20 samples/sec Loss 5.2985 LearningRate 0.0159 Epoch: 24 Global Step: 136730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:05:40,358-Speed 10993.06 samples/sec Loss 5.2953 LearningRate 0.0159 Epoch: 24 Global Step: 136740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:05:41,257-Speed 11394.32 samples/sec Loss 5.1403 LearningRate 0.0159 Epoch: 24 Global Step: 136750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:05:42,153-Speed 11448.04 samples/sec Loss 5.1198 LearningRate 0.0159 Epoch: 24 Global Step: 136760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:05:43,104-Speed 10770.76 samples/sec Loss 5.3059 LearningRate 0.0159 Epoch: 24 Global Step: 136770 Fp16 Grad Scale: 32768 Required: 3 
hours Training: 2022-04-11 04:05:44,048-Speed 10858.25 samples/sec Loss 5.1138 LearningRate 0.0159 Epoch: 24 Global Step: 136780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:05:44,947-Speed 11398.11 samples/sec Loss 5.3417 LearningRate 0.0159 Epoch: 24 Global Step: 136790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:05:45,844-Speed 11418.47 samples/sec Loss 5.3430 LearningRate 0.0159 Epoch: 24 Global Step: 136800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:05:46,754-Speed 11270.92 samples/sec Loss 5.3891 LearningRate 0.0159 Epoch: 24 Global Step: 136810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:05:47,653-Speed 11388.77 samples/sec Loss 5.3027 LearningRate 0.0159 Epoch: 24 Global Step: 136820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:05:48,564-Speed 11249.44 samples/sec Loss 5.4443 LearningRate 0.0159 Epoch: 24 Global Step: 136830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:05:49,493-Speed 11026.68 samples/sec Loss 5.3387 LearningRate 0.0159 Epoch: 24 Global Step: 136840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:05:50,438-Speed 10849.36 samples/sec Loss 5.3459 LearningRate 0.0159 Epoch: 24 Global Step: 136850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:05:51,343-Speed 11317.09 samples/sec Loss 5.2260 LearningRate 0.0159 Epoch: 24 Global Step: 136860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:05:52,239-Speed 11438.68 samples/sec Loss 5.3238 LearningRate 0.0159 Epoch: 24 Global Step: 136870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:05:53,147-Speed 11285.40 samples/sec Loss 5.3637 LearningRate 0.0159 Epoch: 24 Global Step: 136880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:05:54,095-Speed 10815.66 samples/sec Loss 5.3825 LearningRate 0.0159 Epoch: 24 Global Step: 136890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
04:05:55,046-Speed 10768.63 samples/sec Loss 5.2617 LearningRate 0.0158 Epoch: 24 Global Step: 136900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:05:55,987-Speed 10897.70 samples/sec Loss 5.3464 LearningRate 0.0158 Epoch: 24 Global Step: 136910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:05:56,933-Speed 10836.67 samples/sec Loss 5.3470 LearningRate 0.0158 Epoch: 24 Global Step: 136920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:05:57,827-Speed 11451.27 samples/sec Loss 5.1419 LearningRate 0.0158 Epoch: 24 Global Step: 136930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:05:58,744-Speed 11173.46 samples/sec Loss 5.2287 LearningRate 0.0158 Epoch: 24 Global Step: 136940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:05:59,638-Speed 11462.73 samples/sec Loss 5.3188 LearningRate 0.0158 Epoch: 24 Global Step: 136950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:06:00,543-Speed 11324.32 samples/sec Loss 5.2873 LearningRate 0.0158 Epoch: 24 Global Step: 136960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:06:01,458-Speed 11203.18 samples/sec Loss 5.2613 LearningRate 0.0158 Epoch: 24 Global Step: 136970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:06:02,383-Speed 11078.75 samples/sec Loss 5.2656 LearningRate 0.0158 Epoch: 24 Global Step: 136980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:06:03,297-Speed 11210.02 samples/sec Loss 5.1905 LearningRate 0.0158 Epoch: 24 Global Step: 136990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:06:04,264-Speed 10593.10 samples/sec Loss 5.1787 LearningRate 0.0158 Epoch: 24 Global Step: 137000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:06:05,152-Speed 11543.27 samples/sec Loss 5.4083 LearningRate 0.0158 Epoch: 24 Global Step: 137010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:06:06,061-Speed 11269.45 
samples/sec Loss 5.3430 LearningRate 0.0158 Epoch: 24 Global Step: 137020 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:06:07,047-Speed 10392.21 samples/sec Loss 5.3336 LearningRate 0.0158 Epoch: 24 Global Step: 137030 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:06:08,019-Speed 10546.55 samples/sec Loss 5.4308 LearningRate 0.0158 Epoch: 24 Global Step: 137040 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:06:08,971-Speed 10765.68 samples/sec Loss 5.2135 LearningRate 0.0158 Epoch: 24 Global Step: 137050 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:06:09,891-Speed 11135.57 samples/sec Loss 5.4116 LearningRate 0.0158 Epoch: 24 Global Step: 137060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:06:10,859-Speed 10586.22 samples/sec Loss 5.3338 LearningRate 0.0158 Epoch: 24 Global Step: 137070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:06:11,758-Speed 11400.64 samples/sec Loss 5.3379 LearningRate 0.0158 Epoch: 24 Global Step: 137080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:06:12,721-Speed 10635.10 samples/sec Loss 5.4226 LearningRate 0.0158 Epoch: 24 Global Step: 137090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:06:13,691-Speed 10558.20 samples/sec Loss 5.5267 LearningRate 0.0158 Epoch: 24 Global Step: 137100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:06:14,814-Speed 9129.48 samples/sec Loss 5.2576 LearningRate 0.0158 Epoch: 24 Global Step: 137110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:06:15,746-Speed 10985.15 samples/sec Loss 5.4175 LearningRate 0.0158 Epoch: 24 Global Step: 137120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:06:16,679-Speed 10987.76 samples/sec Loss 5.3152 LearningRate 0.0158 Epoch: 24 Global Step: 137130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:06:17,593-Speed 11215.55 samples/sec Loss 5.3595 
LearningRate 0.0158 Epoch: 24 Global Step: 137140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:06:18,533-Speed 10898.53 samples/sec Loss 5.3787 LearningRate 0.0158 Epoch: 24 Global Step: 137150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:06:19,479-Speed 10843.36 samples/sec Loss 5.3827 LearningRate 0.0158 Epoch: 24 Global Step: 137160 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:06:20,393-Speed 11211.45 samples/sec Loss 5.3501 LearningRate 0.0158 Epoch: 24 Global Step: 137170 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:06:21,279-Speed 11562.61 samples/sec Loss 5.3769 LearningRate 0.0157 Epoch: 24 Global Step: 137180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:06:22,194-Speed 11204.33 samples/sec Loss 5.4970 LearningRate 0.0157 Epoch: 24 Global Step: 137190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:06:23,076-Speed 11619.12 samples/sec Loss 5.3874 LearningRate 0.0157 Epoch: 24 Global Step: 137200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:06:23,994-Speed 11156.72 samples/sec Loss 5.3154 LearningRate 0.0157 Epoch: 24 Global Step: 137210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:06:24,926-Speed 10993.86 samples/sec Loss 5.4625 LearningRate 0.0157 Epoch: 24 Global Step: 137220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:06:25,822-Speed 11441.80 samples/sec Loss 5.3147 LearningRate 0.0157 Epoch: 24 Global Step: 137230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:06:26,750-Speed 11036.15 samples/sec Loss 5.4176 LearningRate 0.0157 Epoch: 24 Global Step: 137240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:06:27,665-Speed 11199.29 samples/sec Loss 5.2649 LearningRate 0.0157 Epoch: 24 Global Step: 137250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:06:28,564-Speed 11405.34 samples/sec Loss 5.3165 LearningRate 0.0157 Epoch: 24 
Global Step: 137260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:06:29,466-Speed 11356.55 samples/sec Loss 5.3228 LearningRate 0.0157 Epoch: 24 Global Step: 137270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:06:30,367-Speed 11374.26 samples/sec Loss 5.4057 LearningRate 0.0157 Epoch: 24 Global Step: 137280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:06:31,281-Speed 11209.69 samples/sec Loss 5.4908 LearningRate 0.0157 Epoch: 24 Global Step: 137290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:06:32,215-Speed 10976.06 samples/sec Loss 5.3427 LearningRate 0.0157 Epoch: 24 Global Step: 137300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:06:33,102-Speed 11554.86 samples/sec Loss 5.3513 LearningRate 0.0157 Epoch: 24 Global Step: 137310 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:06:33,984-Speed 11606.54 samples/sec Loss 5.5108 LearningRate 0.0157 Epoch: 24 Global Step: 137320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:06:34,878-Speed 11473.02 samples/sec Loss 5.4330 LearningRate 0.0157 Epoch: 24 Global Step: 137330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:06:35,813-Speed 10950.61 samples/sec Loss 5.3223 LearningRate 0.0157 Epoch: 24 Global Step: 137340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:06:36,695-Speed 11619.20 samples/sec Loss 5.3996 LearningRate 0.0157 Epoch: 24 Global Step: 137350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:06:37,650-Speed 10734.11 samples/sec Loss 5.4630 LearningRate 0.0157 Epoch: 24 Global Step: 137360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:06:38,590-Speed 10904.14 samples/sec Loss 5.3633 LearningRate 0.0157 Epoch: 24 Global Step: 137370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:06:39,520-Speed 11013.64 samples/sec Loss 5.4784 LearningRate 0.0157 Epoch: 24 Global Step: 137380 Fp16 Grad 
Scale: 32768 Required: 3 hours Training: 2022-04-11 04:06:40,428-Speed 11298.07 samples/sec Loss 5.4653 LearningRate 0.0157 Epoch: 24 Global Step: 137390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:06:41,322-Speed 11469.78 samples/sec Loss 5.4720 LearningRate 0.0157 Epoch: 24 Global Step: 137400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:06:42,211-Speed 11528.72 samples/sec Loss 5.3268 LearningRate 0.0157 Epoch: 24 Global Step: 137410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:06:43,152-Speed 10889.87 samples/sec Loss 5.4472 LearningRate 0.0157 Epoch: 24 Global Step: 137420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:06:44,051-Speed 11396.97 samples/sec Loss 5.4954 LearningRate 0.0157 Epoch: 24 Global Step: 137430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:06:44,947-Speed 11437.93 samples/sec Loss 5.5076 LearningRate 0.0157 Epoch: 24 Global Step: 137440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:06:45,822-Speed 11705.80 samples/sec Loss 5.6092 LearningRate 0.0157 Epoch: 24 Global Step: 137450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:06:46,701-Speed 11668.88 samples/sec Loss 5.4592 LearningRate 0.0157 Epoch: 24 Global Step: 137460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:06:47,585-Speed 11587.63 samples/sec Loss 5.4455 LearningRate 0.0156 Epoch: 24 Global Step: 137470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:06:48,486-Speed 11374.67 samples/sec Loss 5.4613 LearningRate 0.0156 Epoch: 24 Global Step: 137480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:06:49,465-Speed 10467.70 samples/sec Loss 5.4052 LearningRate 0.0156 Epoch: 24 Global Step: 137490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:06:50,453-Speed 10370.52 samples/sec Loss 5.4836 LearningRate 0.0156 Epoch: 24 Global Step: 137500 Fp16 Grad Scale: 65536 Required: 3 hours 
Training: 2022-04-11 04:06:51,371-Speed 11156.18 samples/sec Loss 5.5574 LearningRate 0.0156 Epoch: 24 Global Step: 137510 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:06:52,320-Speed 10800.59 samples/sec Loss 5.5377 LearningRate 0.0156 Epoch: 24 Global Step: 137520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:06:53,221-Speed 11371.16 samples/sec Loss 5.3259 LearningRate 0.0156 Epoch: 24 Global Step: 137530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:06:54,138-Speed 11174.11 samples/sec Loss 5.3171 LearningRate 0.0156 Epoch: 24 Global Step: 137540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:06:55,042-Speed 11345.24 samples/sec Loss 5.3457 LearningRate 0.0156 Epoch: 24 Global Step: 137550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:06:55,952-Speed 11261.29 samples/sec Loss 5.4873 LearningRate 0.0156 Epoch: 24 Global Step: 137560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:06:56,844-Speed 11503.42 samples/sec Loss 5.4607 LearningRate 0.0156 Epoch: 24 Global Step: 137570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:06:57,748-Speed 11335.24 samples/sec Loss 5.5326 LearningRate 0.0156 Epoch: 24 Global Step: 137580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:06:58,644-Speed 11433.98 samples/sec Loss 5.4178 LearningRate 0.0156 Epoch: 24 Global Step: 137590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:06:59,527-Speed 11601.48 samples/sec Loss 5.4043 LearningRate 0.0156 Epoch: 24 Global Step: 137600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:07:00,430-Speed 11356.75 samples/sec Loss 5.5188 LearningRate 0.0156 Epoch: 24 Global Step: 137610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:07:01,346-Speed 11179.79 samples/sec Loss 5.4479 LearningRate 0.0156 Epoch: 24 Global Step: 137620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
04:07:02,282-Speed 10948.63 samples/sec Loss 5.5355 LearningRate 0.0156 Epoch: 24 Global Step: 137630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:07:03,197-Speed 11206.85 samples/sec Loss 5.3731 LearningRate 0.0156 Epoch: 24 Global Step: 137640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:07:04,112-Speed 11198.75 samples/sec Loss 5.5788 LearningRate 0.0156 Epoch: 24 Global Step: 137650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:07:04,995-Speed 11597.98 samples/sec Loss 5.5624 LearningRate 0.0156 Epoch: 24 Global Step: 137660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:07:05,916-Speed 11127.88 samples/sec Loss 5.4064 LearningRate 0.0156 Epoch: 24 Global Step: 137670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:07:06,865-Speed 10794.24 samples/sec Loss 5.4287 LearningRate 0.0156 Epoch: 24 Global Step: 137680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:07:07,811-Speed 10840.00 samples/sec Loss 5.5667 LearningRate 0.0156 Epoch: 24 Global Step: 137690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:07:08,741-Speed 11024.87 samples/sec Loss 5.5338 LearningRate 0.0156 Epoch: 24 Global Step: 137700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:07:09,620-Speed 11648.24 samples/sec Loss 5.5322 LearningRate 0.0156 Epoch: 24 Global Step: 137710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:07:10,506-Speed 11573.17 samples/sec Loss 5.3564 LearningRate 0.0156 Epoch: 24 Global Step: 137720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:07:11,400-Speed 11460.11 samples/sec Loss 5.4655 LearningRate 0.0156 Epoch: 24 Global Step: 137730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:07:12,314-Speed 11209.21 samples/sec Loss 5.6025 LearningRate 0.0156 Epoch: 24 Global Step: 137740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:07:13,234-Speed 11139.69 
samples/sec Loss 5.4618 LearningRate 0.0156 Epoch: 24 Global Step: 137750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:07:14,139-Speed 11319.96 samples/sec Loss 5.4873 LearningRate 0.0155 Epoch: 24 Global Step: 137760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:07:15,058-Speed 11159.08 samples/sec Loss 5.5055 LearningRate 0.0155 Epoch: 24 Global Step: 137770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:07:15,991-Speed 10974.63 samples/sec Loss 5.5150 LearningRate 0.0155 Epoch: 24 Global Step: 137780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:07:16,890-Speed 11407.94 samples/sec Loss 5.4858 LearningRate 0.0155 Epoch: 24 Global Step: 137790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:07:17,804-Speed 11204.99 samples/sec Loss 5.4152 LearningRate 0.0155 Epoch: 24 Global Step: 137800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:07:18,704-Speed 11383.40 samples/sec Loss 5.5150 LearningRate 0.0155 Epoch: 24 Global Step: 137810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:07:19,623-Speed 11154.60 samples/sec Loss 5.4731 LearningRate 0.0155 Epoch: 24 Global Step: 137820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:07:20,523-Speed 11383.99 samples/sec Loss 5.5310 LearningRate 0.0155 Epoch: 24 Global Step: 137830 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:07:21,415-Speed 11492.04 samples/sec Loss 5.5061 LearningRate 0.0155 Epoch: 24 Global Step: 137840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:07:22,297-Speed 11613.93 samples/sec Loss 5.5563 LearningRate 0.0155 Epoch: 24 Global Step: 137850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:07:23,173-Speed 11696.29 samples/sec Loss 5.5503 LearningRate 0.0155 Epoch: 24 Global Step: 137860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:07:24,067-Speed 11470.02 samples/sec Loss 5.4834 
LearningRate 0.0155 Epoch: 24 Global Step: 137870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:07:24,979-Speed 11238.45 samples/sec Loss 5.6206 LearningRate 0.0155 Epoch: 24 Global Step: 137880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:07:25,867-Speed 11530.70 samples/sec Loss 5.5052 LearningRate 0.0155 Epoch: 24 Global Step: 137890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:07:26,795-Speed 11052.67 samples/sec Loss 5.5418 LearningRate 0.0155 Epoch: 24 Global Step: 137900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:07:27,736-Speed 10888.30 samples/sec Loss 5.5476 LearningRate 0.0155 Epoch: 24 Global Step: 137910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:07:28,654-Speed 11155.14 samples/sec Loss 5.4189 LearningRate 0.0155 Epoch: 24 Global Step: 137920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:07:29,588-Speed 10976.24 samples/sec Loss 5.4583 LearningRate 0.0155 Epoch: 24 Global Step: 137930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:07:30,478-Speed 11511.89 samples/sec Loss 5.4319 LearningRate 0.0155 Epoch: 24 Global Step: 137940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:07:31,401-Speed 11098.03 samples/sec Loss 5.4685 LearningRate 0.0155 Epoch: 24 Global Step: 137950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:07:32,309-Speed 11292.63 samples/sec Loss 5.6067 LearningRate 0.0155 Epoch: 24 Global Step: 137960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:07:33,194-Speed 11570.22 samples/sec Loss 5.5266 LearningRate 0.0155 Epoch: 24 Global Step: 137970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:07:34,099-Speed 11324.09 samples/sec Loss 5.4198 LearningRate 0.0155 Epoch: 24 Global Step: 137980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:07:35,009-Speed 11262.84 samples/sec Loss 5.5317 LearningRate 0.0155 Epoch: 24 
Global Step: 137990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:07:35,916-Speed 11297.02 samples/sec Loss 5.4162 LearningRate 0.0155 Epoch: 24 Global Step: 138000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:07:57,171-[lfw][138000]XNorm: 9.584324 Training: 2022-04-11 04:07:57,172-[lfw][138000]Accuracy-Flip: 0.99717+-0.00289 Training: 2022-04-11 04:07:57,172-[lfw][138000]Accuracy-Highest: 0.99717 Training: 2022-04-11 04:08:21,647-[cfp_fp][138000]XNorm: 8.090722 Training: 2022-04-11 04:08:21,647-[cfp_fp][138000]Accuracy-Flip: 0.94586+-0.01017 Training: 2022-04-11 04:08:21,647-[cfp_fp][138000]Accuracy-Highest: 0.95171 Training: 2022-04-11 04:08:42,753-[agedb_30][138000]XNorm: 9.350841 Training: 2022-04-11 04:08:42,754-[agedb_30][138000]Accuracy-Flip: 0.96800+-0.00852 Training: 2022-04-11 04:08:42,754-[agedb_30][138000]Accuracy-Highest: 0.96800 Training: 2022-04-11 04:08:43,657-Speed 151.16 samples/sec Loss 5.6811 LearningRate 0.0155 Epoch: 24 Global Step: 138010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:08:44,562-Speed 11327.67 samples/sec Loss 5.5789 LearningRate 0.0155 Epoch: 24 Global Step: 138020 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:08:45,448-Speed 11560.10 samples/sec Loss 5.6055 LearningRate 0.0155 Epoch: 24 Global Step: 138030 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:08:46,333-Speed 11578.02 samples/sec Loss 5.5175 LearningRate 0.0155 Epoch: 24 Global Step: 138040 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:08:47,257-Speed 11094.06 samples/sec Loss 5.6445 LearningRate 0.0154 Epoch: 24 Global Step: 138050 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:08:48,176-Speed 11147.07 samples/sec Loss 5.3659 LearningRate 0.0154 Epoch: 24 Global Step: 138060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:08:49,108-Speed 10994.20 samples/sec Loss 5.5895 LearningRate 0.0154 Epoch: 24 Global Step: 138070 
Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:08:50,063-Speed 10734.84 samples/sec Loss 5.5673 LearningRate 0.0154 Epoch: 24 Global Step: 138080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:08:50,954-Speed 11488.89 samples/sec Loss 5.5444 LearningRate 0.0154 Epoch: 24 Global Step: 138090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:08:51,895-Speed 10900.82 samples/sec Loss 5.5279 LearningRate 0.0154 Epoch: 24 Global Step: 138100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:08:52,858-Speed 10639.53 samples/sec Loss 5.4357 LearningRate 0.0154 Epoch: 24 Global Step: 138110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:08:53,792-Speed 10962.60 samples/sec Loss 5.3223 LearningRate 0.0154 Epoch: 24 Global Step: 138120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:08:54,743-Speed 10778.89 samples/sec Loss 5.6507 LearningRate 0.0154 Epoch: 24 Global Step: 138130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:08:55,664-Speed 11126.60 samples/sec Loss 5.6008 LearningRate 0.0154 Epoch: 24 Global Step: 138140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:08:56,575-Speed 11250.54 samples/sec Loss 5.5448 LearningRate 0.0154 Epoch: 24 Global Step: 138150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:08:57,453-Speed 11675.18 samples/sec Loss 5.6001 LearningRate 0.0154 Epoch: 24 Global Step: 138160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:08:58,358-Speed 11332.71 samples/sec Loss 5.4631 LearningRate 0.0154 Epoch: 24 Global Step: 138170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:08:59,599-Speed 8250.58 samples/sec Loss 5.4789 LearningRate 0.0154 Epoch: 24 Global Step: 138180 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:09:00,516-Speed 11176.07 samples/sec Loss 5.7310 LearningRate 0.0154 Epoch: 24 Global Step: 138190 Fp16 Grad Scale: 131072 
Required: 3 hours Training: 2022-04-11 04:09:01,474-Speed 10698.88 samples/sec Loss 5.6351 LearningRate 0.0154 Epoch: 24 Global Step: 138200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:02,425-Speed 10774.03 samples/sec Loss 5.5344 LearningRate 0.0154 Epoch: 24 Global Step: 138210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:03,369-Speed 10861.16 samples/sec Loss 5.5498 LearningRate 0.0154 Epoch: 24 Global Step: 138220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:04,323-Speed 10736.64 samples/sec Loss 5.5727 LearningRate 0.0154 Epoch: 24 Global Step: 138230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:05,192-Speed 11795.80 samples/sec Loss 5.5201 LearningRate 0.0154 Epoch: 24 Global Step: 138240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:06,106-Speed 11212.08 samples/sec Loss 5.6057 LearningRate 0.0154 Epoch: 24 Global Step: 138250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:07,027-Speed 11126.00 samples/sec Loss 5.5186 LearningRate 0.0154 Epoch: 24 Global Step: 138260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:08,009-Speed 10451.71 samples/sec Loss 5.6762 LearningRate 0.0154 Epoch: 24 Global Step: 138270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:08,925-Speed 11189.21 samples/sec Loss 5.6070 LearningRate 0.0154 Epoch: 24 Global Step: 138280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:09,814-Speed 11517.12 samples/sec Loss 5.5299 LearningRate 0.0154 Epoch: 24 Global Step: 138290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:10,688-Speed 11727.50 samples/sec Loss 5.4677 LearningRate 0.0154 Epoch: 24 Global Step: 138300 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:09:11,563-Speed 11704.31 samples/sec Loss 5.6247 LearningRate 0.0154 Epoch: 24 Global Step: 138310 Fp16 Grad Scale: 131072 Required: 3 hours Training: 
2022-04-11 04:09:12,506-Speed 10874.77 samples/sec Loss 5.5896 LearningRate 0.0154 Epoch: 24 Global Step: 138320 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:09:13,442-Speed 10941.98 samples/sec Loss 5.5983 LearningRate 0.0154 Epoch: 24 Global Step: 138330 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:09:14,391-Speed 10801.09 samples/sec Loss 5.4589 LearningRate 0.0153 Epoch: 24 Global Step: 138340 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:09:15,269-Speed 11675.18 samples/sec Loss 5.5627 LearningRate 0.0153 Epoch: 24 Global Step: 138350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:16,195-Speed 11061.30 samples/sec Loss 5.5865 LearningRate 0.0153 Epoch: 24 Global Step: 138360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:17,087-Speed 11493.41 samples/sec Loss 5.6058 LearningRate 0.0153 Epoch: 24 Global Step: 138370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:18,017-Speed 11014.85 samples/sec Loss 5.4638 LearningRate 0.0153 Epoch: 24 Global Step: 138380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:18,937-Speed 11132.48 samples/sec Loss 5.5601 LearningRate 0.0153 Epoch: 24 Global Step: 138390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:19,883-Speed 10831.77 samples/sec Loss 5.5642 LearningRate 0.0153 Epoch: 24 Global Step: 138400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:20,767-Speed 11594.94 samples/sec Loss 5.5972 LearningRate 0.0153 Epoch: 24 Global Step: 138410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:21,666-Speed 11403.78 samples/sec Loss 5.5493 LearningRate 0.0153 Epoch: 24 Global Step: 138420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:22,549-Speed 11596.07 samples/sec Loss 5.5355 LearningRate 0.0153 Epoch: 24 Global Step: 138430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
04:09:23,451-Speed 11367.69 samples/sec Loss 5.6286 LearningRate 0.0153 Epoch: 24 Global Step: 138440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:24,352-Speed 11374.80 samples/sec Loss 5.6954 LearningRate 0.0153 Epoch: 24 Global Step: 138450 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:09:25,267-Speed 11199.55 samples/sec Loss 5.5321 LearningRate 0.0153 Epoch: 24 Global Step: 138460 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:09:26,142-Speed 11705.94 samples/sec Loss 5.5988 LearningRate 0.0153 Epoch: 24 Global Step: 138470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:27,043-Speed 11381.75 samples/sec Loss 5.7321 LearningRate 0.0153 Epoch: 24 Global Step: 138480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:27,960-Speed 11179.63 samples/sec Loss 5.5408 LearningRate 0.0153 Epoch: 24 Global Step: 138490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:28,880-Speed 11143.39 samples/sec Loss 5.6094 LearningRate 0.0153 Epoch: 24 Global Step: 138500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:29,788-Speed 11297.13 samples/sec Loss 5.5941 LearningRate 0.0153 Epoch: 24 Global Step: 138510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:30,732-Speed 10849.31 samples/sec Loss 5.6083 LearningRate 0.0153 Epoch: 24 Global Step: 138520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:31,654-Speed 11120.59 samples/sec Loss 5.6086 LearningRate 0.0153 Epoch: 24 Global Step: 138530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:32,590-Speed 10939.53 samples/sec Loss 5.5437 LearningRate 0.0153 Epoch: 24 Global Step: 138540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:33,479-Speed 11530.51 samples/sec Loss 5.5591 LearningRate 0.0153 Epoch: 24 Global Step: 138550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:34,365-Speed 11563.21 
samples/sec Loss 5.6292 LearningRate 0.0153 Epoch: 24 Global Step: 138560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:35,267-Speed 11360.14 samples/sec Loss 5.7238 LearningRate 0.0153 Epoch: 24 Global Step: 138570 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:09:36,192-Speed 11081.71 samples/sec Loss 5.7234 LearningRate 0.0153 Epoch: 24 Global Step: 138580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:37,156-Speed 10630.67 samples/sec Loss 5.5627 LearningRate 0.0153 Epoch: 24 Global Step: 138590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:38,079-Speed 11102.34 samples/sec Loss 5.5034 LearningRate 0.0153 Epoch: 24 Global Step: 138600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:39,005-Speed 11067.27 samples/sec Loss 5.6223 LearningRate 0.0153 Epoch: 24 Global Step: 138610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:39,930-Speed 11079.88 samples/sec Loss 5.4878 LearningRate 0.0153 Epoch: 24 Global Step: 138620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:40,871-Speed 10880.11 samples/sec Loss 5.7083 LearningRate 0.0152 Epoch: 24 Global Step: 138630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:41,813-Speed 10881.07 samples/sec Loss 5.5788 LearningRate 0.0152 Epoch: 24 Global Step: 138640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:42,707-Speed 11470.28 samples/sec Loss 5.5652 LearningRate 0.0152 Epoch: 24 Global Step: 138650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:43,662-Speed 10727.81 samples/sec Loss 5.5815 LearningRate 0.0152 Epoch: 24 Global Step: 138660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:44,565-Speed 11340.82 samples/sec Loss 5.6137 LearningRate 0.0152 Epoch: 24 Global Step: 138670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:45,449-Speed 11600.49 samples/sec Loss 5.6335 
LearningRate 0.0152 Epoch: 24 Global Step: 138680 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:09:46,355-Speed 11308.37 samples/sec Loss 5.7052 LearningRate 0.0152 Epoch: 24 Global Step: 138690 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:09:47,246-Speed 11500.42 samples/sec Loss 5.6292 LearningRate 0.0152 Epoch: 24 Global Step: 138700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:48,181-Speed 10960.57 samples/sec Loss 5.5802 LearningRate 0.0152 Epoch: 24 Global Step: 138710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:49,093-Speed 11237.64 samples/sec Loss 5.6260 LearningRate 0.0152 Epoch: 24 Global Step: 138720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:49,978-Speed 11571.53 samples/sec Loss 5.6320 LearningRate 0.0152 Epoch: 24 Global Step: 138730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:50,919-Speed 10898.14 samples/sec Loss 5.6065 LearningRate 0.0152 Epoch: 24 Global Step: 138740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:51,798-Speed 11659.79 samples/sec Loss 5.5220 LearningRate 0.0152 Epoch: 24 Global Step: 138750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:52,715-Speed 11176.21 samples/sec Loss 5.6142 LearningRate 0.0152 Epoch: 24 Global Step: 138760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:53,639-Speed 11088.94 samples/sec Loss 5.5857 LearningRate 0.0152 Epoch: 24 Global Step: 138770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:54,525-Speed 11560.87 samples/sec Loss 5.5973 LearningRate 0.0152 Epoch: 24 Global Step: 138780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:55,454-Speed 11031.36 samples/sec Loss 5.5978 LearningRate 0.0152 Epoch: 24 Global Step: 138790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:56,372-Speed 11158.93 samples/sec Loss 5.5067 LearningRate 0.0152 Epoch: 24 
Global Step: 138800 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:09:57,255-Speed 11614.44 samples/sec Loss 5.7222 LearningRate 0.0152 Epoch: 24 Global Step: 138810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:58,145-Speed 11512.84 samples/sec Loss 5.6044 LearningRate 0.0152 Epoch: 24 Global Step: 138820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:59,027-Speed 11618.36 samples/sec Loss 5.4854 LearningRate 0.0152 Epoch: 24 Global Step: 138830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:09:59,925-Speed 11406.76 samples/sec Loss 5.6376 LearningRate 0.0152 Epoch: 24 Global Step: 138840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:10:00,849-Speed 11088.26 samples/sec Loss 5.4919 LearningRate 0.0152 Epoch: 24 Global Step: 138850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:10:01,745-Speed 11437.23 samples/sec Loss 5.7078 LearningRate 0.0152 Epoch: 24 Global Step: 138860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:10:02,674-Speed 11034.48 samples/sec Loss 5.6809 LearningRate 0.0152 Epoch: 24 Global Step: 138870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:10:03,577-Speed 11348.06 samples/sec Loss 5.6135 LearningRate 0.0152 Epoch: 24 Global Step: 138880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:10:04,497-Speed 11133.08 samples/sec Loss 5.4424 LearningRate 0.0152 Epoch: 24 Global Step: 138890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:10:05,435-Speed 10920.98 samples/sec Loss 5.5864 LearningRate 0.0152 Epoch: 24 Global Step: 138900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:10:06,311-Speed 11697.65 samples/sec Loss 5.6715 LearningRate 0.0152 Epoch: 24 Global Step: 138910 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:10:07,224-Speed 11235.68 samples/sec Loss 5.6672 LearningRate 0.0151 Epoch: 24 Global Step: 138920 Fp16 Grad 
Scale: 131072 Required: 3 hours Training: 2022-04-11 04:10:08,123-Speed 11397.34 samples/sec Loss 5.5218 LearningRate 0.0151 Epoch: 24 Global Step: 138930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:10:09,017-Speed 11467.18 samples/sec Loss 5.6360 LearningRate 0.0151 Epoch: 24 Global Step: 138940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:10:09,918-Speed 11371.57 samples/sec Loss 5.6185 LearningRate 0.0151 Epoch: 24 Global Step: 138950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:10:10,812-Speed 11459.44 samples/sec Loss 5.6781 LearningRate 0.0151 Epoch: 24 Global Step: 138960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:10:11,718-Speed 11315.15 samples/sec Loss 5.5146 LearningRate 0.0151 Epoch: 24 Global Step: 138970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:10:12,666-Speed 10803.88 samples/sec Loss 5.6879 LearningRate 0.0151 Epoch: 24 Global Step: 138980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:10:13,595-Speed 11031.22 samples/sec Loss 5.5782 LearningRate 0.0151 Epoch: 24 Global Step: 138990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:10:14,548-Speed 10755.30 samples/sec Loss 5.6292 LearningRate 0.0151 Epoch: 24 Global Step: 139000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:10:15,483-Speed 10958.88 samples/sec Loss 5.4960 LearningRate 0.0151 Epoch: 24 Global Step: 139010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:10:16,371-Speed 11550.82 samples/sec Loss 5.6344 LearningRate 0.0151 Epoch: 24 Global Step: 139020 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:10:17,286-Speed 11193.58 samples/sec Loss 5.6341 LearningRate 0.0151 Epoch: 24 Global Step: 139030 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:10:18,254-Speed 10584.30 samples/sec Loss 5.6562 LearningRate 0.0151 Epoch: 24 Global Step: 139040 Fp16 Grad Scale: 32768 Required: 3 hours 
Training: 2022-04-11 04:10:19,186-Speed 10998.57 samples/sec Loss 5.5265 LearningRate 0.0151 Epoch: 24 Global Step: 139050 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:10:20,117-Speed 11000.96 samples/sec Loss 5.5951 LearningRate 0.0151 Epoch: 24 Global Step: 139060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:10:20,994-Speed 11683.67 samples/sec Loss 5.6296 LearningRate 0.0151 Epoch: 24 Global Step: 139070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:10:21,898-Speed 11348.23 samples/sec Loss 5.6265 LearningRate 0.0151 Epoch: 24 Global Step: 139080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:10:22,812-Speed 11208.88 samples/sec Loss 5.6847 LearningRate 0.0151 Epoch: 24 Global Step: 139090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:10:23,682-Speed 11778.00 samples/sec Loss 5.5296 LearningRate 0.0151 Epoch: 24 Global Step: 139100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:10:24,584-Speed 11355.55 samples/sec Loss 5.5865 LearningRate 0.0151 Epoch: 24 Global Step: 139110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:10:25,504-Speed 11137.60 samples/sec Loss 5.5799 LearningRate 0.0151 Epoch: 24 Global Step: 139120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:10:26,440-Speed 10954.36 samples/sec Loss 5.5158 LearningRate 0.0151 Epoch: 24 Global Step: 139130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:10:27,404-Speed 10622.03 samples/sec Loss 5.5450 LearningRate 0.0151 Epoch: 24 Global Step: 139140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:10:28,309-Speed 11331.27 samples/sec Loss 5.5709 LearningRate 0.0151 Epoch: 24 Global Step: 139150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:10:29,267-Speed 10697.71 samples/sec Loss 5.6428 LearningRate 0.0151 Epoch: 24 Global Step: 139160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 
04:10:30,170-Speed 11336.87 samples/sec Loss 5.5901 LearningRate 0.0151 Epoch: 24 Global Step: 139170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:10:31,109-Speed 10913.96 samples/sec Loss 5.5439 LearningRate 0.0151 Epoch: 24 Global Step: 139180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:10:32,050-Speed 10893.59 samples/sec Loss 5.6260 LearningRate 0.0151 Epoch: 24 Global Step: 139190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:10:32,942-Speed 11483.97 samples/sec Loss 5.5943 LearningRate 0.0151 Epoch: 24 Global Step: 139200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:10:33,855-Speed 11226.00 samples/sec Loss 5.6008 LearningRate 0.0150 Epoch: 24 Global Step: 139210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:10:34,748-Speed 11472.96 samples/sec Loss 5.4813 LearningRate 0.0150 Epoch: 24 Global Step: 139220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:10:35,651-Speed 11357.01 samples/sec Loss 5.7294 LearningRate 0.0150 Epoch: 24 Global Step: 139230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:10:36,543-Speed 11484.01 samples/sec Loss 5.6083 LearningRate 0.0150 Epoch: 24 Global Step: 139240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:10:37,497-Speed 10751.95 samples/sec Loss 5.7507 LearningRate 0.0150 Epoch: 24 Global Step: 139250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:10:38,428-Speed 11004.30 samples/sec Loss 5.7621 LearningRate 0.0150 Epoch: 24 Global Step: 139260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:10:39,374-Speed 10831.24 samples/sec Loss 5.4807 LearningRate 0.0150 Epoch: 24 Global Step: 139270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:10:40,292-Speed 11159.20 samples/sec Loss 5.5331 LearningRate 0.0150 Epoch: 24 Global Step: 139280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:10:41,225-Speed 10991.29 
samples/sec Loss 5.6758 LearningRate 0.0150 Epoch: 24 Global Step: 139290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:10:42,142-Speed 11173.10 samples/sec Loss 5.5601 LearningRate 0.0150 Epoch: 24 Global Step: 139300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:10:43,049-Speed 11298.94 samples/sec Loss 5.6723 LearningRate 0.0150 Epoch: 24 Global Step: 139310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:10:43,976-Speed 11055.85 samples/sec Loss 5.7826 LearningRate 0.0150 Epoch: 24 Global Step: 139320 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:10:44,849-Speed 11725.30 samples/sec Loss 5.6201 LearningRate 0.0150 Epoch: 24 Global Step: 139330 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:10:45,754-Speed 11330.27 samples/sec Loss 5.4901 LearningRate 0.0150 Epoch: 24 Global Step: 139340 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:10:46,626-Speed 11743.70 samples/sec Loss 5.5652 LearningRate 0.0150 Epoch: 24 Global Step: 139350 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:10:47,532-Speed 11317.90 samples/sec Loss 5.6180 LearningRate 0.0150 Epoch: 24 Global Step: 139360 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:10:48,446-Speed 11206.81 samples/sec Loss 5.5935 LearningRate 0.0150 Epoch: 24 Global Step: 139370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:10:49,335-Speed 11528.59 samples/sec Loss 5.6991 LearningRate 0.0150 Epoch: 24 Global Step: 139380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:10:50,295-Speed 10671.97 samples/sec Loss 5.6522 LearningRate 0.0150 Epoch: 24 Global Step: 139390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:10:51,266-Speed 10553.48 samples/sec Loss 5.5848 LearningRate 0.0150 Epoch: 24 Global Step: 139400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:10:52,212-Speed 10830.69 samples/sec Loss 5.7162 
LearningRate 0.0150 Epoch: 24 Global Step: 139410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:10:53,114-Speed 11358.13 samples/sec Loss 5.6346 LearningRate 0.0150 Epoch: 24 Global Step: 139420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:10:54,094-Speed 10457.59 samples/sec Loss 5.5610 LearningRate 0.0150 Epoch: 24 Global Step: 139430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:10:55,002-Speed 11290.36 samples/sec Loss 5.5319 LearningRate 0.0150 Epoch: 24 Global Step: 139440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:10:55,924-Speed 11114.09 samples/sec Loss 5.5877 LearningRate 0.0150 Epoch: 24 Global Step: 139450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:10:56,896-Speed 10539.17 samples/sec Loss 5.6455 LearningRate 0.0150 Epoch: 24 Global Step: 139460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:10:57,849-Speed 10752.01 samples/sec Loss 5.6408 LearningRate 0.0150 Epoch: 24 Global Step: 139470 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:10:58,818-Speed 10576.45 samples/sec Loss 5.6167 LearningRate 0.0150 Epoch: 24 Global Step: 139480 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:10:59,716-Speed 11407.81 samples/sec Loss 5.5700 LearningRate 0.0150 Epoch: 24 Global Step: 139490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:11:00,666-Speed 10792.83 samples/sec Loss 5.6682 LearningRate 0.0149 Epoch: 24 Global Step: 139500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:11:01,616-Speed 10785.56 samples/sec Loss 5.5603 LearningRate 0.0149 Epoch: 24 Global Step: 139510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:11:02,551-Speed 10967.20 samples/sec Loss 5.6624 LearningRate 0.0149 Epoch: 24 Global Step: 139520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:11:03,531-Speed 10455.64 samples/sec Loss 5.6369 LearningRate 0.0149 Epoch: 24 
Global Step: 139530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:11:04,446-Speed 11196.78 samples/sec Loss 5.6694 LearningRate 0.0149 Epoch: 24 Global Step: 139540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:11:05,407-Speed 10666.16 samples/sec Loss 5.5846 LearningRate 0.0149 Epoch: 24 Global Step: 139550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:11:06,302-Speed 11449.28 samples/sec Loss 5.6574 LearningRate 0.0149 Epoch: 24 Global Step: 139560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:11:07,193-Speed 11502.80 samples/sec Loss 5.6756 LearningRate 0.0149 Epoch: 24 Global Step: 139570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:11:08,140-Speed 10813.15 samples/sec Loss 5.6077 LearningRate 0.0149 Epoch: 24 Global Step: 139580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:11:09,036-Speed 11441.31 samples/sec Loss 5.5561 LearningRate 0.0149 Epoch: 24 Global Step: 139590 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:11:09,923-Speed 11550.90 samples/sec Loss 5.6846 LearningRate 0.0149 Epoch: 24 Global Step: 139600 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:11:10,822-Speed 11398.31 samples/sec Loss 5.6528 LearningRate 0.0149 Epoch: 24 Global Step: 139610 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:11:11,722-Speed 11389.72 samples/sec Loss 5.5330 LearningRate 0.0149 Epoch: 24 Global Step: 139620 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:11:12,627-Speed 11320.09 samples/sec Loss 5.4572 LearningRate 0.0149 Epoch: 24 Global Step: 139630 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:11:13,582-Speed 10727.11 samples/sec Loss 5.4781 LearningRate 0.0149 Epoch: 24 Global Step: 139640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:11:14,489-Speed 11296.94 samples/sec Loss 5.6072 LearningRate 0.0149 Epoch: 24 Global Step: 139650 Fp16 
Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:11:15,392-Speed 11356.44 samples/sec Loss 5.6139 LearningRate 0.0149 Epoch: 24 Global Step: 139660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:11:16,299-Speed 11288.06 samples/sec Loss 5.7650 LearningRate 0.0149 Epoch: 24 Global Step: 139670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:11:17,210-Speed 11254.31 samples/sec Loss 5.7009 LearningRate 0.0149 Epoch: 24 Global Step: 139680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:11:18,152-Speed 10877.96 samples/sec Loss 5.8124 LearningRate 0.0149 Epoch: 24 Global Step: 139690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:11:19,057-Speed 11325.22 samples/sec Loss 5.6648 LearningRate 0.0149 Epoch: 24 Global Step: 139700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:11:19,946-Speed 11527.86 samples/sec Loss 5.5266 LearningRate 0.0149 Epoch: 24 Global Step: 139710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:11:20,843-Speed 11422.18 samples/sec Loss 5.6548 LearningRate 0.0149 Epoch: 24 Global Step: 139720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:11:21,770-Speed 11053.67 samples/sec Loss 5.5790 LearningRate 0.0149 Epoch: 24 Global Step: 139730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:11:22,684-Speed 11221.68 samples/sec Loss 5.6415 LearningRate 0.0149 Epoch: 24 Global Step: 139740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:11:23,578-Speed 11454.50 samples/sec Loss 5.5302 LearningRate 0.0149 Epoch: 24 Global Step: 139750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:11:24,470-Speed 11495.50 samples/sec Loss 5.7338 LearningRate 0.0149 Epoch: 24 Global Step: 139760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:11:25,371-Speed 11362.43 samples/sec Loss 5.7121 LearningRate 0.0149 Epoch: 24 Global Step: 139770 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-04-11 04:11:26,255-Speed 11598.49 samples/sec Loss 5.6940 LearningRate 0.0149 Epoch: 24 Global Step: 139780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:11:27,179-Speed 11083.27 samples/sec Loss 5.7157 LearningRate 0.0149 Epoch: 24 Global Step: 139790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:11:28,083-Speed 11336.84 samples/sec Loss 5.6830 LearningRate 0.0148 Epoch: 24 Global Step: 139800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:11:29,001-Speed 11169.16 samples/sec Loss 5.6935 LearningRate 0.0148 Epoch: 24 Global Step: 139810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:11:29,950-Speed 10795.17 samples/sec Loss 5.6628 LearningRate 0.0148 Epoch: 24 Global Step: 139820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:11:30,857-Speed 11300.89 samples/sec Loss 5.6211 LearningRate 0.0148 Epoch: 24 Global Step: 139830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:11:31,766-Speed 11271.93 samples/sec Loss 5.5260 LearningRate 0.0148 Epoch: 24 Global Step: 139840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:11:32,659-Speed 11474.49 samples/sec Loss 5.6564 LearningRate 0.0148 Epoch: 24 Global Step: 139850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:11:33,551-Speed 11492.94 samples/sec Loss 5.5687 LearningRate 0.0148 Epoch: 24 Global Step: 139860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:11:34,433-Speed 11617.59 samples/sec Loss 5.6667 LearningRate 0.0148 Epoch: 24 Global Step: 139870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:11:35,361-Speed 11032.84 samples/sec Loss 5.6987 LearningRate 0.0148 Epoch: 24 Global Step: 139880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:11:36,297-Speed 10945.39 samples/sec Loss 5.6544 LearningRate 0.0148 Epoch: 24 Global Step: 139890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 
04:11:37,195-Speed 11417.03 samples/sec Loss 5.5414 LearningRate 0.0148 Epoch: 24 Global Step: 139900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:11:38,160-Speed 10621.00 samples/sec Loss 5.6639 LearningRate 0.0148 Epoch: 24 Global Step: 139910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:11:39,110-Speed 10779.65 samples/sec Loss 5.5178 LearningRate 0.0148 Epoch: 24 Global Step: 139920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:11:40,009-Speed 11397.00 samples/sec Loss 5.5332 LearningRate 0.0148 Epoch: 24 Global Step: 139930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:11:40,899-Speed 11516.28 samples/sec Loss 5.6471 LearningRate 0.0148 Epoch: 24 Global Step: 139940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:11:41,843-Speed 10860.53 samples/sec Loss 5.6712 LearningRate 0.0148 Epoch: 24 Global Step: 139950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:11:42,797-Speed 10739.43 samples/sec Loss 5.4754 LearningRate 0.0148 Epoch: 24 Global Step: 139960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:11:43,689-Speed 11488.22 samples/sec Loss 5.7521 LearningRate 0.0148 Epoch: 24 Global Step: 139970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:11:44,567-Speed 11674.25 samples/sec Loss 5.8131 LearningRate 0.0148 Epoch: 24 Global Step: 139980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:11:45,452-Speed 11577.37 samples/sec Loss 5.8360 LearningRate 0.0148 Epoch: 24 Global Step: 139990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:11:46,348-Speed 11439.02 samples/sec Loss 5.7512 LearningRate 0.0148 Epoch: 24 Global Step: 140000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:12:07,608-[lfw][140000]XNorm: 9.342785 Training: 2022-04-11 04:12:07,609-[lfw][140000]Accuracy-Flip: 0.99533+-0.00356 Training: 2022-04-11 04:12:07,609-[lfw][140000]Accuracy-Highest: 0.99717 
Training: 2022-04-11 04:12:32,221-[cfp_fp][140000]XNorm: 7.932398 Training: 2022-04-11 04:12:32,222-[cfp_fp][140000]Accuracy-Flip: 0.94743+-0.01255 Training: 2022-04-11 04:12:32,222-[cfp_fp][140000]Accuracy-Highest: 0.95171 Training: 2022-04-11 04:12:53,420-[agedb_30][140000]XNorm: 9.204817 Training: 2022-04-11 04:12:53,421-[agedb_30][140000]Accuracy-Flip: 0.96533+-0.00812 Training: 2022-04-11 04:12:53,421-[agedb_30][140000]Accuracy-Highest: 0.96800 Training: 2022-04-11 04:12:54,324-Speed 150.64 samples/sec Loss 5.6534 LearningRate 0.0148 Epoch: 24 Global Step: 140010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:12:55,233-Speed 11281.01 samples/sec Loss 5.6259 LearningRate 0.0148 Epoch: 24 Global Step: 140020 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:12:56,126-Speed 11473.13 samples/sec Loss 5.6342 LearningRate 0.0148 Epoch: 24 Global Step: 140030 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:12:57,010-Speed 11581.10 samples/sec Loss 5.5444 LearningRate 0.0148 Epoch: 24 Global Step: 140040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:12:57,908-Speed 11416.90 samples/sec Loss 5.6391 LearningRate 0.0148 Epoch: 24 Global Step: 140050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:12:58,845-Speed 10940.54 samples/sec Loss 5.7310 LearningRate 0.0148 Epoch: 24 Global Step: 140060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:12:59,757-Speed 11226.68 samples/sec Loss 5.6107 LearningRate 0.0148 Epoch: 24 Global Step: 140070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:00,663-Speed 11313.27 samples/sec Loss 5.6125 LearningRate 0.0148 Epoch: 24 Global Step: 140080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:01,553-Speed 11518.57 samples/sec Loss 5.7707 LearningRate 0.0148 Epoch: 24 Global Step: 140090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:02,491-Speed 10925.83 samples/sec Loss 
5.7999 LearningRate 0.0147 Epoch: 24 Global Step: 140100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:03,394-Speed 11344.58 samples/sec Loss 5.5552 LearningRate 0.0147 Epoch: 24 Global Step: 140110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:04,309-Speed 11200.96 samples/sec Loss 5.8178 LearningRate 0.0147 Epoch: 24 Global Step: 140120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:05,209-Speed 11390.58 samples/sec Loss 5.6823 LearningRate 0.0147 Epoch: 24 Global Step: 140130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:06,128-Speed 11147.92 samples/sec Loss 5.7061 LearningRate 0.0147 Epoch: 24 Global Step: 140140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:07,040-Speed 11237.87 samples/sec Loss 5.6060 LearningRate 0.0147 Epoch: 24 Global Step: 140150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:07,967-Speed 11048.07 samples/sec Loss 5.7346 LearningRate 0.0147 Epoch: 24 Global Step: 140160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:08,853-Speed 11573.79 samples/sec Loss 5.6800 LearningRate 0.0147 Epoch: 24 Global Step: 140170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:09,746-Speed 11465.79 samples/sec Loss 5.8097 LearningRate 0.0147 Epoch: 24 Global Step: 140180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:10,624-Speed 11674.43 samples/sec Loss 5.5667 LearningRate 0.0147 Epoch: 24 Global Step: 140190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:11,519-Speed 11447.28 samples/sec Loss 5.6157 LearningRate 0.0147 Epoch: 24 Global Step: 140200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:12,430-Speed 11250.51 samples/sec Loss 5.6816 LearningRate 0.0147 Epoch: 24 Global Step: 140210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:13,316-Speed 11559.96 samples/sec Loss 5.6670 LearningRate 0.0147 
Epoch: 24 Global Step: 140220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:14,257-Speed 10897.88 samples/sec Loss 5.6186 LearningRate 0.0147 Epoch: 24 Global Step: 140230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:15,235-Speed 10470.37 samples/sec Loss 5.6948 LearningRate 0.0147 Epoch: 24 Global Step: 140240 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:13:16,162-Speed 11050.27 samples/sec Loss 5.6595 LearningRate 0.0147 Epoch: 24 Global Step: 140250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:17,119-Speed 10707.03 samples/sec Loss 5.6895 LearningRate 0.0147 Epoch: 24 Global Step: 140260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:18,025-Speed 11323.29 samples/sec Loss 5.7117 LearningRate 0.0147 Epoch: 24 Global Step: 140270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:18,927-Speed 11363.30 samples/sec Loss 5.5535 LearningRate 0.0147 Epoch: 24 Global Step: 140280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:13:19,855-Speed 11034.07 samples/sec Loss 5.7059 LearningRate 0.0147 Epoch: 24 Global Step: 140290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:13:20,793-Speed 10930.94 samples/sec Loss 5.5774 LearningRate 0.0147 Epoch: 24 Global Step: 140300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:13:21,669-Speed 11700.02 samples/sec Loss 5.6433 LearningRate 0.0147 Epoch: 24 Global Step: 140310 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:13:22,572-Speed 11344.24 samples/sec Loss 5.7061 LearningRate 0.0147 Epoch: 24 Global Step: 140320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:13:23,489-Speed 11179.84 samples/sec Loss 5.6208 LearningRate 0.0147 Epoch: 24 Global Step: 140330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:13:24,375-Speed 11561.35 samples/sec Loss 5.7138 LearningRate 0.0147 Epoch: 24 Global Step: 140340 
Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:13:25,270-Speed 11446.55 samples/sec Loss 5.7094 LearningRate 0.0147 Epoch: 24 Global Step: 140350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:13:26,172-Speed 11366.77 samples/sec Loss 5.6841 LearningRate 0.0147 Epoch: 24 Global Step: 140360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:13:27,088-Speed 11183.99 samples/sec Loss 5.7911 LearningRate 0.0147 Epoch: 24 Global Step: 140370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:13:28,001-Speed 11226.23 samples/sec Loss 5.6214 LearningRate 0.0147 Epoch: 24 Global Step: 140380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:28,881-Speed 11642.42 samples/sec Loss 5.6547 LearningRate 0.0146 Epoch: 24 Global Step: 140390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:29,808-Speed 11055.24 samples/sec Loss 5.6241 LearningRate 0.0146 Epoch: 24 Global Step: 140400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:30,758-Speed 10781.57 samples/sec Loss 5.6603 LearningRate 0.0146 Epoch: 24 Global Step: 140410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:31,676-Speed 11171.11 samples/sec Loss 5.7031 LearningRate 0.0146 Epoch: 24 Global Step: 140420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:32,571-Speed 11452.09 samples/sec Loss 5.6313 LearningRate 0.0146 Epoch: 24 Global Step: 140430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:33,470-Speed 11387.70 samples/sec Loss 5.6163 LearningRate 0.0146 Epoch: 24 Global Step: 140440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:34,349-Speed 11665.77 samples/sec Loss 5.6892 LearningRate 0.0146 Epoch: 24 Global Step: 140450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:35,244-Speed 11439.18 samples/sec Loss 5.8041 LearningRate 0.0146 Epoch: 24 Global Step: 140460 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-04-11 04:13:36,147-Speed 11355.64 samples/sec Loss 5.7351 LearningRate 0.0146 Epoch: 24 Global Step: 140470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:37,094-Speed 10819.07 samples/sec Loss 5.6036 LearningRate 0.0146 Epoch: 24 Global Step: 140480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:37,999-Speed 11323.10 samples/sec Loss 5.6954 LearningRate 0.0146 Epoch: 24 Global Step: 140490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:38,942-Speed 10863.51 samples/sec Loss 5.6294 LearningRate 0.0146 Epoch: 24 Global Step: 140500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:39,871-Speed 11028.61 samples/sec Loss 5.7151 LearningRate 0.0146 Epoch: 24 Global Step: 140510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:40,783-Speed 11239.78 samples/sec Loss 5.4852 LearningRate 0.0146 Epoch: 24 Global Step: 140520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:41,707-Speed 11089.01 samples/sec Loss 5.5009 LearningRate 0.0146 Epoch: 24 Global Step: 140530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:42,624-Speed 11183.14 samples/sec Loss 5.7408 LearningRate 0.0146 Epoch: 24 Global Step: 140540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:43,564-Speed 10903.36 samples/sec Loss 5.5920 LearningRate 0.0146 Epoch: 24 Global Step: 140550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:44,466-Speed 11352.30 samples/sec Loss 5.7027 LearningRate 0.0146 Epoch: 24 Global Step: 140560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:45,340-Speed 11732.82 samples/sec Loss 5.5898 LearningRate 0.0146 Epoch: 24 Global Step: 140570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:46,234-Speed 11461.27 samples/sec Loss 5.6741 LearningRate 0.0146 Epoch: 24 Global Step: 140580 Fp16 Grad Scale: 131072 Required: 3 hours Training: 
2022-04-11 04:13:47,135-Speed 11372.71 samples/sec Loss 5.6691 LearningRate 0.0146 Epoch: 24 Global Step: 140590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:48,041-Speed 11310.30 samples/sec Loss 5.7125 LearningRate 0.0146 Epoch: 24 Global Step: 140600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:48,963-Speed 11113.36 samples/sec Loss 5.7510 LearningRate 0.0146 Epoch: 24 Global Step: 140610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:49,906-Speed 10865.47 samples/sec Loss 5.6828 LearningRate 0.0146 Epoch: 24 Global Step: 140620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:50,834-Speed 11045.47 samples/sec Loss 5.6396 LearningRate 0.0146 Epoch: 24 Global Step: 140630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:51,801-Speed 10594.79 samples/sec Loss 5.8255 LearningRate 0.0146 Epoch: 24 Global Step: 140640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:52,763-Speed 10647.68 samples/sec Loss 5.7324 LearningRate 0.0146 Epoch: 24 Global Step: 140650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:53,654-Speed 11505.74 samples/sec Loss 5.6456 LearningRate 0.0146 Epoch: 24 Global Step: 140660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:54,575-Speed 11119.59 samples/sec Loss 5.6409 LearningRate 0.0146 Epoch: 24 Global Step: 140670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:55,523-Speed 10815.45 samples/sec Loss 5.7051 LearningRate 0.0146 Epoch: 24 Global Step: 140680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:13:56,451-Speed 11037.88 samples/sec Loss 5.6434 LearningRate 0.0145 Epoch: 24 Global Step: 140690 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:13:57,345-Speed 11461.73 samples/sec Loss 5.6196 LearningRate 0.0145 Epoch: 24 Global Step: 140700 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:13:58,234-Speed 
11535.12 samples/sec Loss 5.7421 LearningRate 0.0145 Epoch: 24 Global Step: 140710 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:13:59,135-Speed 11375.68 samples/sec Loss 5.6788 LearningRate 0.0145 Epoch: 24 Global Step: 140720 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:14:00,086-Speed 10776.66 samples/sec Loss 5.5134 LearningRate 0.0145 Epoch: 24 Global Step: 140730 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:14:00,997-Speed 11239.46 samples/sec Loss 5.6424 LearningRate 0.0145 Epoch: 24 Global Step: 140740 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:14:01,909-Speed 11235.71 samples/sec Loss 5.7302 LearningRate 0.0145 Epoch: 24 Global Step: 140750 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:14:02,831-Speed 11118.97 samples/sec Loss 5.7392 LearningRate 0.0145 Epoch: 24 Global Step: 140760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:03,799-Speed 10586.58 samples/sec Loss 5.6917 LearningRate 0.0145 Epoch: 24 Global Step: 140770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:04,725-Speed 11070.10 samples/sec Loss 5.6567 LearningRate 0.0145 Epoch: 24 Global Step: 140780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:05,667-Speed 10879.56 samples/sec Loss 5.6806 LearningRate 0.0145 Epoch: 24 Global Step: 140790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:06,602-Speed 10951.74 samples/sec Loss 5.8195 LearningRate 0.0145 Epoch: 24 Global Step: 140800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:07,503-Speed 11384.73 samples/sec Loss 5.6097 LearningRate 0.0145 Epoch: 24 Global Step: 140810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:08,408-Speed 11317.88 samples/sec Loss 5.6662 LearningRate 0.0145 Epoch: 24 Global Step: 140820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:09,292-Speed 11593.34 samples/sec Loss 
5.5283 LearningRate 0.0145 Epoch: 24 Global Step: 140830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:10,182-Speed 11517.08 samples/sec Loss 5.6409 LearningRate 0.0145 Epoch: 24 Global Step: 140840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:11,066-Speed 11593.76 samples/sec Loss 5.6888 LearningRate 0.0145 Epoch: 24 Global Step: 140850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:11,993-Speed 11055.54 samples/sec Loss 5.6928 LearningRate 0.0145 Epoch: 24 Global Step: 140860 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:14:12,929-Speed 10949.26 samples/sec Loss 5.6205 LearningRate 0.0145 Epoch: 24 Global Step: 140870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:13,883-Speed 10737.61 samples/sec Loss 5.6807 LearningRate 0.0145 Epoch: 24 Global Step: 140880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:14,817-Speed 10970.49 samples/sec Loss 5.7138 LearningRate 0.0145 Epoch: 24 Global Step: 140890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:15,704-Speed 11554.36 samples/sec Loss 5.7055 LearningRate 0.0145 Epoch: 24 Global Step: 140900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:16,654-Speed 10777.66 samples/sec Loss 5.6235 LearningRate 0.0145 Epoch: 24 Global Step: 140910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:17,546-Speed 11495.22 samples/sec Loss 5.6204 LearningRate 0.0145 Epoch: 24 Global Step: 140920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:18,487-Speed 10889.25 samples/sec Loss 5.6892 LearningRate 0.0145 Epoch: 24 Global Step: 140930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:19,382-Speed 11444.78 samples/sec Loss 5.5781 LearningRate 0.0145 Epoch: 24 Global Step: 140940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:20,304-Speed 11121.69 samples/sec Loss 5.6154 LearningRate 0.0145 
Epoch: 24 Global Step: 140950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:21,185-Speed 11627.64 samples/sec Loss 5.6352 LearningRate 0.0145 Epoch: 24 Global Step: 140960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:22,070-Speed 11572.45 samples/sec Loss 5.6542 LearningRate 0.0145 Epoch: 24 Global Step: 140970 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:14:22,966-Speed 11445.05 samples/sec Loss 5.8171 LearningRate 0.0145 Epoch: 24 Global Step: 140980 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:14:23,873-Speed 11290.09 samples/sec Loss 5.6914 LearningRate 0.0144 Epoch: 24 Global Step: 140990 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:14:24,811-Speed 10930.16 samples/sec Loss 5.6401 LearningRate 0.0144 Epoch: 24 Global Step: 141000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:25,703-Speed 11498.95 samples/sec Loss 5.6786 LearningRate 0.0144 Epoch: 24 Global Step: 141010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:26,609-Speed 11314.45 samples/sec Loss 5.5707 LearningRate 0.0144 Epoch: 24 Global Step: 141020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:27,508-Speed 11407.29 samples/sec Loss 5.7702 LearningRate 0.0144 Epoch: 24 Global Step: 141030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:28,420-Speed 11234.76 samples/sec Loss 5.8288 LearningRate 0.0144 Epoch: 24 Global Step: 141040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:29,337-Speed 11174.52 samples/sec Loss 5.7235 LearningRate 0.0144 Epoch: 24 Global Step: 141050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:30,257-Speed 11138.42 samples/sec Loss 5.6593 LearningRate 0.0144 Epoch: 24 Global Step: 141060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:31,141-Speed 11595.09 samples/sec Loss 5.5748 LearningRate 0.0144 Epoch: 24 Global Step: 
141070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:32,078-Speed 10936.19 samples/sec Loss 5.6256 LearningRate 0.0144 Epoch: 24 Global Step: 141080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:32,974-Speed 11438.15 samples/sec Loss 5.5851 LearningRate 0.0144 Epoch: 24 Global Step: 141090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:33,863-Speed 11520.77 samples/sec Loss 5.6496 LearningRate 0.0144 Epoch: 24 Global Step: 141100 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:14:34,746-Speed 11608.47 samples/sec Loss 5.7151 LearningRate 0.0144 Epoch: 24 Global Step: 141110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:35,653-Speed 11301.35 samples/sec Loss 5.7410 LearningRate 0.0144 Epoch: 24 Global Step: 141120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:36,575-Speed 11108.52 samples/sec Loss 5.8006 LearningRate 0.0144 Epoch: 24 Global Step: 141130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:37,491-Speed 11185.89 samples/sec Loss 5.7214 LearningRate 0.0144 Epoch: 24 Global Step: 141140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:38,432-Speed 10899.24 samples/sec Loss 5.7484 LearningRate 0.0144 Epoch: 24 Global Step: 141150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:39,333-Speed 11360.80 samples/sec Loss 5.5896 LearningRate 0.0144 Epoch: 24 Global Step: 141160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:40,250-Speed 11179.74 samples/sec Loss 5.5461 LearningRate 0.0144 Epoch: 24 Global Step: 141170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:41,165-Speed 11193.87 samples/sec Loss 5.5911 LearningRate 0.0144 Epoch: 24 Global Step: 141180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:42,068-Speed 11356.68 samples/sec Loss 5.6320 LearningRate 0.0144 Epoch: 24 Global Step: 141190 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-04-11 04:14:42,967-Speed 11396.64 samples/sec Loss 5.6545 LearningRate 0.0144 Epoch: 24 Global Step: 141200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:43,855-Speed 11539.77 samples/sec Loss 5.6104 LearningRate 0.0144 Epoch: 24 Global Step: 141210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:14:44,751-Speed 11434.17 samples/sec Loss 5.6296 LearningRate 0.0144 Epoch: 24 Global Step: 141220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:14:45,640-Speed 11531.50 samples/sec Loss 5.5715 LearningRate 0.0144 Epoch: 24 Global Step: 141230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:14:46,537-Speed 11418.45 samples/sec Loss 5.7574 LearningRate 0.0144 Epoch: 24 Global Step: 141240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:14:47,441-Speed 11333.10 samples/sec Loss 5.6167 LearningRate 0.0144 Epoch: 24 Global Step: 141250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:14:48,396-Speed 10734.76 samples/sec Loss 5.5600 LearningRate 0.0144 Epoch: 24 Global Step: 141260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:14:49,324-Speed 11049.57 samples/sec Loss 5.7478 LearningRate 0.0144 Epoch: 24 Global Step: 141270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:14:50,274-Speed 10781.84 samples/sec Loss 5.7065 LearningRate 0.0144 Epoch: 24 Global Step: 141280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:14:51,227-Speed 10754.59 samples/sec Loss 5.7617 LearningRate 0.0143 Epoch: 24 Global Step: 141290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:14:52,159-Speed 10992.70 samples/sec Loss 5.6908 LearningRate 0.0143 Epoch: 24 Global Step: 141300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:14:53,079-Speed 11140.43 samples/sec Loss 5.7199 LearningRate 0.0143 Epoch: 24 Global Step: 141310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 04:14:53,978-Speed 11403.05 samples/sec Loss 5.8204 LearningRate 0.0143 Epoch: 24 Global Step: 141320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:54,873-Speed 11449.82 samples/sec Loss 5.6908 LearningRate 0.0143 Epoch: 24 Global Step: 141330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:55,782-Speed 11263.76 samples/sec Loss 5.5951 LearningRate 0.0143 Epoch: 24 Global Step: 141340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:56,666-Speed 11602.30 samples/sec Loss 5.7896 LearningRate 0.0143 Epoch: 24 Global Step: 141350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:57,561-Speed 11448.91 samples/sec Loss 5.7476 LearningRate 0.0143 Epoch: 24 Global Step: 141360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:58,494-Speed 10973.99 samples/sec Loss 5.5746 LearningRate 0.0143 Epoch: 24 Global Step: 141370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:14:59,372-Speed 11675.21 samples/sec Loss 5.7800 LearningRate 0.0143 Epoch: 24 Global Step: 141380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:15:00,287-Speed 11202.26 samples/sec Loss 5.6987 LearningRate 0.0143 Epoch: 24 Global Step: 141390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:15:01,170-Speed 11601.66 samples/sec Loss 5.7674 LearningRate 0.0143 Epoch: 24 Global Step: 141400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:15:02,088-Speed 11168.43 samples/sec Loss 5.6242 LearningRate 0.0143 Epoch: 24 Global Step: 141410 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:15:03,007-Speed 11146.73 samples/sec Loss 5.6907 LearningRate 0.0143 Epoch: 24 Global Step: 141420 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:15:03,888-Speed 11638.70 samples/sec Loss 5.6445 LearningRate 0.0143 Epoch: 24 Global Step: 141430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:15:04,835-Speed 
10822.31 samples/sec Loss 5.8367 LearningRate 0.0143 Epoch: 24 Global Step: 141440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:15:05,741-Speed 11310.00 samples/sec Loss 5.7388 LearningRate 0.0143 Epoch: 24 Global Step: 141450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:15:06,662-Speed 11120.57 samples/sec Loss 5.7067 LearningRate 0.0143 Epoch: 24 Global Step: 141460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:15:07,606-Speed 10859.04 samples/sec Loss 5.7479 LearningRate 0.0143 Epoch: 24 Global Step: 141470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:15:08,526-Speed 11131.09 samples/sec Loss 5.6414 LearningRate 0.0143 Epoch: 24 Global Step: 141480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:15:09,416-Speed 11510.34 samples/sec Loss 5.7187 LearningRate 0.0143 Epoch: 24 Global Step: 141490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:15:10,297-Speed 11641.75 samples/sec Loss 5.7108 LearningRate 0.0143 Epoch: 24 Global Step: 141500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:15:11,179-Speed 11615.81 samples/sec Loss 5.5083 LearningRate 0.0143 Epoch: 24 Global Step: 141510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:15:12,089-Speed 11254.00 samples/sec Loss 5.6505 LearningRate 0.0143 Epoch: 24 Global Step: 141520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:15:13,010-Speed 11128.52 samples/sec Loss 5.6985 LearningRate 0.0143 Epoch: 24 Global Step: 141530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:15:13,911-Speed 11381.39 samples/sec Loss 5.8124 LearningRate 0.0143 Epoch: 24 Global Step: 141540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:15:14,839-Speed 11039.14 samples/sec Loss 5.6163 LearningRate 0.0143 Epoch: 24 Global Step: 141550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:15:15,749-Speed 11265.14 samples/sec Loss 
5.6562 LearningRate 0.0143 Epoch: 24 Global Step: 141560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:15:16,659-Speed 11257.88 samples/sec Loss 5.5764 LearningRate 0.0143 Epoch: 24 Global Step: 141570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:15:17,547-Speed 11534.41 samples/sec Loss 5.8602 LearningRate 0.0143 Epoch: 24 Global Step: 141580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:15:18,471-Speed 11095.24 samples/sec Loss 5.7207 LearningRate 0.0142 Epoch: 24 Global Step: 141590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:15:19,407-Speed 10948.06 samples/sec Loss 5.6965 LearningRate 0.0142 Epoch: 24 Global Step: 141600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:15:20,300-Speed 11474.87 samples/sec Loss 5.6134 LearningRate 0.0142 Epoch: 24 Global Step: 141610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:15:21,186-Speed 11567.90 samples/sec Loss 5.6082 LearningRate 0.0142 Epoch: 24 Global Step: 141620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:15:22,082-Speed 11436.89 samples/sec Loss 5.5634 LearningRate 0.0142 Epoch: 24 Global Step: 141630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:15:22,972-Speed 11523.15 samples/sec Loss 5.6724 LearningRate 0.0142 Epoch: 24 Global Step: 141640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:15:23,882-Speed 11255.92 samples/sec Loss 5.7265 LearningRate 0.0142 Epoch: 24 Global Step: 141650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:15:24,825-Speed 10861.04 samples/sec Loss 5.5878 LearningRate 0.0142 Epoch: 24 Global Step: 141660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:15:25,740-Speed 11196.09 samples/sec Loss 5.7311 LearningRate 0.0142 Epoch: 24 Global Step: 141670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:15:26,681-Speed 10892.79 samples/sec Loss 5.6615 LearningRate 0.0142 
Epoch: 24 Global Step: 141680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:15:27,557-Speed 11701.25 samples/sec Loss 5.7725 LearningRate 0.0142 Epoch: 24 Global Step: 141690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:15:28,483-Speed 11061.39 samples/sec Loss 5.7490 LearningRate 0.0142 Epoch: 24 Global Step: 141700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:15:29,375-Speed 11486.25 samples/sec Loss 5.7710 LearningRate 0.0142 Epoch: 24 Global Step: 141710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:15:30,309-Speed 10980.02 samples/sec Loss 5.6000 LearningRate 0.0142 Epoch: 24 Global Step: 141720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:15:31,218-Speed 11273.51 samples/sec Loss 5.7063 LearningRate 0.0142 Epoch: 24 Global Step: 141730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:15:32,121-Speed 11340.43 samples/sec Loss 5.6431 LearningRate 0.0142 Epoch: 24 Global Step: 141740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:15:33,030-Speed 11281.76 samples/sec Loss 5.6263 LearningRate 0.0142 Epoch: 24 Global Step: 141750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:15:33,922-Speed 11486.22 samples/sec Loss 5.5914 LearningRate 0.0142 Epoch: 24 Global Step: 141760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:15:34,822-Speed 11386.16 samples/sec Loss 5.7079 LearningRate 0.0142 Epoch: 24 Global Step: 141770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:15:35,709-Speed 11545.76 samples/sec Loss 5.6406 LearningRate 0.0142 Epoch: 24 Global Step: 141780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:15:36,615-Speed 11316.78 samples/sec Loss 5.7250 LearningRate 0.0142 Epoch: 24 Global Step: 141790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:15:37,537-Speed 11106.03 samples/sec Loss 5.7202 LearningRate 0.0142 Epoch: 24 Global Step: 141800 
Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:15:38,449-Speed 11248.18 samples/sec Loss 5.7050 LearningRate 0.0142 Epoch: 24 Global Step: 141810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:15:39,363-Speed 11214.17 samples/sec Loss 5.6778 LearningRate 0.0142 Epoch: 24 Global Step: 141820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:15:40,256-Speed 11468.96 samples/sec Loss 5.7138 LearningRate 0.0142 Epoch: 24 Global Step: 141830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:15:41,180-Speed 11097.82 samples/sec Loss 5.8675 LearningRate 0.0142 Epoch: 24 Global Step: 141840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:15:42,113-Speed 10982.74 samples/sec Loss 5.6442 LearningRate 0.0142 Epoch: 24 Global Step: 141850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:15:43,053-Speed 10899.94 samples/sec Loss 5.6783 LearningRate 0.0142 Epoch: 24 Global Step: 141860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:15:43,990-Speed 10933.33 samples/sec Loss 5.7797 LearningRate 0.0142 Epoch: 24 Global Step: 141870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:15:44,885-Speed 11456.46 samples/sec Loss 5.6865 LearningRate 0.0142 Epoch: 24 Global Step: 141880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:15:45,791-Speed 11303.00 samples/sec Loss 5.5930 LearningRate 0.0141 Epoch: 24 Global Step: 141890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:15:46,689-Speed 11410.66 samples/sec Loss 5.5187 LearningRate 0.0141 Epoch: 24 Global Step: 141900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:15:47,595-Speed 11306.43 samples/sec Loss 5.7200 LearningRate 0.0141 Epoch: 24 Global Step: 141910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:15:48,497-Speed 11365.26 samples/sec Loss 5.6449 LearningRate 0.0141 Epoch: 24 Global Step: 141920 Fp16 Grad Scale: 32768 
Required: 3 hours Training: 2022-04-11 04:15:49,406-Speed 11274.10 samples/sec Loss 5.7899 LearningRate 0.0141 Epoch: 24 Global Step: 141930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:15:50,283-Speed 11690.23 samples/sec Loss 5.6542 LearningRate 0.0141 Epoch: 24 Global Step: 141940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:15:51,197-Speed 11208.92 samples/sec Loss 5.9037 LearningRate 0.0141 Epoch: 24 Global Step: 141950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:15:52,178-Speed 10444.40 samples/sec Loss 5.6317 LearningRate 0.0141 Epoch: 24 Global Step: 141960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:15:53,083-Speed 11326.12 samples/sec Loss 5.8070 LearningRate 0.0141 Epoch: 24 Global Step: 141970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:15:54,075-Speed 10325.34 samples/sec Loss 5.6314 LearningRate 0.0141 Epoch: 24 Global Step: 141980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:15:55,013-Speed 10925.25 samples/sec Loss 5.6627 LearningRate 0.0141 Epoch: 24 Global Step: 141990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:15:55,956-Speed 10861.74 samples/sec Loss 5.7247 LearningRate 0.0141 Epoch: 24 Global Step: 142000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:16:17,095-[lfw][142000]XNorm: 9.277803 Training: 2022-04-11 04:16:17,096-[lfw][142000]Accuracy-Flip: 0.99683+-0.00320 Training: 2022-04-11 04:16:17,096-[lfw][142000]Accuracy-Highest: 0.99717 Training: 2022-04-11 04:16:41,669-[cfp_fp][142000]XNorm: 7.791832 Training: 2022-04-11 04:16:41,670-[cfp_fp][142000]Accuracy-Flip: 0.94671+-0.01155 Training: 2022-04-11 04:16:41,671-[cfp_fp][142000]Accuracy-Highest: 0.95171 Training: 2022-04-11 04:17:02,899-[agedb_30][142000]XNorm: 9.052251 Training: 2022-04-11 04:17:02,900-[agedb_30][142000]Accuracy-Flip: 0.96817+-0.00762 Training: 2022-04-11 04:17:02,900-[agedb_30][142000]Accuracy-Highest: 0.96817 
Training: 2022-04-11 04:17:03,855-Speed 150.81 samples/sec Loss 5.6973 LearningRate 0.0141 Epoch: 24 Global Step: 142010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:17:04,800-Speed 10839.86 samples/sec Loss 5.6297 LearningRate 0.0141 Epoch: 24 Global Step: 142020 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:17:05,705-Speed 11322.78 samples/sec Loss 5.7834 LearningRate 0.0141 Epoch: 24 Global Step: 142030 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:17:06,623-Speed 11167.06 samples/sec Loss 5.7971 LearningRate 0.0141 Epoch: 24 Global Step: 142040 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:17:07,517-Speed 11457.35 samples/sec Loss 5.7158 LearningRate 0.0141 Epoch: 24 Global Step: 142050 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:17:08,429-Speed 11236.18 samples/sec Loss 5.7149 LearningRate 0.0141 Epoch: 24 Global Step: 142060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:17:09,321-Speed 11484.72 samples/sec Loss 5.6678 LearningRate 0.0141 Epoch: 24 Global Step: 142070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:17:10,237-Speed 11190.48 samples/sec Loss 5.4394 LearningRate 0.0141 Epoch: 24 Global Step: 142080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:17:11,118-Speed 11632.60 samples/sec Loss 5.7079 LearningRate 0.0141 Epoch: 24 Global Step: 142090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:17:12,032-Speed 11201.54 samples/sec Loss 5.6983 LearningRate 0.0141 Epoch: 24 Global Step: 142100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:17:12,923-Speed 11500.69 samples/sec Loss 5.6869 LearningRate 0.0141 Epoch: 24 Global Step: 142110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:17:13,828-Speed 11326.08 samples/sec Loss 5.6424 LearningRate 0.0141 Epoch: 24 Global Step: 142120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
04:17:14,709-Speed 11634.89 samples/sec Loss 5.6851 LearningRate 0.0141 Epoch: 24 Global Step: 142130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:17:15,691-Speed 10429.28 samples/sec Loss 5.6187 LearningRate 0.0141 Epoch: 24 Global Step: 142140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:17:16,645-Speed 10743.80 samples/sec Loss 5.6325 LearningRate 0.0141 Epoch: 24 Global Step: 142150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:17:27,393-Speed 952.92 samples/sec Loss 5.0700 LearningRate 0.0141 Epoch: 25 Global Step: 142160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:17:28,419-Speed 9992.46 samples/sec Loss 5.1599 LearningRate 0.0141 Epoch: 25 Global Step: 142170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:17:29,378-Speed 10678.71 samples/sec Loss 4.9581 LearningRate 0.0141 Epoch: 25 Global Step: 142180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:17:30,323-Speed 10850.67 samples/sec Loss 4.9585 LearningRate 0.0140 Epoch: 25 Global Step: 142190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:17:31,311-Speed 10373.63 samples/sec Loss 5.1091 LearningRate 0.0140 Epoch: 25 Global Step: 142200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:17:32,256-Speed 10842.91 samples/sec Loss 5.0299 LearningRate 0.0140 Epoch: 25 Global Step: 142210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:17:33,158-Speed 11358.50 samples/sec Loss 5.0132 LearningRate 0.0140 Epoch: 25 Global Step: 142220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:17:34,047-Speed 11523.87 samples/sec Loss 4.9685 LearningRate 0.0140 Epoch: 25 Global Step: 142230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:17:34,931-Speed 11594.17 samples/sec Loss 5.0052 LearningRate 0.0140 Epoch: 25 Global Step: 142240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:17:35,846-Speed 11202.79 
samples/sec Loss 4.9984 LearningRate 0.0140 Epoch: 25 Global Step: 142250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:17:36,750-Speed 11333.69 samples/sec Loss 5.0108 LearningRate 0.0140 Epoch: 25 Global Step: 142260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:17:37,658-Speed 11279.63 samples/sec Loss 5.0963 LearningRate 0.0140 Epoch: 25 Global Step: 142270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:17:38,595-Speed 10933.99 samples/sec Loss 4.9736 LearningRate 0.0140 Epoch: 25 Global Step: 142280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:17:39,567-Speed 10548.64 samples/sec Loss 5.0494 LearningRate 0.0140 Epoch: 25 Global Step: 142290 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:17:40,478-Speed 11238.16 samples/sec Loss 5.1143 LearningRate 0.0140 Epoch: 25 Global Step: 142300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:17:41,438-Speed 10678.66 samples/sec Loss 4.9983 LearningRate 0.0140 Epoch: 25 Global Step: 142310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:17:42,335-Speed 11422.13 samples/sec Loss 4.9918 LearningRate 0.0140 Epoch: 25 Global Step: 142320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:17:43,267-Speed 10998.97 samples/sec Loss 5.0691 LearningRate 0.0140 Epoch: 25 Global Step: 142330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:17:44,181-Speed 11214.39 samples/sec Loss 5.0513 LearningRate 0.0140 Epoch: 25 Global Step: 142340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:17:45,152-Speed 10550.54 samples/sec Loss 5.0767 LearningRate 0.0140 Epoch: 25 Global Step: 142350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:17:46,046-Speed 11461.05 samples/sec Loss 5.0508 LearningRate 0.0140 Epoch: 25 Global Step: 142360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:17:46,977-Speed 11006.36 samples/sec Loss 5.0593 
LearningRate 0.0140 Epoch: 25 Global Step: 142370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:17:47,882-Speed 11321.77 samples/sec Loss 5.1346 LearningRate 0.0140 Epoch: 25 Global Step: 142380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:17:48,831-Speed 10800.71 samples/sec Loss 5.0228 LearningRate 0.0140 Epoch: 25 Global Step: 142390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:17:49,849-Speed 10057.91 samples/sec Loss 5.0044 LearningRate 0.0140 Epoch: 25 Global Step: 142400 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:17:50,849-Speed 10250.16 samples/sec Loss 5.1658 LearningRate 0.0140 Epoch: 25 Global Step: 142410 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:17:51,799-Speed 10783.13 samples/sec Loss 5.0163 LearningRate 0.0140 Epoch: 25 Global Step: 142420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:17:52,813-Speed 10122.67 samples/sec Loss 5.1751 LearningRate 0.0140 Epoch: 25 Global Step: 142430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:17:53,734-Speed 11135.69 samples/sec Loss 4.9879 LearningRate 0.0140 Epoch: 25 Global Step: 142440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:17:54,657-Speed 11095.35 samples/sec Loss 5.1588 LearningRate 0.0140 Epoch: 25 Global Step: 142450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:17:55,604-Speed 10830.02 samples/sec Loss 5.0200 LearningRate 0.0140 Epoch: 25 Global Step: 142460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:17:56,580-Speed 10491.40 samples/sec Loss 5.1361 LearningRate 0.0140 Epoch: 25 Global Step: 142470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:17:57,466-Speed 11573.69 samples/sec Loss 5.0927 LearningRate 0.0140 Epoch: 25 Global Step: 142480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:17:58,352-Speed 11590.89 samples/sec Loss 5.0224 LearningRate 0.0140 Epoch: 25 
Global Step: 142490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:17:59,290-Speed 10926.92 samples/sec Loss 5.1704 LearningRate 0.0139 Epoch: 25 Global Step: 142500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:18:00,243-Speed 10740.65 samples/sec Loss 5.1157 LearningRate 0.0139 Epoch: 25 Global Step: 142510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:18:01,216-Speed 10534.51 samples/sec Loss 5.1070 LearningRate 0.0139 Epoch: 25 Global Step: 142520 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:18:02,149-Speed 10992.30 samples/sec Loss 5.1192 LearningRate 0.0139 Epoch: 25 Global Step: 142530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:18:03,098-Speed 10797.95 samples/sec Loss 5.2062 LearningRate 0.0139 Epoch: 25 Global Step: 142540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:18:04,051-Speed 10743.95 samples/sec Loss 5.1580 LearningRate 0.0139 Epoch: 25 Global Step: 142550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:18:04,991-Speed 10907.71 samples/sec Loss 5.1045 LearningRate 0.0139 Epoch: 25 Global Step: 142560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:18:05,912-Speed 11115.97 samples/sec Loss 5.2236 LearningRate 0.0139 Epoch: 25 Global Step: 142570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:18:06,802-Speed 11514.06 samples/sec Loss 5.2219 LearningRate 0.0139 Epoch: 25 Global Step: 142580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:18:07,813-Speed 10142.99 samples/sec Loss 5.2091 LearningRate 0.0139 Epoch: 25 Global Step: 142590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:18:08,716-Speed 11346.12 samples/sec Loss 5.1703 LearningRate 0.0139 Epoch: 25 Global Step: 142600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:18:09,635-Speed 11149.34 samples/sec Loss 5.2230 LearningRate 0.0139 Epoch: 25 Global Step: 142610 Fp16 Grad 
Scale: 32768 Required: 3 hours Training: 2022-04-11 04:18:10,545-Speed 11265.09 samples/sec Loss 5.1922 LearningRate 0.0139 Epoch: 25 Global Step: 142620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:18:11,441-Speed 11439.25 samples/sec Loss 5.1330 LearningRate 0.0139 Epoch: 25 Global Step: 142630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:18:12,344-Speed 11361.25 samples/sec Loss 5.1729 LearningRate 0.0139 Epoch: 25 Global Step: 142640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:18:13,246-Speed 11358.08 samples/sec Loss 5.1540 LearningRate 0.0139 Epoch: 25 Global Step: 142650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:18:14,170-Speed 11096.13 samples/sec Loss 5.0829 LearningRate 0.0139 Epoch: 25 Global Step: 142660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:18:15,133-Speed 10636.39 samples/sec Loss 5.2045 LearningRate 0.0139 Epoch: 25 Global Step: 142670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:18:16,072-Speed 10914.33 samples/sec Loss 5.1845 LearningRate 0.0139 Epoch: 25 Global Step: 142680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:18:16,989-Speed 11174.59 samples/sec Loss 5.1472 LearningRate 0.0139 Epoch: 25 Global Step: 142690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:18:17,876-Speed 11552.68 samples/sec Loss 5.2596 LearningRate 0.0139 Epoch: 25 Global Step: 142700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:18:18,788-Speed 11242.92 samples/sec Loss 5.1694 LearningRate 0.0139 Epoch: 25 Global Step: 142710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:18:19,734-Speed 10830.82 samples/sec Loss 5.1796 LearningRate 0.0139 Epoch: 25 Global Step: 142720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:18:20,649-Speed 11197.04 samples/sec Loss 5.2750 LearningRate 0.0139 Epoch: 25 Global Step: 142730 Fp16 Grad Scale: 65536 Required: 3 hours 
Training: 2022-04-11 04:18:21,558-Speed 11269.98 samples/sec Loss 5.0481 LearningRate 0.0139 Epoch: 25 Global Step: 142740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:18:22,451-Speed 11488.71 samples/sec Loss 5.2675 LearningRate 0.0139 Epoch: 25 Global Step: 142750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:18:23,359-Speed 11284.33 samples/sec Loss 5.3250 LearningRate 0.0139 Epoch: 25 Global Step: 142760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:18:24,275-Speed 11183.46 samples/sec Loss 5.1348 LearningRate 0.0139 Epoch: 25 Global Step: 142770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:18:25,206-Speed 11002.63 samples/sec Loss 5.2580 LearningRate 0.0139 Epoch: 25 Global Step: 142780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:18:26,133-Speed 11049.36 samples/sec Loss 5.2130 LearningRate 0.0139 Epoch: 25 Global Step: 142790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:18:27,032-Speed 11401.03 samples/sec Loss 5.3130 LearningRate 0.0138 Epoch: 25 Global Step: 142800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:18:27,958-Speed 11064.01 samples/sec Loss 5.3577 LearningRate 0.0138 Epoch: 25 Global Step: 142810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:18:28,900-Speed 10886.89 samples/sec Loss 5.1820 LearningRate 0.0138 Epoch: 25 Global Step: 142820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:18:29,852-Speed 10760.57 samples/sec Loss 5.2878 LearningRate 0.0138 Epoch: 25 Global Step: 142830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:18:30,749-Speed 11424.01 samples/sec Loss 5.1870 LearningRate 0.0138 Epoch: 25 Global Step: 142840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:18:31,717-Speed 10580.08 samples/sec Loss 5.2042 LearningRate 0.0138 Epoch: 25 Global Step: 142850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 
04:18:32,634-Speed 11182.63 samples/sec Loss 5.2335 LearningRate 0.0138 Epoch: 25 Global Step: 142860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:18:33,521-Speed 11560.03 samples/sec Loss 5.1958 LearningRate 0.0138 Epoch: 25 Global Step: 142870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:18:34,406-Speed 11569.95 samples/sec Loss 5.2690 LearningRate 0.0138 Epoch: 25 Global Step: 142880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:18:35,298-Speed 11489.97 samples/sec Loss 5.1744 LearningRate 0.0138 Epoch: 25 Global Step: 142890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:18:36,236-Speed 10926.57 samples/sec Loss 5.2262 LearningRate 0.0138 Epoch: 25 Global Step: 142900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:18:37,163-Speed 11055.44 samples/sec Loss 5.2512 LearningRate 0.0138 Epoch: 25 Global Step: 142910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:18:38,071-Speed 11293.34 samples/sec Loss 5.2910 LearningRate 0.0138 Epoch: 25 Global Step: 142920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:18:39,046-Speed 10504.05 samples/sec Loss 5.2746 LearningRate 0.0138 Epoch: 25 Global Step: 142930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:18:39,943-Speed 11417.01 samples/sec Loss 5.2001 LearningRate 0.0138 Epoch: 25 Global Step: 142940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:18:40,848-Speed 11323.27 samples/sec Loss 5.2564 LearningRate 0.0138 Epoch: 25 Global Step: 142950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:18:41,764-Speed 11191.75 samples/sec Loss 5.2465 LearningRate 0.0138 Epoch: 25 Global Step: 142960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:18:42,700-Speed 10950.43 samples/sec Loss 5.5501 LearningRate 0.0138 Epoch: 25 Global Step: 142970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:18:43,640-Speed 10895.59 
samples/sec Loss 5.1167 LearningRate 0.0138 Epoch: 25 Global Step: 142980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:18:44,545-Speed 11329.00 samples/sec Loss 5.2909 LearningRate 0.0138 Epoch: 25 Global Step: 142990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:18:45,456-Speed 11239.24 samples/sec Loss 5.1445 LearningRate 0.0138 Epoch: 25 Global Step: 143000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:18:46,345-Speed 11530.69 samples/sec Loss 5.2389 LearningRate 0.0138 Epoch: 25 Global Step: 143010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:18:47,263-Speed 11167.03 samples/sec Loss 5.0668 LearningRate 0.0138 Epoch: 25 Global Step: 143020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:18:48,179-Speed 11179.50 samples/sec Loss 5.2600 LearningRate 0.0138 Epoch: 25 Global Step: 143030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:18:49,105-Speed 11076.00 samples/sec Loss 5.3858 LearningRate 0.0138 Epoch: 25 Global Step: 143040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:18:49,997-Speed 11483.07 samples/sec Loss 5.3172 LearningRate 0.0138 Epoch: 25 Global Step: 143050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:18:50,936-Speed 10920.89 samples/sec Loss 5.2992 LearningRate 0.0138 Epoch: 25 Global Step: 143060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:18:51,849-Speed 11214.74 samples/sec Loss 5.1152 LearningRate 0.0138 Epoch: 25 Global Step: 143070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:18:52,746-Speed 11437.42 samples/sec Loss 5.3777 LearningRate 0.0138 Epoch: 25 Global Step: 143080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:18:53,674-Speed 11040.96 samples/sec Loss 5.3052 LearningRate 0.0138 Epoch: 25 Global Step: 143090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:18:54,596-Speed 11114.44 samples/sec Loss 5.2871 
LearningRate 0.0138 Epoch: 25 Global Step: 143100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:18:55,499-Speed 11339.99 samples/sec Loss 5.3865 LearningRate 0.0137 Epoch: 25 Global Step: 143110 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:18:56,406-Speed 11302.66 samples/sec Loss 5.3434 LearningRate 0.0137 Epoch: 25 Global Step: 143120 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:18:57,334-Speed 11045.62 samples/sec Loss 5.1736 LearningRate 0.0137 Epoch: 25 Global Step: 143130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:18:58,258-Speed 11083.37 samples/sec Loss 5.2646 LearningRate 0.0137 Epoch: 25 Global Step: 143140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:18:59,195-Speed 10939.37 samples/sec Loss 5.2145 LearningRate 0.0137 Epoch: 25 Global Step: 143150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:19:00,176-Speed 10444.19 samples/sec Loss 5.2411 LearningRate 0.0137 Epoch: 25 Global Step: 143160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:19:01,117-Speed 10883.56 samples/sec Loss 5.3228 LearningRate 0.0137 Epoch: 25 Global Step: 143170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:19:02,034-Speed 11181.74 samples/sec Loss 5.3182 LearningRate 0.0137 Epoch: 25 Global Step: 143180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:19:02,973-Speed 10909.58 samples/sec Loss 5.2738 LearningRate 0.0137 Epoch: 25 Global Step: 143190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:19:03,883-Speed 11268.69 samples/sec Loss 5.1822 LearningRate 0.0137 Epoch: 25 Global Step: 143200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:19:04,779-Speed 11435.66 samples/sec Loss 5.4481 LearningRate 0.0137 Epoch: 25 Global Step: 143210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:19:05,732-Speed 10746.87 samples/sec Loss 5.2120 LearningRate 0.0137 Epoch: 25 
Global Step: 143220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:19:06,668-Speed 10945.22 samples/sec Loss 5.4233 LearningRate 0.0137 Epoch: 25 Global Step: 143230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:19:07,591-Speed 11113.42 samples/sec Loss 5.3195 LearningRate 0.0137 Epoch: 25 Global Step: 143240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:19:08,515-Speed 11085.79 samples/sec Loss 5.3748 LearningRate 0.0137 Epoch: 25 Global Step: 143250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:19:09,405-Speed 11510.66 samples/sec Loss 5.2319 LearningRate 0.0137 Epoch: 25 Global Step: 143260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:19:10,325-Speed 11143.65 samples/sec Loss 5.2984 LearningRate 0.0137 Epoch: 25 Global Step: 143270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:19:11,220-Speed 11438.53 samples/sec Loss 5.2343 LearningRate 0.0137 Epoch: 25 Global Step: 143280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:19:12,119-Speed 11414.59 samples/sec Loss 5.2296 LearningRate 0.0137 Epoch: 25 Global Step: 143290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:19:13,023-Speed 11330.76 samples/sec Loss 5.2779 LearningRate 0.0137 Epoch: 25 Global Step: 143300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:19:13,908-Speed 11578.05 samples/sec Loss 5.2828 LearningRate 0.0137 Epoch: 25 Global Step: 143310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:19:14,840-Speed 10998.90 samples/sec Loss 5.2812 LearningRate 0.0137 Epoch: 25 Global Step: 143320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:19:15,783-Speed 10867.73 samples/sec Loss 5.3681 LearningRate 0.0137 Epoch: 25 Global Step: 143330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:19:16,705-Speed 11111.87 samples/sec Loss 5.2675 LearningRate 0.0137 Epoch: 25 Global Step: 143340 Fp16 Grad 
Scale: 131072 Required: 3 hours Training: 2022-04-11 04:19:17,635-Speed 11036.24 samples/sec Loss 5.2883 LearningRate 0.0137 Epoch: 25 Global Step: 143350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:19:18,527-Speed 11485.80 samples/sec Loss 5.3958 LearningRate 0.0137 Epoch: 25 Global Step: 143360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:19:19,450-Speed 11094.63 samples/sec Loss 5.3435 LearningRate 0.0137 Epoch: 25 Global Step: 143370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:19:20,360-Speed 11268.14 samples/sec Loss 5.2745 LearningRate 0.0137 Epoch: 25 Global Step: 143380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:19:21,265-Speed 11312.15 samples/sec Loss 5.3498 LearningRate 0.0137 Epoch: 25 Global Step: 143390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:19:22,156-Speed 11513.18 samples/sec Loss 5.3102 LearningRate 0.0137 Epoch: 25 Global Step: 143400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:19:23,032-Speed 11688.27 samples/sec Loss 5.3338 LearningRate 0.0137 Epoch: 25 Global Step: 143410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:19:23,949-Speed 11175.63 samples/sec Loss 5.3893 LearningRate 0.0136 Epoch: 25 Global Step: 143420 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:19:24,854-Speed 11324.98 samples/sec Loss 5.3214 LearningRate 0.0136 Epoch: 25 Global Step: 143430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:19:25,759-Speed 11327.05 samples/sec Loss 5.3909 LearningRate 0.0136 Epoch: 25 Global Step: 143440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:19:26,663-Speed 11330.59 samples/sec Loss 5.2786 LearningRate 0.0136 Epoch: 25 Global Step: 143450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:19:27,579-Speed 11188.90 samples/sec Loss 5.3669 LearningRate 0.0136 Epoch: 25 Global Step: 143460 Fp16 Grad Scale: 32768 Required: 3 hours 
Training: 2022-04-11 04:19:28,518-Speed 10909.88 samples/sec Loss 5.3430 LearningRate 0.0136 Epoch: 25 Global Step: 143470 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:19:29,475-Speed 10715.13 samples/sec Loss 5.3162 LearningRate 0.0136 Epoch: 25 Global Step: 143480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:19:30,467-Speed 10335.25 samples/sec Loss 5.3990 LearningRate 0.0136 Epoch: 25 Global Step: 143490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:19:31,384-Speed 11176.45 samples/sec Loss 5.5327 LearningRate 0.0136 Epoch: 25 Global Step: 143500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:19:32,303-Speed 11143.38 samples/sec Loss 5.4277 LearningRate 0.0136 Epoch: 25 Global Step: 143510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:19:33,185-Speed 11623.85 samples/sec Loss 5.3525 LearningRate 0.0136 Epoch: 25 Global Step: 143520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:19:34,081-Speed 11433.54 samples/sec Loss 5.4397 LearningRate 0.0136 Epoch: 25 Global Step: 143530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:19:34,973-Speed 11489.23 samples/sec Loss 5.3046 LearningRate 0.0136 Epoch: 25 Global Step: 143540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:19:35,879-Speed 11308.61 samples/sec Loss 5.2853 LearningRate 0.0136 Epoch: 25 Global Step: 143550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:19:36,802-Speed 11098.79 samples/sec Loss 5.4255 LearningRate 0.0136 Epoch: 25 Global Step: 143560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:19:37,729-Speed 11061.85 samples/sec Loss 5.3929 LearningRate 0.0136 Epoch: 25 Global Step: 143570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:19:38,636-Speed 11293.28 samples/sec Loss 5.3872 LearningRate 0.0136 Epoch: 25 Global Step: 143580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
04:19:39,548-Speed 11236.93 samples/sec Loss 5.4298 LearningRate 0.0136 Epoch: 25 Global Step: 143590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:19:40,470-Speed 11115.20 samples/sec Loss 5.3438 LearningRate 0.0136 Epoch: 25 Global Step: 143600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:19:41,412-Speed 10878.51 samples/sec Loss 5.4521 LearningRate 0.0136 Epoch: 25 Global Step: 143610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:19:42,349-Speed 10935.72 samples/sec Loss 5.4411 LearningRate 0.0136 Epoch: 25 Global Step: 143620 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:19:43,286-Speed 10943.22 samples/sec Loss 5.4029 LearningRate 0.0136 Epoch: 25 Global Step: 143630 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:19:44,224-Speed 10921.47 samples/sec Loss 5.3783 LearningRate 0.0136 Epoch: 25 Global Step: 143640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:19:45,113-Speed 11523.44 samples/sec Loss 5.4075 LearningRate 0.0136 Epoch: 25 Global Step: 143650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:19:46,011-Speed 11412.66 samples/sec Loss 5.3106 LearningRate 0.0136 Epoch: 25 Global Step: 143660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:19:46,917-Speed 11311.21 samples/sec Loss 5.3168 LearningRate 0.0136 Epoch: 25 Global Step: 143670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:19:47,845-Speed 11038.46 samples/sec Loss 5.3893 LearningRate 0.0136 Epoch: 25 Global Step: 143680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:19:48,808-Speed 10645.81 samples/sec Loss 5.3191 LearningRate 0.0136 Epoch: 25 Global Step: 143690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:19:49,732-Speed 11087.51 samples/sec Loss 5.3602 LearningRate 0.0136 Epoch: 25 Global Step: 143700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:19:50,701-Speed 10573.68 
samples/sec Loss 5.5045 LearningRate 0.0136 Epoch: 25 Global Step: 143710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:19:51,626-Speed 11086.53 samples/sec Loss 5.4069 LearningRate 0.0135 Epoch: 25 Global Step: 143720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:19:52,604-Speed 10479.73 samples/sec Loss 5.4092 LearningRate 0.0135 Epoch: 25 Global Step: 143730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:19:53,539-Speed 10959.86 samples/sec Loss 5.3763 LearningRate 0.0135 Epoch: 25 Global Step: 143740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:19:54,485-Speed 10825.09 samples/sec Loss 5.4646 LearningRate 0.0135 Epoch: 25 Global Step: 143750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:19:55,398-Speed 11231.16 samples/sec Loss 5.4055 LearningRate 0.0135 Epoch: 25 Global Step: 143760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:19:56,318-Speed 11131.64 samples/sec Loss 5.4295 LearningRate 0.0135 Epoch: 25 Global Step: 143770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:19:57,221-Speed 11355.11 samples/sec Loss 5.4125 LearningRate 0.0135 Epoch: 25 Global Step: 143780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:19:58,121-Speed 11385.81 samples/sec Loss 5.4358 LearningRate 0.0135 Epoch: 25 Global Step: 143790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:19:59,010-Speed 11525.97 samples/sec Loss 5.3828 LearningRate 0.0135 Epoch: 25 Global Step: 143800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:19:59,936-Speed 11061.02 samples/sec Loss 5.3762 LearningRate 0.0135 Epoch: 25 Global Step: 143810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:20:00,836-Speed 11392.93 samples/sec Loss 5.5321 LearningRate 0.0135 Epoch: 25 Global Step: 143820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:20:01,745-Speed 11264.62 samples/sec Loss 5.4914 
LearningRate 0.0135 Epoch: 25 Global Step: 143830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:20:02,668-Speed 11114.77 samples/sec Loss 5.3985 LearningRate 0.0135 Epoch: 25 Global Step: 143840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:20:03,564-Speed 11437.08 samples/sec Loss 5.5584 LearningRate 0.0135 Epoch: 25 Global Step: 143850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:20:04,510-Speed 10833.92 samples/sec Loss 5.1872 LearningRate 0.0135 Epoch: 25 Global Step: 143860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:20:05,439-Speed 11029.28 samples/sec Loss 5.3985 LearningRate 0.0135 Epoch: 25 Global Step: 143870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:20:06,384-Speed 10841.15 samples/sec Loss 5.4418 LearningRate 0.0135 Epoch: 25 Global Step: 143880 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:20:07,266-Speed 11613.59 samples/sec Loss 5.4886 LearningRate 0.0135 Epoch: 25 Global Step: 143890 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:20:08,164-Speed 11409.05 samples/sec Loss 5.3119 LearningRate 0.0135 Epoch: 25 Global Step: 143900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:20:09,073-Speed 11279.73 samples/sec Loss 5.4014 LearningRate 0.0135 Epoch: 25 Global Step: 143910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:20:09,953-Speed 11641.02 samples/sec Loss 5.3267 LearningRate 0.0135 Epoch: 25 Global Step: 143920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:20:10,833-Speed 11644.96 samples/sec Loss 5.5109 LearningRate 0.0135 Epoch: 25 Global Step: 143930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:20:11,744-Speed 11248.14 samples/sec Loss 5.3524 LearningRate 0.0135 Epoch: 25 Global Step: 143940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:20:12,664-Speed 11140.24 samples/sec Loss 5.4270 LearningRate 0.0135 Epoch: 25 
Global Step: 143950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:20:13,599-Speed 10966.23 samples/sec Loss 5.4871 LearningRate 0.0135 Epoch: 25 Global Step: 143960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:20:14,502-Speed 11346.61 samples/sec Loss 5.3401 LearningRate 0.0135 Epoch: 25 Global Step: 143970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:20:15,421-Speed 11153.58 samples/sec Loss 5.2309 LearningRate 0.0135 Epoch: 25 Global Step: 143980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:20:16,339-Speed 11152.92 samples/sec Loss 5.4590 LearningRate 0.0135 Epoch: 25 Global Step: 143990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:20:17,268-Speed 11031.31 samples/sec Loss 5.4378 LearningRate 0.0135 Epoch: 25 Global Step: 144000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:20:38,733-[lfw][144000]XNorm: 9.139620 Training: 2022-04-11 04:20:38,734-[lfw][144000]Accuracy-Flip: 0.99633+-0.00332 Training: 2022-04-11 04:20:38,734-[lfw][144000]Accuracy-Highest: 0.99717 Training: 2022-04-11 04:21:03,328-[cfp_fp][144000]XNorm: 7.758474 Training: 2022-04-11 04:21:03,329-[cfp_fp][144000]Accuracy-Flip: 0.95029+-0.01049 Training: 2022-04-11 04:21:03,329-[cfp_fp][144000]Accuracy-Highest: 0.95171 Training: 2022-04-11 04:21:24,548-[agedb_30][144000]XNorm: 8.937793 Training: 2022-04-11 04:21:24,549-[agedb_30][144000]Accuracy-Flip: 0.96683+-0.00732 Training: 2022-04-11 04:21:24,549-[agedb_30][144000]Accuracy-Highest: 0.96817 Training: 2022-04-11 04:21:25,443-Speed 150.20 samples/sec Loss 5.3823 LearningRate 0.0135 Epoch: 25 Global Step: 144010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:21:26,383-Speed 10896.78 samples/sec Loss 5.3750 LearningRate 0.0135 Epoch: 25 Global Step: 144020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:21:27,311-Speed 11048.34 samples/sec Loss 5.4418 LearningRate 0.0134 Epoch: 25 Global Step: 144030 
Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:21:28,224-Speed 11216.86 samples/sec Loss 5.5282 LearningRate 0.0134 Epoch: 25 Global Step: 144040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:21:29,097-Speed 11737.37 samples/sec Loss 5.6180 LearningRate 0.0134 Epoch: 25 Global Step: 144050 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:21:30,004-Speed 11292.67 samples/sec Loss 5.3758 LearningRate 0.0134 Epoch: 25 Global Step: 144060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:21:30,896-Speed 11494.15 samples/sec Loss 5.3416 LearningRate 0.0134 Epoch: 25 Global Step: 144070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:21:31,836-Speed 10906.17 samples/sec Loss 5.3557 LearningRate 0.0134 Epoch: 25 Global Step: 144080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:21:32,748-Speed 11232.50 samples/sec Loss 5.4960 LearningRate 0.0134 Epoch: 25 Global Step: 144090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:21:33,625-Speed 11685.89 samples/sec Loss 5.4685 LearningRate 0.0134 Epoch: 25 Global Step: 144100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:21:34,542-Speed 11174.04 samples/sec Loss 5.4196 LearningRate 0.0134 Epoch: 25 Global Step: 144110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:21:35,424-Speed 11616.41 samples/sec Loss 5.5739 LearningRate 0.0134 Epoch: 25 Global Step: 144120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:21:36,351-Speed 11056.91 samples/sec Loss 5.5206 LearningRate 0.0134 Epoch: 25 Global Step: 144130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:21:37,310-Speed 10685.45 samples/sec Loss 5.4359 LearningRate 0.0134 Epoch: 25 Global Step: 144140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:21:38,251-Speed 10885.16 samples/sec Loss 5.6328 LearningRate 0.0134 Epoch: 25 Global Step: 144150 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-04-11 04:21:39,172-Speed 11124.67 samples/sec Loss 5.3526 LearningRate 0.0134 Epoch: 25 Global Step: 144160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:21:40,098-Speed 11069.81 samples/sec Loss 5.4003 LearningRate 0.0134 Epoch: 25 Global Step: 144170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:21:40,985-Speed 11547.94 samples/sec Loss 5.3779 LearningRate 0.0134 Epoch: 25 Global Step: 144180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:21:41,904-Speed 11155.32 samples/sec Loss 5.4058 LearningRate 0.0134 Epoch: 25 Global Step: 144190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:21:42,847-Speed 10867.17 samples/sec Loss 5.4584 LearningRate 0.0134 Epoch: 25 Global Step: 144200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:21:43,762-Speed 11204.38 samples/sec Loss 5.4794 LearningRate 0.0134 Epoch: 25 Global Step: 144210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:21:44,666-Speed 11328.50 samples/sec Loss 5.6245 LearningRate 0.0134 Epoch: 25 Global Step: 144220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:21:45,584-Speed 11162.22 samples/sec Loss 5.5742 LearningRate 0.0134 Epoch: 25 Global Step: 144230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:21:46,474-Speed 11511.10 samples/sec Loss 5.4685 LearningRate 0.0134 Epoch: 25 Global Step: 144240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:21:47,393-Speed 11156.81 samples/sec Loss 5.3818 LearningRate 0.0134 Epoch: 25 Global Step: 144250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:21:48,338-Speed 10842.61 samples/sec Loss 5.4998 LearningRate 0.0134 Epoch: 25 Global Step: 144260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:21:49,259-Speed 11130.16 samples/sec Loss 5.4283 LearningRate 0.0134 Epoch: 25 Global Step: 144270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 04:21:50,180-Speed 11121.45 samples/sec Loss 5.3920 LearningRate 0.0134 Epoch: 25 Global Step: 144280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:21:51,088-Speed 11287.20 samples/sec Loss 5.5060 LearningRate 0.0134 Epoch: 25 Global Step: 144290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:21:52,000-Speed 11236.37 samples/sec Loss 5.4874 LearningRate 0.0134 Epoch: 25 Global Step: 144300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:21:52,939-Speed 10905.03 samples/sec Loss 5.5556 LearningRate 0.0134 Epoch: 25 Global Step: 144310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:21:53,869-Speed 11026.70 samples/sec Loss 5.3121 LearningRate 0.0134 Epoch: 25 Global Step: 144320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:21:54,812-Speed 10867.03 samples/sec Loss 5.3936 LearningRate 0.0134 Epoch: 25 Global Step: 144330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:21:55,733-Speed 11134.90 samples/sec Loss 5.4418 LearningRate 0.0133 Epoch: 25 Global Step: 144340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:21:56,635-Speed 11356.71 samples/sec Loss 5.5080 LearningRate 0.0133 Epoch: 25 Global Step: 144350 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:21:57,520-Speed 11580.64 samples/sec Loss 5.3471 LearningRate 0.0133 Epoch: 25 Global Step: 144360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:21:58,417-Speed 11419.41 samples/sec Loss 5.5459 LearningRate 0.0133 Epoch: 25 Global Step: 144370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:21:59,318-Speed 11378.63 samples/sec Loss 5.4520 LearningRate 0.0133 Epoch: 25 Global Step: 144380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:22:00,209-Speed 11505.39 samples/sec Loss 5.3489 LearningRate 0.0133 Epoch: 25 Global Step: 144390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:22:01,099-Speed 
11505.97 samples/sec Loss 5.3936 LearningRate 0.0133 Epoch: 25 Global Step: 144400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:22:02,027-Speed 11043.70 samples/sec Loss 5.5678 LearningRate 0.0133 Epoch: 25 Global Step: 144410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:22:02,934-Speed 11305.70 samples/sec Loss 5.4087 LearningRate 0.0133 Epoch: 25 Global Step: 144420 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:22:03,888-Speed 10739.76 samples/sec Loss 5.4626 LearningRate 0.0133 Epoch: 25 Global Step: 144430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:22:04,846-Speed 10694.96 samples/sec Loss 5.4425 LearningRate 0.0133 Epoch: 25 Global Step: 144440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:22:05,773-Speed 11047.87 samples/sec Loss 5.4424 LearningRate 0.0133 Epoch: 25 Global Step: 144450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:22:06,700-Speed 11055.82 samples/sec Loss 5.3850 LearningRate 0.0133 Epoch: 25 Global Step: 144460 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:22:07,607-Speed 11299.78 samples/sec Loss 5.4460 LearningRate 0.0133 Epoch: 25 Global Step: 144470 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:22:08,551-Speed 10855.70 samples/sec Loss 5.3636 LearningRate 0.0133 Epoch: 25 Global Step: 144480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:22:09,463-Speed 11246.94 samples/sec Loss 5.5312 LearningRate 0.0133 Epoch: 25 Global Step: 144490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:22:10,332-Speed 11803.54 samples/sec Loss 5.4645 LearningRate 0.0133 Epoch: 25 Global Step: 144500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:22:11,240-Speed 11282.84 samples/sec Loss 5.4950 LearningRate 0.0133 Epoch: 25 Global Step: 144510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:22:12,162-Speed 11115.51 samples/sec Loss 
5.3761 LearningRate 0.0133 Epoch: 25 Global Step: 144520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:22:13,128-Speed 10606.07 samples/sec Loss 5.4157 LearningRate 0.0133 Epoch: 25 Global Step: 144530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:22:14,051-Speed 11096.53 samples/sec Loss 5.4354 LearningRate 0.0133 Epoch: 25 Global Step: 144540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:22:14,970-Speed 11151.56 samples/sec Loss 5.4338 LearningRate 0.0133 Epoch: 25 Global Step: 144550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:22:15,866-Speed 11446.94 samples/sec Loss 5.4322 LearningRate 0.0133 Epoch: 25 Global Step: 144560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:22:16,791-Speed 11080.54 samples/sec Loss 5.3543 LearningRate 0.0133 Epoch: 25 Global Step: 144570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:22:17,665-Speed 11716.75 samples/sec Loss 5.5078 LearningRate 0.0133 Epoch: 25 Global Step: 144580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:22:18,561-Speed 11439.69 samples/sec Loss 5.3983 LearningRate 0.0133 Epoch: 25 Global Step: 144590 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:22:19,488-Speed 11056.88 samples/sec Loss 5.5245 LearningRate 0.0133 Epoch: 25 Global Step: 144600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:22:20,374-Speed 11558.83 samples/sec Loss 5.4590 LearningRate 0.0133 Epoch: 25 Global Step: 144610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:22:21,264-Speed 11521.69 samples/sec Loss 5.3900 LearningRate 0.0133 Epoch: 25 Global Step: 144620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:22:22,155-Speed 11503.24 samples/sec Loss 5.5486 LearningRate 0.0133 Epoch: 25 Global Step: 144630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:22:23,066-Speed 11245.58 samples/sec Loss 5.5689 LearningRate 0.0133 
Epoch: 25 Global Step: 144640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:22:23,972-Speed 11307.04 samples/sec Loss 5.3783 LearningRate 0.0133 Epoch: 25 Global Step: 144650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:22:24,904-Speed 11001.37 samples/sec Loss 5.5489 LearningRate 0.0132 Epoch: 25 Global Step: 144660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:22:25,807-Speed 11339.03 samples/sec Loss 5.6495 LearningRate 0.0132 Epoch: 25 Global Step: 144670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:22:26,693-Speed 11563.47 samples/sec Loss 5.4426 LearningRate 0.0132 Epoch: 25 Global Step: 144680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:22:27,578-Speed 11586.51 samples/sec Loss 5.6082 LearningRate 0.0132 Epoch: 25 Global Step: 144690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:22:28,485-Speed 11290.98 samples/sec Loss 5.4958 LearningRate 0.0132 Epoch: 25 Global Step: 144700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:22:29,404-Speed 11161.25 samples/sec Loss 5.5127 LearningRate 0.0132 Epoch: 25 Global Step: 144710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:22:30,321-Speed 11164.89 samples/sec Loss 5.4566 LearningRate 0.0132 Epoch: 25 Global Step: 144720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:22:31,252-Speed 11009.83 samples/sec Loss 5.5164 LearningRate 0.0132 Epoch: 25 Global Step: 144730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:22:32,173-Speed 11122.01 samples/sec Loss 5.4933 LearningRate 0.0132 Epoch: 25 Global Step: 144740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:22:33,097-Speed 11094.92 samples/sec Loss 5.4442 LearningRate 0.0132 Epoch: 25 Global Step: 144750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:22:33,993-Speed 11441.72 samples/sec Loss 5.4431 LearningRate 0.0132 Epoch: 25 Global Step: 144760 
Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:22:34,883-Speed 11508.16 samples/sec Loss 5.5232 LearningRate 0.0132 Epoch: 25 Global Step: 144770 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:22:35,789-Speed 11315.51 samples/sec Loss 5.4775 LearningRate 0.0132 Epoch: 25 Global Step: 144780 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:22:36,725-Speed 10940.99 samples/sec Loss 5.4425 LearningRate 0.0132 Epoch: 25 Global Step: 144790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:22:37,668-Speed 10865.47 samples/sec Loss 5.5540 LearningRate 0.0132 Epoch: 25 Global Step: 144800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:22:38,599-Speed 11020.11 samples/sec Loss 5.5453 LearningRate 0.0132 Epoch: 25 Global Step: 144810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:22:39,538-Speed 10912.80 samples/sec Loss 5.2825 LearningRate 0.0132 Epoch: 25 Global Step: 144820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:22:40,458-Speed 11135.78 samples/sec Loss 5.4671 LearningRate 0.0132 Epoch: 25 Global Step: 144830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:22:41,365-Speed 11303.85 samples/sec Loss 5.2816 LearningRate 0.0132 Epoch: 25 Global Step: 144840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:22:42,267-Speed 11361.76 samples/sec Loss 5.3934 LearningRate 0.0132 Epoch: 25 Global Step: 144850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:22:43,199-Speed 10992.01 samples/sec Loss 5.4137 LearningRate 0.0132 Epoch: 25 Global Step: 144860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:22:44,150-Speed 10772.19 samples/sec Loss 5.5193 LearningRate 0.0132 Epoch: 25 Global Step: 144870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:22:45,031-Speed 11637.98 samples/sec Loss 5.6324 LearningRate 0.0132 Epoch: 25 Global Step: 144880 Fp16 Grad Scale: 32768 
Required: 3 hours Training: 2022-04-11 04:22:45,925-Speed 11457.85 samples/sec Loss 5.4420 LearningRate 0.0132 Epoch: 25 Global Step: 144890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:22:46,846-Speed 11126.56 samples/sec Loss 5.4943 LearningRate 0.0132 Epoch: 25 Global Step: 144900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:22:47,747-Speed 11372.61 samples/sec Loss 5.4616 LearningRate 0.0132 Epoch: 25 Global Step: 144910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:22:48,662-Speed 11201.23 samples/sec Loss 5.3751 LearningRate 0.0132 Epoch: 25 Global Step: 144920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:22:49,609-Speed 10814.97 samples/sec Loss 5.5907 LearningRate 0.0132 Epoch: 25 Global Step: 144930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:22:50,553-Speed 10856.26 samples/sec Loss 5.4949 LearningRate 0.0132 Epoch: 25 Global Step: 144940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:22:51,496-Speed 10873.82 samples/sec Loss 5.5421 LearningRate 0.0132 Epoch: 25 Global Step: 144950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:22:52,432-Speed 10937.72 samples/sec Loss 5.5063 LearningRate 0.0132 Epoch: 25 Global Step: 144960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:22:53,402-Speed 10565.23 samples/sec Loss 5.7177 LearningRate 0.0131 Epoch: 25 Global Step: 144970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:22:54,380-Speed 10482.68 samples/sec Loss 5.4961 LearningRate 0.0131 Epoch: 25 Global Step: 144980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:22:55,293-Speed 11220.74 samples/sec Loss 5.6023 LearningRate 0.0131 Epoch: 25 Global Step: 144990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:22:56,181-Speed 11540.50 samples/sec Loss 5.4358 LearningRate 0.0131 Epoch: 25 Global Step: 145000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 04:22:57,133-Speed 10762.81 samples/sec Loss 5.4427 LearningRate 0.0131 Epoch: 25 Global Step: 145010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:22:58,051-Speed 11166.12 samples/sec Loss 5.5231 LearningRate 0.0131 Epoch: 25 Global Step: 145020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:22:58,972-Speed 11135.33 samples/sec Loss 5.4638 LearningRate 0.0131 Epoch: 25 Global Step: 145030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:22:59,899-Speed 11046.95 samples/sec Loss 5.5199 LearningRate 0.0131 Epoch: 25 Global Step: 145040 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:23:00,819-Speed 11143.38 samples/sec Loss 5.4286 LearningRate 0.0131 Epoch: 25 Global Step: 145050 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:23:01,763-Speed 10852.59 samples/sec Loss 5.5044 LearningRate 0.0131 Epoch: 25 Global Step: 145060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:23:02,672-Speed 11274.64 samples/sec Loss 5.5268 LearningRate 0.0131 Epoch: 25 Global Step: 145070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:23:03,617-Speed 10838.61 samples/sec Loss 5.6359 LearningRate 0.0131 Epoch: 25 Global Step: 145080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:23:04,504-Speed 11554.84 samples/sec Loss 5.4233 LearningRate 0.0131 Epoch: 25 Global Step: 145090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:23:05,457-Speed 10757.95 samples/sec Loss 5.5488 LearningRate 0.0131 Epoch: 25 Global Step: 145100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:23:06,396-Speed 10912.76 samples/sec Loss 5.4491 LearningRate 0.0131 Epoch: 25 Global Step: 145110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:23:07,344-Speed 10804.92 samples/sec Loss 5.4804 LearningRate 0.0131 Epoch: 25 Global Step: 145120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:23:08,242-Speed 
11414.75 samples/sec Loss 5.5339 LearningRate 0.0131 Epoch: 25 Global Step: 145130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:23:09,152-Speed 11261.85 samples/sec Loss 5.4853 LearningRate 0.0131 Epoch: 25 Global Step: 145140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:23:10,049-Speed 11418.06 samples/sec Loss 5.3781 LearningRate 0.0131 Epoch: 25 Global Step: 145150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:23:10,930-Speed 11638.74 samples/sec Loss 5.6131 LearningRate 0.0131 Epoch: 25 Global Step: 145160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:23:11,843-Speed 11216.95 samples/sec Loss 5.4361 LearningRate 0.0131 Epoch: 25 Global Step: 145170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:23:12,782-Speed 10920.19 samples/sec Loss 5.4935 LearningRate 0.0131 Epoch: 25 Global Step: 145180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:23:13,683-Speed 11375.57 samples/sec Loss 5.4871 LearningRate 0.0131 Epoch: 25 Global Step: 145190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:23:14,601-Speed 11154.69 samples/sec Loss 5.7088 LearningRate 0.0131 Epoch: 25 Global Step: 145200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:23:15,530-Speed 11035.38 samples/sec Loss 5.4812 LearningRate 0.0131 Epoch: 25 Global Step: 145210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:23:16,449-Speed 11144.42 samples/sec Loss 5.3241 LearningRate 0.0131 Epoch: 25 Global Step: 145220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:23:17,394-Speed 10846.74 samples/sec Loss 5.4344 LearningRate 0.0131 Epoch: 25 Global Step: 145230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:23:18,330-Speed 10947.62 samples/sec Loss 5.5308 LearningRate 0.0131 Epoch: 25 Global Step: 145240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:23:19,247-Speed 11176.35 samples/sec Loss 
5.4884 LearningRate 0.0131 Epoch: 25 Global Step: 145250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:23:20,166-Speed 11146.55 samples/sec Loss 5.4243 LearningRate 0.0131 Epoch: 25 Global Step: 145260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:23:21,053-Speed 11552.96 samples/sec Loss 5.6634 LearningRate 0.0131 Epoch: 25 Global Step: 145270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:23:21,937-Speed 11592.62 samples/sec Loss 5.5382 LearningRate 0.0130 Epoch: 25 Global Step: 145280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:23:22,813-Speed 11697.91 samples/sec Loss 5.5408 LearningRate 0.0130 Epoch: 25 Global Step: 145290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:23:23,739-Speed 11067.10 samples/sec Loss 5.5660 LearningRate 0.0130 Epoch: 25 Global Step: 145300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:23:24,639-Speed 11385.22 samples/sec Loss 5.5484 LearningRate 0.0130 Epoch: 25 Global Step: 145310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:23:25,537-Speed 11416.13 samples/sec Loss 5.5787 LearningRate 0.0130 Epoch: 25 Global Step: 145320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:23:26,449-Speed 11236.43 samples/sec Loss 5.6669 LearningRate 0.0130 Epoch: 25 Global Step: 145330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:23:27,351-Speed 11359.40 samples/sec Loss 5.5007 LearningRate 0.0130 Epoch: 25 Global Step: 145340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:23:28,303-Speed 10758.53 samples/sec Loss 5.5292 LearningRate 0.0130 Epoch: 25 Global Step: 145350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:23:29,278-Speed 10511.71 samples/sec Loss 5.6762 LearningRate 0.0130 Epoch: 25 Global Step: 145360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:23:30,206-Speed 11039.01 samples/sec Loss 5.4429 LearningRate 0.0130 
Epoch: 25 Global Step: 145370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:23:31,148-Speed 10877.69 samples/sec Loss 5.4810 LearningRate 0.0130 Epoch: 25 Global Step: 145380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:23:32,053-Speed 11321.54 samples/sec Loss 5.4797 LearningRate 0.0130 Epoch: 25 Global Step: 145390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:23:32,953-Speed 11392.77 samples/sec Loss 5.5347 LearningRate 0.0130 Epoch: 25 Global Step: 145400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:23:33,846-Speed 11470.80 samples/sec Loss 5.4959 LearningRate 0.0130 Epoch: 25 Global Step: 145410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:23:34,725-Speed 11661.68 samples/sec Loss 5.5719 LearningRate 0.0130 Epoch: 25 Global Step: 145420 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:23:35,624-Speed 11405.73 samples/sec Loss 5.6428 LearningRate 0.0130 Epoch: 25 Global Step: 145430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:23:36,540-Speed 11189.94 samples/sec Loss 5.5383 LearningRate 0.0130 Epoch: 25 Global Step: 145440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:23:37,451-Speed 11239.35 samples/sec Loss 5.5296 LearningRate 0.0130 Epoch: 25 Global Step: 145450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:23:38,375-Speed 11098.33 samples/sec Loss 5.4546 LearningRate 0.0130 Epoch: 25 Global Step: 145460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:23:39,287-Speed 11236.60 samples/sec Loss 5.4488 LearningRate 0.0130 Epoch: 25 Global Step: 145470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:23:40,182-Speed 11445.17 samples/sec Loss 5.3945 LearningRate 0.0130 Epoch: 25 Global Step: 145480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:23:41,122-Speed 10906.35 samples/sec Loss 5.5087 LearningRate 0.0130 Epoch: 25 Global Step: 145490 
Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:23:42,025-Speed 11339.55 samples/sec Loss 5.5284 LearningRate 0.0130 Epoch: 25 Global Step: 145500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:23:42,960-Speed 10964.84 samples/sec Loss 5.4933 LearningRate 0.0130 Epoch: 25 Global Step: 145510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:23:43,886-Speed 11070.78 samples/sec Loss 5.4230 LearningRate 0.0130 Epoch: 25 Global Step: 145520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:23:44,781-Speed 11450.38 samples/sec Loss 5.6073 LearningRate 0.0130 Epoch: 25 Global Step: 145530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:23:45,669-Speed 11545.08 samples/sec Loss 5.6562 LearningRate 0.0130 Epoch: 25 Global Step: 145540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:23:46,559-Speed 11507.52 samples/sec Loss 5.5824 LearningRate 0.0130 Epoch: 25 Global Step: 145550 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:23:47,448-Speed 11533.89 samples/sec Loss 5.5718 LearningRate 0.0130 Epoch: 25 Global Step: 145560 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:23:48,382-Speed 10964.17 samples/sec Loss 5.4494 LearningRate 0.0130 Epoch: 25 Global Step: 145570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:23:49,302-Speed 11148.60 samples/sec Loss 5.4909 LearningRate 0.0130 Epoch: 25 Global Step: 145580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:23:50,201-Speed 11399.03 samples/sec Loss 5.6680 LearningRate 0.0130 Epoch: 25 Global Step: 145590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:23:51,106-Speed 11319.60 samples/sec Loss 5.5325 LearningRate 0.0129 Epoch: 25 Global Step: 145600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:23:52,025-Speed 11151.72 samples/sec Loss 5.4632 LearningRate 0.0129 Epoch: 25 Global Step: 145610 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-04-11 04:23:52,960-Speed 10949.80 samples/sec Loss 5.4960 LearningRate 0.0129 Epoch: 25 Global Step: 145620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:23:53,865-Speed 11323.91 samples/sec Loss 5.5769 LearningRate 0.0129 Epoch: 25 Global Step: 145630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:23:54,818-Speed 10754.96 samples/sec Loss 5.5055 LearningRate 0.0129 Epoch: 25 Global Step: 145640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:23:55,765-Speed 10821.67 samples/sec Loss 5.5440 LearningRate 0.0129 Epoch: 25 Global Step: 145650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:23:56,652-Speed 11552.23 samples/sec Loss 5.3786 LearningRate 0.0129 Epoch: 25 Global Step: 145660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:23:57,534-Speed 11620.31 samples/sec Loss 5.4993 LearningRate 0.0129 Epoch: 25 Global Step: 145670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:23:58,445-Speed 11252.13 samples/sec Loss 5.5174 LearningRate 0.0129 Epoch: 25 Global Step: 145680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:23:59,352-Speed 11290.30 samples/sec Loss 5.3647 LearningRate 0.0129 Epoch: 25 Global Step: 145690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:24:00,304-Speed 10762.61 samples/sec Loss 5.5973 LearningRate 0.0129 Epoch: 25 Global Step: 145700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:24:01,201-Speed 11431.56 samples/sec Loss 5.5189 LearningRate 0.0129 Epoch: 25 Global Step: 145710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:24:02,093-Speed 11485.81 samples/sec Loss 5.5835 LearningRate 0.0129 Epoch: 25 Global Step: 145720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:24:02,996-Speed 11361.80 samples/sec Loss 5.5860 LearningRate 0.0129 Epoch: 25 Global Step: 145730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 04:24:03,884-Speed 11541.19 samples/sec Loss 5.5164 LearningRate 0.0129 Epoch: 25 Global Step: 145740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:24:04,795-Speed 11257.07 samples/sec Loss 5.4869 LearningRate 0.0129 Epoch: 25 Global Step: 145750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:24:05,700-Speed 11318.28 samples/sec Loss 5.6087 LearningRate 0.0129 Epoch: 25 Global Step: 145760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:24:06,658-Speed 10698.52 samples/sec Loss 5.4544 LearningRate 0.0129 Epoch: 25 Global Step: 145770 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:24:07,560-Speed 11359.64 samples/sec Loss 5.5039 LearningRate 0.0129 Epoch: 25 Global Step: 145780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:24:08,477-Speed 11178.09 samples/sec Loss 5.4460 LearningRate 0.0129 Epoch: 25 Global Step: 145790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:24:09,394-Speed 11175.30 samples/sec Loss 5.5445 LearningRate 0.0129 Epoch: 25 Global Step: 145800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:24:10,286-Speed 11489.75 samples/sec Loss 5.6511 LearningRate 0.0129 Epoch: 25 Global Step: 145810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:24:11,157-Speed 11758.99 samples/sec Loss 5.5913 LearningRate 0.0129 Epoch: 25 Global Step: 145820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:24:12,095-Speed 10929.34 samples/sec Loss 5.7193 LearningRate 0.0129 Epoch: 25 Global Step: 145830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:24:13,011-Speed 11189.84 samples/sec Loss 5.5555 LearningRate 0.0129 Epoch: 25 Global Step: 145840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:24:13,948-Speed 10932.18 samples/sec Loss 5.3967 LearningRate 0.0129 Epoch: 25 Global Step: 145850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:24:14,882-Speed 
10969.50 samples/sec Loss 5.4409 LearningRate 0.0129 Epoch: 25 Global Step: 145860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:24:15,792-Speed 11259.55 samples/sec Loss 5.3576 LearningRate 0.0129 Epoch: 25 Global Step: 145870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:24:16,698-Speed 11309.78 samples/sec Loss 5.6161 LearningRate 0.0129 Epoch: 25 Global Step: 145880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:24:17,600-Speed 11358.57 samples/sec Loss 5.5752 LearningRate 0.0129 Epoch: 25 Global Step: 145890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:24:18,563-Speed 10642.82 samples/sec Loss 5.5385 LearningRate 0.0129 Epoch: 25 Global Step: 145900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:24:19,517-Speed 10749.92 samples/sec Loss 5.4835 LearningRate 0.0128 Epoch: 25 Global Step: 145910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:24:20,411-Speed 11462.53 samples/sec Loss 5.5725 LearningRate 0.0128 Epoch: 25 Global Step: 145920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:24:21,300-Speed 11523.41 samples/sec Loss 5.3866 LearningRate 0.0128 Epoch: 25 Global Step: 145930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:24:22,189-Speed 11527.29 samples/sec Loss 5.4515 LearningRate 0.0128 Epoch: 25 Global Step: 145940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:24:23,094-Speed 11316.80 samples/sec Loss 5.5030 LearningRate 0.0128 Epoch: 25 Global Step: 145950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:24:23,997-Speed 11347.22 samples/sec Loss 5.6083 LearningRate 0.0128 Epoch: 25 Global Step: 145960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:24:24,913-Speed 11195.63 samples/sec Loss 5.5697 LearningRate 0.0128 Epoch: 25 Global Step: 145970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:24:25,805-Speed 11485.60 samples/sec Loss 
5.3726 LearningRate 0.0128 Epoch: 25 Global Step: 145980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:24:26,748-Speed 10861.22 samples/sec Loss 5.4698 LearningRate 0.0128 Epoch: 25 Global Step: 145990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:24:27,660-Speed 11239.79 samples/sec Loss 5.5908 LearningRate 0.0128 Epoch: 25 Global Step: 146000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:24:48,917-[lfw][146000]XNorm: 9.097900 Training: 2022-04-11 04:24:48,917-[lfw][146000]Accuracy-Flip: 0.99667+-0.00279 Training: 2022-04-11 04:24:48,918-[lfw][146000]Accuracy-Highest: 0.99717 Training: 2022-04-11 04:25:13,563-[cfp_fp][146000]XNorm: 7.670259 Training: 2022-04-11 04:25:13,564-[cfp_fp][146000]Accuracy-Flip: 0.95200+-0.00961 Training: 2022-04-11 04:25:13,564-[cfp_fp][146000]Accuracy-Highest: 0.95200 Training: 2022-04-11 04:25:34,756-[agedb_30][146000]XNorm: 8.913675 Training: 2022-04-11 04:25:34,757-[agedb_30][146000]Accuracy-Flip: 0.96817+-0.00758 Training: 2022-04-11 04:25:34,757-[agedb_30][146000]Accuracy-Highest: 0.96817 Training: 2022-04-11 04:25:35,665-Speed 150.58 samples/sec Loss 5.5934 LearningRate 0.0128 Epoch: 25 Global Step: 146010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:25:36,568-Speed 11351.70 samples/sec Loss 5.5805 LearningRate 0.0128 Epoch: 25 Global Step: 146020 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:25:37,507-Speed 10906.53 samples/sec Loss 5.5492 LearningRate 0.0128 Epoch: 25 Global Step: 146030 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:25:38,463-Speed 10725.65 samples/sec Loss 5.4979 LearningRate 0.0128 Epoch: 25 Global Step: 146040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:25:39,424-Speed 10666.21 samples/sec Loss 5.5268 LearningRate 0.0128 Epoch: 25 Global Step: 146050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:25:40,383-Speed 10678.33 samples/sec Loss 5.5189 LearningRate 
0.0128 Epoch: 25 Global Step: 146060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:25:41,297-Speed 11217.98 samples/sec Loss 5.4833 LearningRate 0.0128 Epoch: 25 Global Step: 146070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:25:42,212-Speed 11198.91 samples/sec Loss 5.4771 LearningRate 0.0128 Epoch: 25 Global Step: 146080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:25:43,120-Speed 11280.35 samples/sec Loss 5.5958 LearningRate 0.0128 Epoch: 25 Global Step: 146090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:25:44,028-Speed 11291.73 samples/sec Loss 5.5073 LearningRate 0.0128 Epoch: 25 Global Step: 146100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:25:44,932-Speed 11338.62 samples/sec Loss 5.5534 LearningRate 0.0128 Epoch: 25 Global Step: 146110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:25:45,841-Speed 11264.12 samples/sec Loss 5.5024 LearningRate 0.0128 Epoch: 25 Global Step: 146120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:25:46,755-Speed 11212.12 samples/sec Loss 5.6668 LearningRate 0.0128 Epoch: 25 Global Step: 146130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:25:47,714-Speed 10691.79 samples/sec Loss 5.5775 LearningRate 0.0128 Epoch: 25 Global Step: 146140 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:25:48,669-Speed 10728.36 samples/sec Loss 5.6006 LearningRate 0.0128 Epoch: 25 Global Step: 146150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:25:49,565-Speed 11439.13 samples/sec Loss 5.5429 LearningRate 0.0128 Epoch: 25 Global Step: 146160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:25:50,476-Speed 11244.79 samples/sec Loss 5.5351 LearningRate 0.0128 Epoch: 25 Global Step: 146170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:25:51,394-Speed 11154.64 samples/sec Loss 5.4943 LearningRate 0.0128 Epoch: 25 Global Step: 
146180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:25:52,348-Speed 10749.71 samples/sec Loss 5.5318 LearningRate 0.0128 Epoch: 25 Global Step: 146190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:25:53,327-Speed 10461.77 samples/sec Loss 5.4167 LearningRate 0.0128 Epoch: 25 Global Step: 146200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:25:54,298-Speed 10557.52 samples/sec Loss 5.6512 LearningRate 0.0128 Epoch: 25 Global Step: 146210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:25:55,254-Speed 10717.13 samples/sec Loss 5.5999 LearningRate 0.0128 Epoch: 25 Global Step: 146220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:25:56,219-Speed 10623.69 samples/sec Loss 5.6415 LearningRate 0.0127 Epoch: 25 Global Step: 146230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:25:57,134-Speed 11202.74 samples/sec Loss 5.5043 LearningRate 0.0127 Epoch: 25 Global Step: 146240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:25:58,024-Speed 11504.59 samples/sec Loss 5.4955 LearningRate 0.0127 Epoch: 25 Global Step: 146250 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:25:58,909-Speed 11589.70 samples/sec Loss 5.7194 LearningRate 0.0127 Epoch: 25 Global Step: 146260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:25:59,838-Speed 11022.15 samples/sec Loss 5.4860 LearningRate 0.0127 Epoch: 25 Global Step: 146270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:26:00,751-Speed 11235.82 samples/sec Loss 5.3463 LearningRate 0.0127 Epoch: 25 Global Step: 146280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:26:01,659-Speed 11280.96 samples/sec Loss 5.5132 LearningRate 0.0127 Epoch: 25 Global Step: 146290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:26:02,598-Speed 10912.35 samples/sec Loss 5.5168 LearningRate 0.0127 Epoch: 25 Global Step: 146300 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-04-11 04:26:03,495-Speed 11425.53 samples/sec Loss 5.6209 LearningRate 0.0127 Epoch: 25 Global Step: 146310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:26:04,420-Speed 11082.61 samples/sec Loss 5.4835 LearningRate 0.0127 Epoch: 25 Global Step: 146320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:26:05,311-Speed 11497.50 samples/sec Loss 5.5022 LearningRate 0.0127 Epoch: 25 Global Step: 146330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:26:06,219-Speed 11283.16 samples/sec Loss 5.4402 LearningRate 0.0127 Epoch: 25 Global Step: 146340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:26:07,134-Speed 11206.74 samples/sec Loss 5.6609 LearningRate 0.0127 Epoch: 25 Global Step: 146350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:26:08,066-Speed 10995.77 samples/sec Loss 5.5629 LearningRate 0.0127 Epoch: 25 Global Step: 146360 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:26:08,968-Speed 11353.40 samples/sec Loss 5.5154 LearningRate 0.0127 Epoch: 25 Global Step: 146370 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:26:09,859-Speed 11504.49 samples/sec Loss 5.5010 LearningRate 0.0127 Epoch: 25 Global Step: 146380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:26:10,766-Speed 11293.59 samples/sec Loss 5.4185 LearningRate 0.0127 Epoch: 25 Global Step: 146390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:26:11,668-Speed 11370.88 samples/sec Loss 5.5785 LearningRate 0.0127 Epoch: 25 Global Step: 146400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:26:12,552-Speed 11586.74 samples/sec Loss 5.4976 LearningRate 0.0127 Epoch: 25 Global Step: 146410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:26:13,474-Speed 11107.69 samples/sec Loss 5.4721 LearningRate 0.0127 Epoch: 25 Global Step: 146420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 04:26:14,370-Speed 11466.55 samples/sec Loss 5.6031 LearningRate 0.0127 Epoch: 25 Global Step: 146430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:26:15,283-Speed 11224.73 samples/sec Loss 5.5754 LearningRate 0.0127 Epoch: 25 Global Step: 146440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:26:16,222-Speed 10913.09 samples/sec Loss 5.6207 LearningRate 0.0127 Epoch: 25 Global Step: 146450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:26:17,152-Speed 11016.41 samples/sec Loss 5.5548 LearningRate 0.0127 Epoch: 25 Global Step: 146460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:26:18,065-Speed 11219.97 samples/sec Loss 5.5269 LearningRate 0.0127 Epoch: 25 Global Step: 146470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:26:19,025-Speed 10679.29 samples/sec Loss 5.5838 LearningRate 0.0127 Epoch: 25 Global Step: 146480 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:26:19,943-Speed 11166.73 samples/sec Loss 5.4973 LearningRate 0.0127 Epoch: 25 Global Step: 146490 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:26:20,880-Speed 10934.21 samples/sec Loss 5.5625 LearningRate 0.0127 Epoch: 25 Global Step: 146500 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:26:21,778-Speed 11413.96 samples/sec Loss 5.6150 LearningRate 0.0127 Epoch: 25 Global Step: 146510 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:26:22,667-Speed 11522.70 samples/sec Loss 5.5628 LearningRate 0.0127 Epoch: 25 Global Step: 146520 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:26:23,544-Speed 11691.30 samples/sec Loss 5.6893 LearningRate 0.0127 Epoch: 25 Global Step: 146530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:26:24,436-Speed 11487.85 samples/sec Loss 5.6279 LearningRate 0.0127 Epoch: 25 Global Step: 146540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
04:26:25,349-Speed 11228.22 samples/sec Loss 5.6298 LearningRate 0.0126 Epoch: 25 Global Step: 146550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:26:26,229-Speed 11645.88 samples/sec Loss 5.4308 LearningRate 0.0126 Epoch: 25 Global Step: 146560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:26:27,132-Speed 11338.37 samples/sec Loss 5.4537 LearningRate 0.0126 Epoch: 25 Global Step: 146570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:26:28,084-Speed 10765.44 samples/sec Loss 5.5356 LearningRate 0.0126 Epoch: 25 Global Step: 146580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:26:29,033-Speed 10803.56 samples/sec Loss 5.4833 LearningRate 0.0126 Epoch: 25 Global Step: 146590 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:26:29,941-Speed 11277.87 samples/sec Loss 5.3882 LearningRate 0.0126 Epoch: 25 Global Step: 146600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:26:30,841-Speed 11388.86 samples/sec Loss 5.4896 LearningRate 0.0126 Epoch: 25 Global Step: 146610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:26:31,769-Speed 11040.46 samples/sec Loss 5.6009 LearningRate 0.0126 Epoch: 25 Global Step: 146620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:26:32,684-Speed 11195.76 samples/sec Loss 5.6020 LearningRate 0.0126 Epoch: 25 Global Step: 146630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:26:33,586-Speed 11359.25 samples/sec Loss 5.4241 LearningRate 0.0126 Epoch: 25 Global Step: 146640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:26:34,474-Speed 11542.65 samples/sec Loss 5.5584 LearningRate 0.0126 Epoch: 25 Global Step: 146650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:26:35,364-Speed 11514.10 samples/sec Loss 5.5896 LearningRate 0.0126 Epoch: 25 Global Step: 146660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:26:36,256-Speed 11489.10 
samples/sec Loss 5.5912 LearningRate 0.0126 Epoch: 25 Global Step: 146670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:26:37,147-Speed 11498.94 samples/sec Loss 5.4526 LearningRate 0.0126 Epoch: 25 Global Step: 146680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:26:38,043-Speed 11438.14 samples/sec Loss 5.5205 LearningRate 0.0126 Epoch: 25 Global Step: 146690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:26:38,964-Speed 11128.43 samples/sec Loss 5.5719 LearningRate 0.0126 Epoch: 25 Global Step: 146700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:26:39,861-Speed 11420.20 samples/sec Loss 5.4342 LearningRate 0.0126 Epoch: 25 Global Step: 146710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:26:40,824-Speed 10642.40 samples/sec Loss 5.5276 LearningRate 0.0126 Epoch: 25 Global Step: 146720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:26:41,728-Speed 11338.99 samples/sec Loss 5.3705 LearningRate 0.0126 Epoch: 25 Global Step: 146730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:26:42,648-Speed 11138.66 samples/sec Loss 5.6648 LearningRate 0.0126 Epoch: 25 Global Step: 146740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:26:43,601-Speed 10753.96 samples/sec Loss 5.4527 LearningRate 0.0126 Epoch: 25 Global Step: 146750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:26:44,524-Speed 11100.14 samples/sec Loss 5.5784 LearningRate 0.0126 Epoch: 25 Global Step: 146760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:26:45,418-Speed 11468.95 samples/sec Loss 5.4753 LearningRate 0.0126 Epoch: 25 Global Step: 146770 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:26:46,305-Speed 11555.00 samples/sec Loss 5.6179 LearningRate 0.0126 Epoch: 25 Global Step: 146780 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:26:47,202-Speed 11424.30 samples/sec Loss 5.7057 
LearningRate 0.0126 Epoch: 25 Global Step: 146790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:26:48,120-Speed 11168.83 samples/sec Loss 5.6032 LearningRate 0.0126 Epoch: 25 Global Step: 146800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:26:49,014-Speed 11461.34 samples/sec Loss 5.6178 LearningRate 0.0126 Epoch: 25 Global Step: 146810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:26:49,915-Speed 11372.72 samples/sec Loss 5.6460 LearningRate 0.0126 Epoch: 25 Global Step: 146820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:26:50,824-Speed 11269.33 samples/sec Loss 5.5340 LearningRate 0.0126 Epoch: 25 Global Step: 146830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:26:51,732-Speed 11284.70 samples/sec Loss 5.6155 LearningRate 0.0126 Epoch: 25 Global Step: 146840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:26:52,661-Speed 11027.43 samples/sec Loss 5.4831 LearningRate 0.0126 Epoch: 25 Global Step: 146850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:26:53,570-Speed 11274.26 samples/sec Loss 5.7110 LearningRate 0.0126 Epoch: 25 Global Step: 146860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:26:54,530-Speed 10682.97 samples/sec Loss 5.6053 LearningRate 0.0125 Epoch: 25 Global Step: 146870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:26:55,450-Speed 11130.11 samples/sec Loss 5.5798 LearningRate 0.0125 Epoch: 25 Global Step: 146880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:26:56,355-Speed 11319.83 samples/sec Loss 5.4991 LearningRate 0.0125 Epoch: 25 Global Step: 146890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:26:57,267-Speed 11237.02 samples/sec Loss 5.3648 LearningRate 0.0125 Epoch: 25 Global Step: 146900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:26:58,184-Speed 11180.38 samples/sec Loss 5.4880 LearningRate 0.0125 Epoch: 25 
Global Step: 146910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:26:59,110-Speed 11068.49 samples/sec Loss 5.6234 LearningRate 0.0125 Epoch: 25 Global Step: 146920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:27:00,027-Speed 11168.10 samples/sec Loss 5.6453 LearningRate 0.0125 Epoch: 25 Global Step: 146930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:27:00,933-Speed 11312.17 samples/sec Loss 5.5838 LearningRate 0.0125 Epoch: 25 Global Step: 146940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:27:01,889-Speed 10716.71 samples/sec Loss 5.4711 LearningRate 0.0125 Epoch: 25 Global Step: 146950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:27:02,833-Speed 10855.05 samples/sec Loss 5.4971 LearningRate 0.0125 Epoch: 25 Global Step: 146960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:03,747-Speed 11213.84 samples/sec Loss 5.5134 LearningRate 0.0125 Epoch: 25 Global Step: 146970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:04,640-Speed 11478.18 samples/sec Loss 5.4416 LearningRate 0.0125 Epoch: 25 Global Step: 146980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:05,542-Speed 11351.55 samples/sec Loss 5.5193 LearningRate 0.0125 Epoch: 25 Global Step: 146990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:06,462-Speed 11149.46 samples/sec Loss 5.4964 LearningRate 0.0125 Epoch: 25 Global Step: 147000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:07,370-Speed 11283.36 samples/sec Loss 5.6020 LearningRate 0.0125 Epoch: 25 Global Step: 147010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:08,298-Speed 11042.37 samples/sec Loss 5.4359 LearningRate 0.0125 Epoch: 25 Global Step: 147020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:09,201-Speed 11350.41 samples/sec Loss 5.6322 LearningRate 0.0125 Epoch: 25 Global Step: 147030 Fp16 Grad 
Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:10,083-Speed 11614.45 samples/sec Loss 5.5291 LearningRate 0.0125 Epoch: 25 Global Step: 147040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:10,966-Speed 11607.74 samples/sec Loss 5.4639 LearningRate 0.0125 Epoch: 25 Global Step: 147050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:11,866-Speed 11389.95 samples/sec Loss 5.5558 LearningRate 0.0125 Epoch: 25 Global Step: 147060 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:27:12,862-Speed 10282.30 samples/sec Loss 5.4962 LearningRate 0.0125 Epoch: 25 Global Step: 147070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:13,818-Speed 10720.20 samples/sec Loss 5.5929 LearningRate 0.0125 Epoch: 25 Global Step: 147080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:14,711-Speed 11479.75 samples/sec Loss 5.3940 LearningRate 0.0125 Epoch: 25 Global Step: 147090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:15,622-Speed 11244.25 samples/sec Loss 5.5660 LearningRate 0.0125 Epoch: 25 Global Step: 147100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:16,502-Speed 11651.60 samples/sec Loss 5.4984 LearningRate 0.0125 Epoch: 25 Global Step: 147110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:17,414-Speed 11231.94 samples/sec Loss 5.5163 LearningRate 0.0125 Epoch: 25 Global Step: 147120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:18,341-Speed 11059.75 samples/sec Loss 5.5248 LearningRate 0.0125 Epoch: 25 Global Step: 147130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:19,262-Speed 11124.43 samples/sec Loss 5.5708 LearningRate 0.0125 Epoch: 25 Global Step: 147140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:20,191-Speed 11024.05 samples/sec Loss 5.4416 LearningRate 0.0125 Epoch: 25 Global Step: 147150 Fp16 Grad Scale: 65536 Required: 3 hours 
Training: 2022-04-11 04:27:21,112-Speed 11130.75 samples/sec Loss 5.4769 LearningRate 0.0125 Epoch: 25 Global Step: 147160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:22,016-Speed 11339.09 samples/sec Loss 5.6168 LearningRate 0.0125 Epoch: 25 Global Step: 147170 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:27:22,925-Speed 11272.37 samples/sec Loss 5.5542 LearningRate 0.0125 Epoch: 25 Global Step: 147180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:23,838-Speed 11218.04 samples/sec Loss 5.5102 LearningRate 0.0124 Epoch: 25 Global Step: 147190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:24,766-Speed 11046.55 samples/sec Loss 5.4621 LearningRate 0.0124 Epoch: 25 Global Step: 147200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:25,674-Speed 11287.62 samples/sec Loss 5.4818 LearningRate 0.0124 Epoch: 25 Global Step: 147210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:26,601-Speed 11047.64 samples/sec Loss 5.6381 LearningRate 0.0124 Epoch: 25 Global Step: 147220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:27,504-Speed 11354.21 samples/sec Loss 5.5752 LearningRate 0.0124 Epoch: 25 Global Step: 147230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:28,449-Speed 10843.70 samples/sec Loss 5.5686 LearningRate 0.0124 Epoch: 25 Global Step: 147240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:29,358-Speed 11270.15 samples/sec Loss 5.6080 LearningRate 0.0124 Epoch: 25 Global Step: 147250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:30,313-Speed 10734.87 samples/sec Loss 5.5946 LearningRate 0.0124 Epoch: 25 Global Step: 147260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:31,207-Speed 11454.41 samples/sec Loss 5.5754 LearningRate 0.0124 Epoch: 25 Global Step: 147270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
04:27:32,113-Speed 11318.44 samples/sec Loss 5.6445 LearningRate 0.0124 Epoch: 25 Global Step: 147280 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:27:33,017-Speed 11338.37 samples/sec Loss 5.4278 LearningRate 0.0124 Epoch: 25 Global Step: 147290 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:27:33,895-Speed 11663.20 samples/sec Loss 5.5398 LearningRate 0.0124 Epoch: 25 Global Step: 147300 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:27:34,801-Speed 11309.48 samples/sec Loss 5.5058 LearningRate 0.0124 Epoch: 25 Global Step: 147310 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:27:35,693-Speed 11490.81 samples/sec Loss 5.4730 LearningRate 0.0124 Epoch: 25 Global Step: 147320 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:27:36,582-Speed 11524.10 samples/sec Loss 5.6090 LearningRate 0.0124 Epoch: 25 Global Step: 147330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:37,498-Speed 11193.40 samples/sec Loss 5.4699 LearningRate 0.0124 Epoch: 25 Global Step: 147340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:38,394-Speed 11435.02 samples/sec Loss 5.6086 LearningRate 0.0124 Epoch: 25 Global Step: 147350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:39,317-Speed 11103.69 samples/sec Loss 5.4750 LearningRate 0.0124 Epoch: 25 Global Step: 147360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:40,241-Speed 11088.31 samples/sec Loss 5.5216 LearningRate 0.0124 Epoch: 25 Global Step: 147370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:41,189-Speed 10805.01 samples/sec Loss 5.7040 LearningRate 0.0124 Epoch: 25 Global Step: 147380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:42,120-Speed 11004.55 samples/sec Loss 5.4682 LearningRate 0.0124 Epoch: 25 Global Step: 147390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:43,012-Speed 
11497.35 samples/sec Loss 5.4120 LearningRate 0.0124 Epoch: 25 Global Step: 147400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:43,942-Speed 11015.61 samples/sec Loss 5.5207 LearningRate 0.0124 Epoch: 25 Global Step: 147410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:44,857-Speed 11198.67 samples/sec Loss 5.6281 LearningRate 0.0124 Epoch: 25 Global Step: 147420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:45,737-Speed 11642.47 samples/sec Loss 5.5991 LearningRate 0.0124 Epoch: 25 Global Step: 147430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:46,626-Speed 11531.99 samples/sec Loss 5.5765 LearningRate 0.0124 Epoch: 25 Global Step: 147440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:47,518-Speed 11495.04 samples/sec Loss 5.5634 LearningRate 0.0124 Epoch: 25 Global Step: 147450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:48,434-Speed 11182.78 samples/sec Loss 5.5297 LearningRate 0.0124 Epoch: 25 Global Step: 147460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:49,383-Speed 10799.88 samples/sec Loss 5.4827 LearningRate 0.0124 Epoch: 25 Global Step: 147470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:50,315-Speed 10987.90 samples/sec Loss 5.6411 LearningRate 0.0124 Epoch: 25 Global Step: 147480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:51,234-Speed 11151.50 samples/sec Loss 5.4198 LearningRate 0.0124 Epoch: 25 Global Step: 147490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:52,179-Speed 10847.16 samples/sec Loss 5.6055 LearningRate 0.0124 Epoch: 25 Global Step: 147500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:53,108-Speed 11028.24 samples/sec Loss 5.4236 LearningRate 0.0124 Epoch: 25 Global Step: 147510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:54,053-Speed 10846.33 samples/sec Loss 
5.6396 LearningRate 0.0123 Epoch: 25 Global Step: 147520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:54,969-Speed 11182.50 samples/sec Loss 5.4678 LearningRate 0.0123 Epoch: 25 Global Step: 147530 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:27:55,887-Speed 11161.97 samples/sec Loss 5.6191 LearningRate 0.0123 Epoch: 25 Global Step: 147540 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:27:56,787-Speed 11389.56 samples/sec Loss 5.6487 LearningRate 0.0123 Epoch: 25 Global Step: 147550 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:27:57,664-Speed 11678.07 samples/sec Loss 5.3987 LearningRate 0.0123 Epoch: 25 Global Step: 147560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:58,543-Speed 11661.00 samples/sec Loss 5.5133 LearningRate 0.0123 Epoch: 25 Global Step: 147570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:27:59,444-Speed 11382.41 samples/sec Loss 5.4976 LearningRate 0.0123 Epoch: 25 Global Step: 147580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:28:00,344-Speed 11385.24 samples/sec Loss 5.5639 LearningRate 0.0123 Epoch: 25 Global Step: 147590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:28:01,284-Speed 10898.24 samples/sec Loss 5.6671 LearningRate 0.0123 Epoch: 25 Global Step: 147600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:28:02,187-Speed 11351.61 samples/sec Loss 5.4283 LearningRate 0.0123 Epoch: 25 Global Step: 147610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:28:03,149-Speed 10642.58 samples/sec Loss 5.4935 LearningRate 0.0123 Epoch: 25 Global Step: 147620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:28:04,088-Speed 10913.52 samples/sec Loss 5.6259 LearningRate 0.0123 Epoch: 25 Global Step: 147630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:28:04,976-Speed 11547.44 samples/sec Loss 5.5360 LearningRate 0.0123 
Epoch: 25 Global Step: 147640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:28:05,923-Speed 10817.76 samples/sec Loss 5.5471 LearningRate 0.0123 Epoch: 25 Global Step: 147650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:28:06,849-Speed 11074.97 samples/sec Loss 5.5871 LearningRate 0.0123 Epoch: 25 Global Step: 147660 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:28:07,749-Speed 11381.51 samples/sec Loss 5.4341 LearningRate 0.0123 Epoch: 25 Global Step: 147670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:28:08,686-Speed 10930.73 samples/sec Loss 5.5406 LearningRate 0.0123 Epoch: 25 Global Step: 147680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:28:09,579-Speed 11476.72 samples/sec Loss 5.5233 LearningRate 0.0123 Epoch: 25 Global Step: 147690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:28:10,474-Speed 11451.29 samples/sec Loss 5.4345 LearningRate 0.0123 Epoch: 25 Global Step: 147700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:28:11,400-Speed 11072.78 samples/sec Loss 5.5069 LearningRate 0.0123 Epoch: 25 Global Step: 147710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:28:12,297-Speed 11418.13 samples/sec Loss 5.5168 LearningRate 0.0123 Epoch: 25 Global Step: 147720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:28:13,217-Speed 11135.98 samples/sec Loss 5.5044 LearningRate 0.0123 Epoch: 25 Global Step: 147730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:28:14,124-Speed 11296.23 samples/sec Loss 5.4930 LearningRate 0.0123 Epoch: 25 Global Step: 147740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:28:15,048-Speed 11099.97 samples/sec Loss 5.5696 LearningRate 0.0123 Epoch: 25 Global Step: 147750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:28:15,951-Speed 11343.97 samples/sec Loss 5.7472 LearningRate 0.0123 Epoch: 25 Global Step: 147760 
Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:28:16,909-Speed 10690.37 samples/sec Loss 5.5877 LearningRate 0.0123 Epoch: 25 Global Step: 147770 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:28:17,863-Speed 10746.77 samples/sec Loss 5.6504 LearningRate 0.0123 Epoch: 25 Global Step: 147780 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:28:18,811-Speed 10810.80 samples/sec Loss 5.4455 LearningRate 0.0123 Epoch: 25 Global Step: 147790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:28:19,720-Speed 11272.09 samples/sec Loss 5.6120 LearningRate 0.0123 Epoch: 25 Global Step: 147800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:28:20,605-Speed 11576.96 samples/sec Loss 5.5754 LearningRate 0.0123 Epoch: 25 Global Step: 147810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:28:21,508-Speed 11347.94 samples/sec Loss 5.5967 LearningRate 0.0123 Epoch: 25 Global Step: 147820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:28:22,467-Speed 10682.49 samples/sec Loss 5.5917 LearningRate 0.0123 Epoch: 25 Global Step: 147830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:28:33,053-Speed 967.43 samples/sec Loss 5.2718 LearningRate 0.0122 Epoch: 26 Global Step: 147840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:28:33,972-Speed 11161.22 samples/sec Loss 4.8233 LearningRate 0.0122 Epoch: 26 Global Step: 147850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:28:34,928-Speed 10715.92 samples/sec Loss 4.9919 LearningRate 0.0122 Epoch: 26 Global Step: 147860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:28:35,846-Speed 11168.94 samples/sec Loss 4.8661 LearningRate 0.0122 Epoch: 26 Global Step: 147870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:28:36,779-Speed 10978.87 samples/sec Loss 4.9393 LearningRate 0.0122 Epoch: 26 Global Step: 147880 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-04-11 04:28:37,794-Speed 10097.29 samples/sec Loss 4.8376 LearningRate 0.0122 Epoch: 26 Global Step: 147890 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:28:39,026-Speed 8322.26 samples/sec Loss 4.9131 LearningRate 0.0122 Epoch: 26 Global Step: 147900 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:28:39,948-Speed 11112.05 samples/sec Loss 4.9531 LearningRate 0.0122 Epoch: 26 Global Step: 147910 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:28:40,879-Speed 11013.82 samples/sec Loss 4.9641 LearningRate 0.0122 Epoch: 26 Global Step: 147920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:28:41,865-Speed 10382.86 samples/sec Loss 5.0780 LearningRate 0.0122 Epoch: 26 Global Step: 147930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:28:42,837-Speed 10551.34 samples/sec Loss 4.8025 LearningRate 0.0122 Epoch: 26 Global Step: 147940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:28:43,763-Speed 11063.95 samples/sec Loss 4.8104 LearningRate 0.0122 Epoch: 26 Global Step: 147950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:28:44,670-Speed 11289.49 samples/sec Loss 4.8188 LearningRate 0.0122 Epoch: 26 Global Step: 147960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:28:45,578-Speed 11292.00 samples/sec Loss 4.8924 LearningRate 0.0122 Epoch: 26 Global Step: 147970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:28:46,496-Speed 11165.49 samples/sec Loss 4.9988 LearningRate 0.0122 Epoch: 26 Global Step: 147980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:28:47,459-Speed 10634.18 samples/sec Loss 4.9216 LearningRate 0.0122 Epoch: 26 Global Step: 147990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:28:48,418-Speed 10685.30 samples/sec Loss 4.9622 LearningRate 0.0122 Epoch: 26 Global Step: 148000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 04:29:09,694-[lfw][148000]XNorm: 9.089396 Training: 2022-04-11 04:29:09,694-[lfw][148000]Accuracy-Flip: 0.99683+-0.00293 Training: 2022-04-11 04:29:09,695-[lfw][148000]Accuracy-Highest: 0.99717 Training: 2022-04-11 04:29:34,252-[cfp_fp][148000]XNorm: 7.707289 Training: 2022-04-11 04:29:34,252-[cfp_fp][148000]Accuracy-Flip: 0.95100+-0.00994 Training: 2022-04-11 04:29:34,253-[cfp_fp][148000]Accuracy-Highest: 0.95200 Training: 2022-04-11 04:29:55,466-[agedb_30][148000]XNorm: 8.879377 Training: 2022-04-11 04:29:55,467-[agedb_30][148000]Accuracy-Flip: 0.96600+-0.00629 Training: 2022-04-11 04:29:55,467-[agedb_30][148000]Accuracy-Highest: 0.96817 Training: 2022-04-11 04:29:56,415-Speed 150.60 samples/sec Loss 4.9291 LearningRate 0.0122 Epoch: 26 Global Step: 148010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:29:57,324-Speed 11276.93 samples/sec Loss 4.9143 LearningRate 0.0122 Epoch: 26 Global Step: 148020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:29:58,227-Speed 11344.91 samples/sec Loss 4.8730 LearningRate 0.0122 Epoch: 26 Global Step: 148030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:29:59,128-Speed 11377.83 samples/sec Loss 4.9948 LearningRate 0.0122 Epoch: 26 Global Step: 148040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:00,044-Speed 11185.32 samples/sec Loss 4.9270 LearningRate 0.0122 Epoch: 26 Global Step: 148050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:00,999-Speed 10722.94 samples/sec Loss 4.8199 LearningRate 0.0122 Epoch: 26 Global Step: 148060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:01,922-Speed 11107.11 samples/sec Loss 4.8673 LearningRate 0.0122 Epoch: 26 Global Step: 148070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:02,847-Speed 11078.78 samples/sec Loss 4.9808 LearningRate 0.0122 Epoch: 26 Global Step: 148080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
04:30:03,779-Speed 10985.25 samples/sec Loss 4.9402 LearningRate 0.0122 Epoch: 26 Global Step: 148090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:04,698-Speed 11153.33 samples/sec Loss 5.0144 LearningRate 0.0122 Epoch: 26 Global Step: 148100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:05,606-Speed 11293.56 samples/sec Loss 4.9259 LearningRate 0.0122 Epoch: 26 Global Step: 148110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:06,537-Speed 11007.44 samples/sec Loss 5.0310 LearningRate 0.0122 Epoch: 26 Global Step: 148120 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:30:07,475-Speed 10922.15 samples/sec Loss 5.0324 LearningRate 0.0122 Epoch: 26 Global Step: 148130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:08,397-Speed 11113.63 samples/sec Loss 4.9251 LearningRate 0.0122 Epoch: 26 Global Step: 148140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:09,291-Speed 11460.40 samples/sec Loss 5.0501 LearningRate 0.0122 Epoch: 26 Global Step: 148150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:10,188-Speed 11427.36 samples/sec Loss 5.0547 LearningRate 0.0122 Epoch: 26 Global Step: 148160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:11,089-Speed 11377.34 samples/sec Loss 4.9845 LearningRate 0.0121 Epoch: 26 Global Step: 148170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:11,995-Speed 11306.19 samples/sec Loss 4.9302 LearningRate 0.0121 Epoch: 26 Global Step: 148180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:12,881-Speed 11568.21 samples/sec Loss 4.9881 LearningRate 0.0121 Epoch: 26 Global Step: 148190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:13,791-Speed 11256.41 samples/sec Loss 4.9924 LearningRate 0.0121 Epoch: 26 Global Step: 148200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:14,721-Speed 11017.05 
samples/sec Loss 4.8086 LearningRate 0.0121 Epoch: 26 Global Step: 148210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:15,633-Speed 11238.20 samples/sec Loss 5.1443 LearningRate 0.0121 Epoch: 26 Global Step: 148220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:16,566-Speed 10978.89 samples/sec Loss 5.0604 LearningRate 0.0121 Epoch: 26 Global Step: 148230 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:30:17,507-Speed 10892.46 samples/sec Loss 5.0449 LearningRate 0.0121 Epoch: 26 Global Step: 148240 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:30:18,448-Speed 10891.21 samples/sec Loss 5.1143 LearningRate 0.0121 Epoch: 26 Global Step: 148250 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:30:19,329-Speed 11627.74 samples/sec Loss 5.1034 LearningRate 0.0121 Epoch: 26 Global Step: 148260 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:30:20,241-Speed 11243.98 samples/sec Loss 4.9517 LearningRate 0.0121 Epoch: 26 Global Step: 148270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:21,157-Speed 11179.90 samples/sec Loss 5.1119 LearningRate 0.0121 Epoch: 26 Global Step: 148280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:22,053-Speed 11444.79 samples/sec Loss 5.1759 LearningRate 0.0121 Epoch: 26 Global Step: 148290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:22,958-Speed 11321.20 samples/sec Loss 5.0678 LearningRate 0.0121 Epoch: 26 Global Step: 148300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:23,879-Speed 11123.31 samples/sec Loss 4.9975 LearningRate 0.0121 Epoch: 26 Global Step: 148310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:24,856-Speed 10485.50 samples/sec Loss 4.9173 LearningRate 0.0121 Epoch: 26 Global Step: 148320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:25,781-Speed 11087.31 samples/sec Loss 4.9938 
LearningRate 0.0121 Epoch: 26 Global Step: 148330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:26,695-Speed 11209.70 samples/sec Loss 4.9107 LearningRate 0.0121 Epoch: 26 Global Step: 148340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:27,632-Speed 10926.67 samples/sec Loss 5.1368 LearningRate 0.0121 Epoch: 26 Global Step: 148350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:28,585-Speed 10752.19 samples/sec Loss 4.9673 LearningRate 0.0121 Epoch: 26 Global Step: 148360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:29,524-Speed 10924.54 samples/sec Loss 5.1094 LearningRate 0.0121 Epoch: 26 Global Step: 148370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:30,486-Speed 10646.16 samples/sec Loss 5.1735 LearningRate 0.0121 Epoch: 26 Global Step: 148380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:31,449-Speed 10642.49 samples/sec Loss 4.9739 LearningRate 0.0121 Epoch: 26 Global Step: 148390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:32,384-Speed 10961.61 samples/sec Loss 4.9333 LearningRate 0.0121 Epoch: 26 Global Step: 148400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:33,280-Speed 11434.97 samples/sec Loss 5.0116 LearningRate 0.0121 Epoch: 26 Global Step: 148410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:34,186-Speed 11306.19 samples/sec Loss 5.0276 LearningRate 0.0121 Epoch: 26 Global Step: 148420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:35,078-Speed 11484.28 samples/sec Loss 4.9952 LearningRate 0.0121 Epoch: 26 Global Step: 148430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:35,978-Speed 11385.08 samples/sec Loss 4.9716 LearningRate 0.0121 Epoch: 26 Global Step: 148440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:36,906-Speed 11047.91 samples/sec Loss 5.1822 LearningRate 0.0121 Epoch: 26 
Global Step: 148450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:37,861-Speed 10726.72 samples/sec Loss 5.0268 LearningRate 0.0121 Epoch: 26 Global Step: 148460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:38,900-Speed 9863.30 samples/sec Loss 5.1724 LearningRate 0.0121 Epoch: 26 Global Step: 148470 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:30:39,814-Speed 11221.53 samples/sec Loss 5.0706 LearningRate 0.0121 Epoch: 26 Global Step: 148480 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:30:40,735-Speed 11129.77 samples/sec Loss 5.1080 LearningRate 0.0120 Epoch: 26 Global Step: 148490 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:30:41,651-Speed 11185.52 samples/sec Loss 5.0704 LearningRate 0.0120 Epoch: 26 Global Step: 148500 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:30:42,607-Speed 10718.09 samples/sec Loss 5.0322 LearningRate 0.0120 Epoch: 26 Global Step: 148510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:43,517-Speed 11258.92 samples/sec Loss 5.0829 LearningRate 0.0120 Epoch: 26 Global Step: 148520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:44,616-Speed 9321.96 samples/sec Loss 5.0758 LearningRate 0.0120 Epoch: 26 Global Step: 148530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:45,588-Speed 10560.27 samples/sec Loss 5.1122 LearningRate 0.0120 Epoch: 26 Global Step: 148540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:46,501-Speed 11225.27 samples/sec Loss 5.0810 LearningRate 0.0120 Epoch: 26 Global Step: 148550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:47,450-Speed 10793.30 samples/sec Loss 5.1419 LearningRate 0.0120 Epoch: 26 Global Step: 148560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:48,363-Speed 11231.45 samples/sec Loss 5.0843 LearningRate 0.0120 Epoch: 26 Global Step: 148570 Fp16 Grad 
Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:49,268-Speed 11312.42 samples/sec Loss 5.1901 LearningRate 0.0120 Epoch: 26 Global Step: 148580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:50,225-Speed 10719.29 samples/sec Loss 5.1097 LearningRate 0.0120 Epoch: 26 Global Step: 148590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:51,140-Speed 11207.96 samples/sec Loss 5.0508 LearningRate 0.0120 Epoch: 26 Global Step: 148600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:52,030-Speed 11508.56 samples/sec Loss 5.0321 LearningRate 0.0120 Epoch: 26 Global Step: 148610 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:30:52,968-Speed 10924.15 samples/sec Loss 4.9197 LearningRate 0.0120 Epoch: 26 Global Step: 148620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:30:53,911-Speed 10865.79 samples/sec Loss 5.0963 LearningRate 0.0120 Epoch: 26 Global Step: 148630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:30:54,853-Speed 10884.38 samples/sec Loss 5.0723 LearningRate 0.0120 Epoch: 26 Global Step: 148640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:30:55,784-Speed 11006.34 samples/sec Loss 5.0956 LearningRate 0.0120 Epoch: 26 Global Step: 148650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:30:56,675-Speed 11496.00 samples/sec Loss 5.2788 LearningRate 0.0120 Epoch: 26 Global Step: 148660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:30:57,639-Speed 10631.39 samples/sec Loss 5.0668 LearningRate 0.0120 Epoch: 26 Global Step: 148670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:30:58,582-Speed 10857.39 samples/sec Loss 5.1474 LearningRate 0.0120 Epoch: 26 Global Step: 148680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:30:59,538-Speed 10729.11 samples/sec Loss 5.0937 LearningRate 0.0120 Epoch: 26 Global Step: 148690 Fp16 Grad Scale: 32768 Required: 3 hours 
Training: 2022-04-11 04:31:00,505-Speed 10591.39 samples/sec Loss 5.0809 LearningRate 0.0120 Epoch: 26 Global Step: 148700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:31:01,461-Speed 10719.42 samples/sec Loss 5.1330 LearningRate 0.0120 Epoch: 26 Global Step: 148710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:31:02,374-Speed 11232.26 samples/sec Loss 5.0381 LearningRate 0.0120 Epoch: 26 Global Step: 148720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:31:03,332-Speed 10696.01 samples/sec Loss 5.1784 LearningRate 0.0120 Epoch: 26 Global Step: 148730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:31:04,282-Speed 10777.59 samples/sec Loss 5.2043 LearningRate 0.0120 Epoch: 26 Global Step: 148740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:31:05,206-Speed 11103.35 samples/sec Loss 5.2012 LearningRate 0.0120 Epoch: 26 Global Step: 148750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:31:06,140-Speed 10967.29 samples/sec Loss 5.1463 LearningRate 0.0120 Epoch: 26 Global Step: 148760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:31:07,093-Speed 10753.13 samples/sec Loss 5.0982 LearningRate 0.0120 Epoch: 26 Global Step: 148770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:31:08,055-Speed 10650.80 samples/sec Loss 5.2333 LearningRate 0.0120 Epoch: 26 Global Step: 148780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:31:08,972-Speed 11174.35 samples/sec Loss 5.1956 LearningRate 0.0120 Epoch: 26 Global Step: 148790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:31:09,866-Speed 11466.45 samples/sec Loss 5.0899 LearningRate 0.0120 Epoch: 26 Global Step: 148800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:31:10,775-Speed 11275.02 samples/sec Loss 5.2231 LearningRate 0.0120 Epoch: 26 Global Step: 148810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 
04:31:11,716-Speed 10879.11 samples/sec Loss 5.1009 LearningRate 0.0119 Epoch: 26 Global Step: 148820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:31:12,674-Speed 10695.06 samples/sec Loss 5.2434 LearningRate 0.0119 Epoch: 26 Global Step: 148830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:31:13,614-Speed 10909.05 samples/sec Loss 5.0245 LearningRate 0.0119 Epoch: 26 Global Step: 148840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:31:14,583-Speed 10581.44 samples/sec Loss 5.0699 LearningRate 0.0119 Epoch: 26 Global Step: 148850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:31:15,535-Speed 10762.16 samples/sec Loss 5.1089 LearningRate 0.0119 Epoch: 26 Global Step: 148860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:31:16,461-Speed 11058.38 samples/sec Loss 5.1473 LearningRate 0.0119 Epoch: 26 Global Step: 148870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:31:17,401-Speed 10905.14 samples/sec Loss 5.1160 LearningRate 0.0119 Epoch: 26 Global Step: 148880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:31:18,316-Speed 11196.77 samples/sec Loss 5.2450 LearningRate 0.0119 Epoch: 26 Global Step: 148890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:31:19,256-Speed 10906.41 samples/sec Loss 5.2026 LearningRate 0.0119 Epoch: 26 Global Step: 148900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:31:20,182-Speed 11064.85 samples/sec Loss 5.2015 LearningRate 0.0119 Epoch: 26 Global Step: 148910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:31:21,080-Speed 11408.01 samples/sec Loss 5.2511 LearningRate 0.0119 Epoch: 26 Global Step: 148920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:31:21,984-Speed 11337.65 samples/sec Loss 5.1747 LearningRate 0.0119 Epoch: 26 Global Step: 148930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:31:22,905-Speed 11129.69 
samples/sec Loss 5.3367 LearningRate 0.0119 Epoch: 26 Global Step: 148940 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:31:23,848-Speed 10864.15 samples/sec Loss 5.3678 LearningRate 0.0119 Epoch: 26 Global Step: 148950 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:31:24,754-Speed 11318.58 samples/sec Loss 5.1388 LearningRate 0.0119 Epoch: 26 Global Step: 148960 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:31:25,657-Speed 11342.86 samples/sec Loss 5.1513 LearningRate 0.0119 Epoch: 26 Global Step: 148970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:31:26,560-Speed 11344.24 samples/sec Loss 5.1429 LearningRate 0.0119 Epoch: 26 Global Step: 148980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:31:27,499-Speed 10914.64 samples/sec Loss 5.1496 LearningRate 0.0119 Epoch: 26 Global Step: 148990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:31:28,427-Speed 11041.29 samples/sec Loss 5.1549 LearningRate 0.0119 Epoch: 26 Global Step: 149000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:31:29,385-Speed 10695.16 samples/sec Loss 5.2152 LearningRate 0.0119 Epoch: 26 Global Step: 149010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:31:30,291-Speed 11318.52 samples/sec Loss 5.1877 LearningRate 0.0119 Epoch: 26 Global Step: 149020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:31:31,210-Speed 11149.12 samples/sec Loss 5.2172 LearningRate 0.0119 Epoch: 26 Global Step: 149030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:31:32,139-Speed 11032.15 samples/sec Loss 5.1227 LearningRate 0.0119 Epoch: 26 Global Step: 149040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:31:33,042-Speed 11342.17 samples/sec Loss 5.2341 LearningRate 0.0119 Epoch: 26 Global Step: 149050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:31:33,930-Speed 11540.78 samples/sec Loss 5.2334 
LearningRate 0.0119 Epoch: 26 Global Step: 149060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:31:34,841-Speed 11255.61 samples/sec Loss 5.1373 LearningRate 0.0119 Epoch: 26 Global Step: 149070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:31:35,757-Speed 11175.83 samples/sec Loss 5.0706 LearningRate 0.0119 Epoch: 26 Global Step: 149080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:31:36,727-Speed 10570.90 samples/sec Loss 5.1485 LearningRate 0.0119 Epoch: 26 Global Step: 149090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:31:37,665-Speed 10915.97 samples/sec Loss 5.0901 LearningRate 0.0119 Epoch: 26 Global Step: 149100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:31:38,576-Speed 11247.28 samples/sec Loss 5.0561 LearningRate 0.0119 Epoch: 26 Global Step: 149110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:31:39,541-Speed 10625.66 samples/sec Loss 5.2329 LearningRate 0.0119 Epoch: 26 Global Step: 149120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:31:40,488-Speed 10822.18 samples/sec Loss 5.2607 LearningRate 0.0119 Epoch: 26 Global Step: 149130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:31:41,457-Speed 10573.20 samples/sec Loss 5.2233 LearningRate 0.0119 Epoch: 26 Global Step: 149140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:31:42,395-Speed 10924.10 samples/sec Loss 5.0566 LearningRate 0.0118 Epoch: 26 Global Step: 149150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:31:43,361-Speed 10605.87 samples/sec Loss 5.2478 LearningRate 0.0118 Epoch: 26 Global Step: 149160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:31:44,323-Speed 10656.06 samples/sec Loss 5.3358 LearningRate 0.0118 Epoch: 26 Global Step: 149170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:31:45,213-Speed 11515.83 samples/sec Loss 5.0022 LearningRate 0.0118 Epoch: 26 
Global Step: 149180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:31:46,116-Speed 11350.66 samples/sec Loss 5.2501 LearningRate 0.0118 Epoch: 26 Global Step: 149190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:31:47,019-Speed 11345.16 samples/sec Loss 5.2778 LearningRate 0.0118 Epoch: 26 Global Step: 149200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:31:47,966-Speed 10828.79 samples/sec Loss 5.1674 LearningRate 0.0118 Epoch: 26 Global Step: 149210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:31:48,898-Speed 10986.02 samples/sec Loss 5.1198 LearningRate 0.0118 Epoch: 26 Global Step: 149220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:31:49,833-Speed 10968.14 samples/sec Loss 5.2649 LearningRate 0.0118 Epoch: 26 Global Step: 149230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:31:50,760-Speed 11046.54 samples/sec Loss 5.2419 LearningRate 0.0118 Epoch: 26 Global Step: 149240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:31:51,696-Speed 10949.93 samples/sec Loss 5.3081 LearningRate 0.0118 Epoch: 26 Global Step: 149250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:31:52,656-Speed 10672.30 samples/sec Loss 5.1709 LearningRate 0.0118 Epoch: 26 Global Step: 149260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:31:53,624-Speed 10580.73 samples/sec Loss 5.2676 LearningRate 0.0118 Epoch: 26 Global Step: 149270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:31:54,576-Speed 10767.81 samples/sec Loss 5.1693 LearningRate 0.0118 Epoch: 26 Global Step: 149280 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:31:55,472-Speed 11442.81 samples/sec Loss 5.2616 LearningRate 0.0118 Epoch: 26 Global Step: 149290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:31:56,762-Speed 7939.98 samples/sec Loss 5.2712 LearningRate 0.0118 Epoch: 26 Global Step: 149300 Fp16 Grad 
Scale: 65536 Required: 3 hours Training: 2022-04-11 04:31:57,665-Speed 11344.12 samples/sec Loss 5.3927 LearningRate 0.0118 Epoch: 26 Global Step: 149310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:31:58,574-Speed 11279.49 samples/sec Loss 5.2737 LearningRate 0.0118 Epoch: 26 Global Step: 149320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:31:59,522-Speed 10804.66 samples/sec Loss 5.2291 LearningRate 0.0118 Epoch: 26 Global Step: 149330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:00,463-Speed 10895.27 samples/sec Loss 5.1817 LearningRate 0.0118 Epoch: 26 Global Step: 149340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:01,390-Speed 11053.81 samples/sec Loss 5.2065 LearningRate 0.0118 Epoch: 26 Global Step: 149350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:02,684-Speed 7916.19 samples/sec Loss 5.3390 LearningRate 0.0118 Epoch: 26 Global Step: 149360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:04,294-Speed 6362.34 samples/sec Loss 5.0948 LearningRate 0.0118 Epoch: 26 Global Step: 149370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:05,188-Speed 11468.60 samples/sec Loss 5.1708 LearningRate 0.0118 Epoch: 26 Global Step: 149380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:06,531-Speed 7632.03 samples/sec Loss 5.3408 LearningRate 0.0118 Epoch: 26 Global Step: 149390 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:32:07,479-Speed 10798.42 samples/sec Loss 5.2870 LearningRate 0.0118 Epoch: 26 Global Step: 149400 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:32:08,367-Speed 11539.29 samples/sec Loss 5.2014 LearningRate 0.0118 Epoch: 26 Global Step: 149410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:09,676-Speed 7829.43 samples/sec Loss 5.1940 LearningRate 0.0118 Epoch: 26 Global Step: 149420 Fp16 Grad Scale: 65536 Required: 3 hours 
Training: 2022-04-11 04:32:10,587-Speed 11246.31 samples/sec Loss 5.1816 LearningRate 0.0118 Epoch: 26 Global Step: 149430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:12,023-Speed 7135.24 samples/sec Loss 5.2119 LearningRate 0.0118 Epoch: 26 Global Step: 149440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:13,002-Speed 10461.60 samples/sec Loss 5.2251 LearningRate 0.0118 Epoch: 26 Global Step: 149450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:13,941-Speed 10918.20 samples/sec Loss 5.3108 LearningRate 0.0118 Epoch: 26 Global Step: 149460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:14,862-Speed 11127.78 samples/sec Loss 5.3944 LearningRate 0.0118 Epoch: 26 Global Step: 149470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:15,778-Speed 11181.05 samples/sec Loss 5.3512 LearningRate 0.0117 Epoch: 26 Global Step: 149480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:16,705-Speed 11052.57 samples/sec Loss 5.2744 LearningRate 0.0117 Epoch: 26 Global Step: 149490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:17,638-Speed 10986.35 samples/sec Loss 5.4214 LearningRate 0.0117 Epoch: 26 Global Step: 149500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:18,586-Speed 10806.53 samples/sec Loss 5.2829 LearningRate 0.0117 Epoch: 26 Global Step: 149510 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:32:19,539-Speed 10753.74 samples/sec Loss 5.2896 LearningRate 0.0117 Epoch: 26 Global Step: 149520 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:32:20,471-Speed 11007.60 samples/sec Loss 5.1314 LearningRate 0.0117 Epoch: 26 Global Step: 149530 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:32:21,346-Speed 11702.29 samples/sec Loss 5.4127 LearningRate 0.0117 Epoch: 26 Global Step: 149540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
04:32:22,256-Speed 11265.82 samples/sec Loss 5.1798 LearningRate 0.0117 Epoch: 26 Global Step: 149550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:23,242-Speed 10388.16 samples/sec Loss 5.2127 LearningRate 0.0117 Epoch: 26 Global Step: 149560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:24,173-Speed 11011.49 samples/sec Loss 5.2090 LearningRate 0.0117 Epoch: 26 Global Step: 149570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:25,096-Speed 11100.39 samples/sec Loss 5.4539 LearningRate 0.0117 Epoch: 26 Global Step: 149580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:26,018-Speed 11114.62 samples/sec Loss 5.2234 LearningRate 0.0117 Epoch: 26 Global Step: 149590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:26,965-Speed 10818.34 samples/sec Loss 5.2801 LearningRate 0.0117 Epoch: 26 Global Step: 149600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:27,865-Speed 11386.89 samples/sec Loss 5.1887 LearningRate 0.0117 Epoch: 26 Global Step: 149610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:28,806-Speed 10892.78 samples/sec Loss 5.2512 LearningRate 0.0117 Epoch: 26 Global Step: 149620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:29,741-Speed 10956.44 samples/sec Loss 5.2054 LearningRate 0.0117 Epoch: 26 Global Step: 149630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:30,672-Speed 11001.71 samples/sec Loss 5.2563 LearningRate 0.0117 Epoch: 26 Global Step: 149640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:31,560-Speed 11539.99 samples/sec Loss 5.2224 LearningRate 0.0117 Epoch: 26 Global Step: 149650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:32,467-Speed 11301.10 samples/sec Loss 5.2340 LearningRate 0.0117 Epoch: 26 Global Step: 149660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:33,344-Speed 11680.55 
samples/sec Loss 5.2370 LearningRate 0.0117 Epoch: 26 Global Step: 149670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:34,242-Speed 11418.19 samples/sec Loss 5.3089 LearningRate 0.0117 Epoch: 26 Global Step: 149680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:35,147-Speed 11318.04 samples/sec Loss 5.2158 LearningRate 0.0117 Epoch: 26 Global Step: 149690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:36,040-Speed 11477.55 samples/sec Loss 5.3777 LearningRate 0.0117 Epoch: 26 Global Step: 149700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:32:36,953-Speed 11255.85 samples/sec Loss 5.3810 LearningRate 0.0117 Epoch: 26 Global Step: 149710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:32:37,916-Speed 10632.43 samples/sec Loss 5.2631 LearningRate 0.0117 Epoch: 26 Global Step: 149720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:32:38,844-Speed 11045.93 samples/sec Loss 5.2910 LearningRate 0.0117 Epoch: 26 Global Step: 149730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:32:39,758-Speed 11215.73 samples/sec Loss 5.3194 LearningRate 0.0117 Epoch: 26 Global Step: 149740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:32:40,661-Speed 11344.64 samples/sec Loss 5.3433 LearningRate 0.0117 Epoch: 26 Global Step: 149750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:32:41,564-Speed 11348.62 samples/sec Loss 5.2884 LearningRate 0.0117 Epoch: 26 Global Step: 149760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:32:42,514-Speed 10788.74 samples/sec Loss 5.3774 LearningRate 0.0117 Epoch: 26 Global Step: 149770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:32:43,425-Speed 11242.06 samples/sec Loss 5.4184 LearningRate 0.0117 Epoch: 26 Global Step: 149780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:32:44,344-Speed 11160.89 samples/sec Loss 5.3501 
LearningRate 0.0117 Epoch: 26 Global Step: 149790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:32:45,244-Speed 11376.61 samples/sec Loss 5.2446 LearningRate 0.0117 Epoch: 26 Global Step: 149800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:46,172-Speed 11046.92 samples/sec Loss 5.2676 LearningRate 0.0116 Epoch: 26 Global Step: 149810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:47,062-Speed 11513.93 samples/sec Loss 5.1573 LearningRate 0.0116 Epoch: 26 Global Step: 149820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:47,986-Speed 11085.79 samples/sec Loss 5.3053 LearningRate 0.0116 Epoch: 26 Global Step: 149830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:48,928-Speed 10880.44 samples/sec Loss 5.2751 LearningRate 0.0116 Epoch: 26 Global Step: 149840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:49,846-Speed 11159.60 samples/sec Loss 5.2094 LearningRate 0.0116 Epoch: 26 Global Step: 149850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:50,768-Speed 11115.90 samples/sec Loss 5.2812 LearningRate 0.0116 Epoch: 26 Global Step: 149860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:51,681-Speed 11223.48 samples/sec Loss 5.3428 LearningRate 0.0116 Epoch: 26 Global Step: 149870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:52,602-Speed 11128.55 samples/sec Loss 5.3624 LearningRate 0.0116 Epoch: 26 Global Step: 149880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:53,523-Speed 11121.84 samples/sec Loss 5.3999 LearningRate 0.0116 Epoch: 26 Global Step: 149890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:54,399-Speed 11731.15 samples/sec Loss 5.2640 LearningRate 0.0116 Epoch: 26 Global Step: 149900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:55,324-Speed 11070.92 samples/sec Loss 5.4053 LearningRate 0.0116 Epoch: 26 
Global Step: 149910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:56,235-Speed 11256.11 samples/sec Loss 5.3756 LearningRate 0.0116 Epoch: 26 Global Step: 149920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:57,140-Speed 11315.14 samples/sec Loss 5.2016 LearningRate 0.0116 Epoch: 26 Global Step: 149930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:58,101-Speed 10662.32 samples/sec Loss 5.3052 LearningRate 0.0116 Epoch: 26 Global Step: 149940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:59,022-Speed 11126.57 samples/sec Loss 5.1962 LearningRate 0.0116 Epoch: 26 Global Step: 149950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:32:59,930-Speed 11297.74 samples/sec Loss 5.3095 LearningRate 0.0116 Epoch: 26 Global Step: 149960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:33:00,831-Speed 11373.81 samples/sec Loss 5.2676 LearningRate 0.0116 Epoch: 26 Global Step: 149970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:33:01,747-Speed 11185.54 samples/sec Loss 5.2992 LearningRate 0.0116 Epoch: 26 Global Step: 149980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:33:02,666-Speed 11147.97 samples/sec Loss 5.4613 LearningRate 0.0116 Epoch: 26 Global Step: 149990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:33:03,592-Speed 11066.44 samples/sec Loss 5.3480 LearningRate 0.0116 Epoch: 26 Global Step: 150000 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:33:24,921-[lfw][150000]XNorm: 9.027069 Training: 2022-04-11 04:33:24,921-[lfw][150000]Accuracy-Flip: 0.99733+-0.00271 Training: 2022-04-11 04:33:24,922-[lfw][150000]Accuracy-Highest: 0.99733 Training: 2022-04-11 04:33:49,488-[cfp_fp][150000]XNorm: 7.666107 Training: 2022-04-11 04:33:49,489-[cfp_fp][150000]Accuracy-Flip: 0.94614+-0.01010 Training: 2022-04-11 04:33:49,489-[cfp_fp][150000]Accuracy-Highest: 0.95200 Training: 2022-04-11 
04:34:10,807-[agedb_30][150000]XNorm: 8.865851 Training: 2022-04-11 04:34:10,807-[agedb_30][150000]Accuracy-Flip: 0.96883+-0.00869 Training: 2022-04-11 04:34:10,807-[agedb_30][150000]Accuracy-Highest: 0.96883 Training: 2022-04-11 04:34:11,717-Speed 150.31 samples/sec Loss 5.5993 LearningRate 0.0116 Epoch: 26 Global Step: 150010 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:34:12,635-Speed 11163.44 samples/sec Loss 5.1393 LearningRate 0.0116 Epoch: 26 Global Step: 150020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:34:13,546-Speed 11241.20 samples/sec Loss 5.3797 LearningRate 0.0116 Epoch: 26 Global Step: 150030 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:34:14,445-Speed 11403.41 samples/sec Loss 5.2195 LearningRate 0.0116 Epoch: 26 Global Step: 150040 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:34:15,349-Speed 11333.30 samples/sec Loss 5.2516 LearningRate 0.0116 Epoch: 26 Global Step: 150050 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:34:16,232-Speed 11611.82 samples/sec Loss 5.4801 LearningRate 0.0116 Epoch: 26 Global Step: 150060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:34:17,138-Speed 11310.73 samples/sec Loss 5.4540 LearningRate 0.0116 Epoch: 26 Global Step: 150070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:34:18,046-Speed 11276.59 samples/sec Loss 5.5093 LearningRate 0.0116 Epoch: 26 Global Step: 150080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:34:18,991-Speed 10840.48 samples/sec Loss 5.1878 LearningRate 0.0116 Epoch: 26 Global Step: 150090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:34:19,899-Speed 11295.00 samples/sec Loss 5.3611 LearningRate 0.0116 Epoch: 26 Global Step: 150100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:34:20,819-Speed 11131.97 samples/sec Loss 5.3297 LearningRate 0.0116 Epoch: 26 Global Step: 150110 Fp16 Grad Scale: 32768 
Required: 3 hours Training: 2022-04-11 04:34:21,726-Speed 11302.63 samples/sec Loss 5.2560 LearningRate 0.0116 Epoch: 26 Global Step: 150120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:34:22,616-Speed 11513.01 samples/sec Loss 5.3868 LearningRate 0.0116 Epoch: 26 Global Step: 150130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:34:23,507-Speed 11497.43 samples/sec Loss 5.2907 LearningRate 0.0116 Epoch: 26 Global Step: 150140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:34:24,442-Speed 10962.09 samples/sec Loss 5.2063 LearningRate 0.0115 Epoch: 26 Global Step: 150150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:34:25,348-Speed 11306.89 samples/sec Loss 5.3170 LearningRate 0.0115 Epoch: 26 Global Step: 150160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:34:26,244-Speed 11444.12 samples/sec Loss 5.3415 LearningRate 0.0115 Epoch: 26 Global Step: 150170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:34:27,170-Speed 11062.55 samples/sec Loss 5.3063 LearningRate 0.0115 Epoch: 26 Global Step: 150180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:34:28,090-Speed 11139.58 samples/sec Loss 5.3404 LearningRate 0.0115 Epoch: 26 Global Step: 150190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:34:29,054-Speed 10623.58 samples/sec Loss 5.3045 LearningRate 0.0115 Epoch: 26 Global Step: 150200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:34:29,969-Speed 11198.82 samples/sec Loss 5.3420 LearningRate 0.0115 Epoch: 26 Global Step: 150210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:34:30,875-Speed 11321.22 samples/sec Loss 5.1351 LearningRate 0.0115 Epoch: 26 Global Step: 150220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:34:31,777-Speed 11358.92 samples/sec Loss 5.2751 LearningRate 0.0115 Epoch: 26 Global Step: 150230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 
2022-04-11 04:34:32,690-Speed 11219.97 samples/sec Loss 5.2471 LearningRate 0.0115 Epoch: 26 Global Step: 150240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:34:33,580-Speed 11512.39 samples/sec Loss 5.3617 LearningRate 0.0115 Epoch: 26 Global Step: 150250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:34:34,468-Speed 11539.34 samples/sec Loss 5.4280 LearningRate 0.0115 Epoch: 26 Global Step: 150260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:34:35,364-Speed 11434.31 samples/sec Loss 5.3358 LearningRate 0.0115 Epoch: 26 Global Step: 150270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:34:36,335-Speed 10551.21 samples/sec Loss 5.3608 LearningRate 0.0115 Epoch: 26 Global Step: 150280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:34:37,281-Speed 10830.12 samples/sec Loss 5.4517 LearningRate 0.0115 Epoch: 26 Global Step: 150290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:34:38,234-Speed 10758.68 samples/sec Loss 5.3180 LearningRate 0.0115 Epoch: 26 Global Step: 150300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:34:39,158-Speed 11086.25 samples/sec Loss 5.2670 LearningRate 0.0115 Epoch: 26 Global Step: 150310 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:34:40,089-Speed 11007.80 samples/sec Loss 5.2485 LearningRate 0.0115 Epoch: 26 Global Step: 150320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:34:40,993-Speed 11348.85 samples/sec Loss 5.5691 LearningRate 0.0115 Epoch: 26 Global Step: 150330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:34:41,891-Speed 11400.79 samples/sec Loss 5.4271 LearningRate 0.0115 Epoch: 26 Global Step: 150340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:34:42,841-Speed 10790.41 samples/sec Loss 5.3050 LearningRate 0.0115 Epoch: 26 Global Step: 150350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:34:43,739-Speed 
11416.55 samples/sec Loss 5.3192 LearningRate 0.0115 Epoch: 26 Global Step: 150360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:34:44,655-Speed 11187.00 samples/sec Loss 5.2426 LearningRate 0.0115 Epoch: 26 Global Step: 150370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:34:45,534-Speed 11651.18 samples/sec Loss 5.3286 LearningRate 0.0115 Epoch: 26 Global Step: 150380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:34:46,403-Speed 11794.25 samples/sec Loss 5.4057 LearningRate 0.0115 Epoch: 26 Global Step: 150390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:34:47,278-Speed 11715.31 samples/sec Loss 5.2877 LearningRate 0.0115 Epoch: 26 Global Step: 150400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:34:48,170-Speed 11487.13 samples/sec Loss 5.3237 LearningRate 0.0115 Epoch: 26 Global Step: 150410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:34:49,093-Speed 11103.13 samples/sec Loss 5.1917 LearningRate 0.0115 Epoch: 26 Global Step: 150420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:34:49,985-Speed 11486.58 samples/sec Loss 5.3454 LearningRate 0.0115 Epoch: 26 Global Step: 150430 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:34:50,923-Speed 10920.63 samples/sec Loss 5.5213 LearningRate 0.0115 Epoch: 26 Global Step: 150440 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:34:51,846-Speed 11110.02 samples/sec Loss 5.3966 LearningRate 0.0115 Epoch: 26 Global Step: 150450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:34:52,822-Speed 10500.73 samples/sec Loss 5.4002 LearningRate 0.0115 Epoch: 26 Global Step: 150460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:34:53,770-Speed 10803.03 samples/sec Loss 5.2572 LearningRate 0.0115 Epoch: 26 Global Step: 150470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:34:54,706-Speed 10942.90 samples/sec Loss 
5.4503 LearningRate 0.0114 Epoch: 26 Global Step: 150480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:34:55,606-Speed 11388.37 samples/sec Loss 5.3214 LearningRate 0.0114 Epoch: 26 Global Step: 150490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:34:56,509-Speed 11347.89 samples/sec Loss 5.2515 LearningRate 0.0114 Epoch: 26 Global Step: 150500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:34:57,443-Speed 10976.96 samples/sec Loss 5.4693 LearningRate 0.0114 Epoch: 26 Global Step: 150510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:34:58,338-Speed 11447.58 samples/sec Loss 5.3872 LearningRate 0.0114 Epoch: 26 Global Step: 150520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:34:59,248-Speed 11261.75 samples/sec Loss 5.2995 LearningRate 0.0114 Epoch: 26 Global Step: 150530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:35:00,152-Speed 11340.71 samples/sec Loss 5.2318 LearningRate 0.0114 Epoch: 26 Global Step: 150540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:35:01,046-Speed 11453.83 samples/sec Loss 5.4933 LearningRate 0.0114 Epoch: 26 Global Step: 150550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:35:01,951-Speed 11323.28 samples/sec Loss 5.2815 LearningRate 0.0114 Epoch: 26 Global Step: 150560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:35:02,841-Speed 11522.13 samples/sec Loss 5.5152 LearningRate 0.0114 Epoch: 26 Global Step: 150570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:35:03,722-Speed 11630.66 samples/sec Loss 5.3669 LearningRate 0.0114 Epoch: 26 Global Step: 150580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:35:04,635-Speed 11225.43 samples/sec Loss 5.1840 LearningRate 0.0114 Epoch: 26 Global Step: 150590 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:35:05,555-Speed 11136.21 samples/sec Loss 5.4771 LearningRate 0.0114 
Epoch: 26 Global Step: 150600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:35:06,438-Speed 11604.03 samples/sec Loss 5.5244 LearningRate 0.0114 Epoch: 26 Global Step: 150610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:35:07,323-Speed 11579.93 samples/sec Loss 5.3671 LearningRate 0.0114 Epoch: 26 Global Step: 150620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:35:08,212-Speed 11523.95 samples/sec Loss 5.2995 LearningRate 0.0114 Epoch: 26 Global Step: 150630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:35:09,109-Speed 11431.11 samples/sec Loss 5.2843 LearningRate 0.0114 Epoch: 26 Global Step: 150640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:35:09,988-Speed 11652.34 samples/sec Loss 5.3972 LearningRate 0.0114 Epoch: 26 Global Step: 150650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:35:10,901-Speed 11232.85 samples/sec Loss 5.4225 LearningRate 0.0114 Epoch: 26 Global Step: 150660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:35:11,795-Speed 11453.18 samples/sec Loss 5.2840 LearningRate 0.0114 Epoch: 26 Global Step: 150670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:35:12,712-Speed 11176.54 samples/sec Loss 5.2454 LearningRate 0.0114 Epoch: 26 Global Step: 150680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:35:13,606-Speed 11467.57 samples/sec Loss 5.4080 LearningRate 0.0114 Epoch: 26 Global Step: 150690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:35:14,522-Speed 11183.10 samples/sec Loss 5.3343 LearningRate 0.0114 Epoch: 26 Global Step: 150700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:35:15,472-Speed 10787.41 samples/sec Loss 5.2884 LearningRate 0.0114 Epoch: 26 Global Step: 150710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:35:16,426-Speed 10741.10 samples/sec Loss 5.4220 LearningRate 0.0114 Epoch: 26 Global Step: 150720 
Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:35:17,352-Speed 11065.91 samples/sec Loss 5.2112 LearningRate 0.0114 Epoch: 26 Global Step: 150730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:35:18,279-Speed 11052.30 samples/sec Loss 5.4130 LearningRate 0.0114 Epoch: 26 Global Step: 150740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:35:19,217-Speed 10921.18 samples/sec Loss 5.2905 LearningRate 0.0114 Epoch: 26 Global Step: 150750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:35:20,103-Speed 11566.54 samples/sec Loss 5.3186 LearningRate 0.0114 Epoch: 26 Global Step: 150760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:35:21,003-Speed 11383.57 samples/sec Loss 5.4818 LearningRate 0.0114 Epoch: 26 Global Step: 150770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:35:21,893-Speed 11524.72 samples/sec Loss 5.4367 LearningRate 0.0114 Epoch: 26 Global Step: 150780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:35:22,795-Speed 11348.80 samples/sec Loss 5.3697 LearningRate 0.0114 Epoch: 26 Global Step: 150790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:35:23,685-Speed 11525.86 samples/sec Loss 5.4532 LearningRate 0.0114 Epoch: 26 Global Step: 150800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:35:24,597-Speed 11231.52 samples/sec Loss 5.2782 LearningRate 0.0114 Epoch: 26 Global Step: 150810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:35:25,511-Speed 11221.47 samples/sec Loss 5.4071 LearningRate 0.0113 Epoch: 26 Global Step: 150820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:35:26,458-Speed 10810.49 samples/sec Loss 5.4004 LearningRate 0.0113 Epoch: 26 Global Step: 150830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:35:27,377-Speed 11154.30 samples/sec Loss 5.3950 LearningRate 0.0113 Epoch: 26 Global Step: 150840 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-04-11 04:35:28,347-Speed 10568.42 samples/sec Loss 5.4002 LearningRate 0.0113 Epoch: 26 Global Step: 150850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:35:29,270-Speed 11093.22 samples/sec Loss 5.4073 LearningRate 0.0113 Epoch: 26 Global Step: 150860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:35:30,180-Speed 11271.69 samples/sec Loss 5.3926 LearningRate 0.0113 Epoch: 26 Global Step: 150870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:35:31,166-Speed 10390.91 samples/sec Loss 5.4019 LearningRate 0.0113 Epoch: 26 Global Step: 150880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:35:32,108-Speed 10874.62 samples/sec Loss 5.3445 LearningRate 0.0113 Epoch: 26 Global Step: 150890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:35:32,991-Speed 11611.25 samples/sec Loss 5.4285 LearningRate 0.0113 Epoch: 26 Global Step: 150900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:35:33,878-Speed 11545.30 samples/sec Loss 5.4426 LearningRate 0.0113 Epoch: 26 Global Step: 150910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:35:34,763-Speed 11572.65 samples/sec Loss 5.3609 LearningRate 0.0113 Epoch: 26 Global Step: 150920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:35:35,668-Speed 11330.69 samples/sec Loss 5.3071 LearningRate 0.0113 Epoch: 26 Global Step: 150930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:35:36,600-Speed 10990.71 samples/sec Loss 5.2588 LearningRate 0.0113 Epoch: 26 Global Step: 150940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:35:37,578-Speed 10485.63 samples/sec Loss 5.3972 LearningRate 0.0113 Epoch: 26 Global Step: 150950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:35:38,469-Speed 11500.78 samples/sec Loss 5.3149 LearningRate 0.0113 Epoch: 26 Global Step: 150960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 
2022-04-11 04:35:39,388-Speed 11153.66 samples/sec Loss 5.4490 LearningRate 0.0113 Epoch: 26 Global Step: 150970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:35:40,324-Speed 10946.40 samples/sec Loss 5.4387 LearningRate 0.0113 Epoch: 26 Global Step: 150980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:35:41,247-Speed 11111.33 samples/sec Loss 5.3435 LearningRate 0.0113 Epoch: 26 Global Step: 150990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:35:42,186-Speed 10915.22 samples/sec Loss 5.3434 LearningRate 0.0113 Epoch: 26 Global Step: 151000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:35:43,144-Speed 10692.79 samples/sec Loss 5.4452 LearningRate 0.0113 Epoch: 26 Global Step: 151010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:35:44,076-Speed 10995.62 samples/sec Loss 5.4504 LearningRate 0.0113 Epoch: 26 Global Step: 151020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:35:44,988-Speed 11236.69 samples/sec Loss 5.2844 LearningRate 0.0113 Epoch: 26 Global Step: 151030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:35:45,870-Speed 11613.79 samples/sec Loss 5.2875 LearningRate 0.0113 Epoch: 26 Global Step: 151040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:35:46,773-Speed 11349.96 samples/sec Loss 5.3604 LearningRate 0.0113 Epoch: 26 Global Step: 151050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:35:47,695-Speed 11110.84 samples/sec Loss 5.4352 LearningRate 0.0113 Epoch: 26 Global Step: 151060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:35:48,621-Speed 11068.62 samples/sec Loss 5.2943 LearningRate 0.0113 Epoch: 26 Global Step: 151070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:35:49,519-Speed 11412.49 samples/sec Loss 5.4157 LearningRate 0.0113 Epoch: 26 Global Step: 151080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:35:50,419-Speed 
11393.27 samples/sec Loss 5.3386 LearningRate 0.0113 Epoch: 26 Global Step: 151090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:35:51,354-Speed 10952.41 samples/sec Loss 5.3471 LearningRate 0.0113 Epoch: 26 Global Step: 151100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:35:52,280-Speed 11065.10 samples/sec Loss 5.2139 LearningRate 0.0113 Epoch: 26 Global Step: 151110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:35:53,232-Speed 10769.87 samples/sec Loss 5.4197 LearningRate 0.0113 Epoch: 26 Global Step: 151120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:35:54,193-Speed 10659.96 samples/sec Loss 5.3841 LearningRate 0.0113 Epoch: 26 Global Step: 151130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:35:55,108-Speed 11206.95 samples/sec Loss 5.3190 LearningRate 0.0113 Epoch: 26 Global Step: 151140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:35:56,007-Speed 11389.99 samples/sec Loss 5.2636 LearningRate 0.0113 Epoch: 26 Global Step: 151150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:35:56,895-Speed 11542.91 samples/sec Loss 5.3286 LearningRate 0.0112 Epoch: 26 Global Step: 151160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:35:57,778-Speed 11601.10 samples/sec Loss 5.3500 LearningRate 0.0112 Epoch: 26 Global Step: 151170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:35:58,708-Speed 11021.16 samples/sec Loss 5.3313 LearningRate 0.0112 Epoch: 26 Global Step: 151180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:35:59,601-Speed 11475.34 samples/sec Loss 5.3666 LearningRate 0.0112 Epoch: 26 Global Step: 151190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:00,505-Speed 11338.54 samples/sec Loss 5.3768 LearningRate 0.0112 Epoch: 26 Global Step: 151200 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:36:01,393-Speed 11534.25 samples/sec Loss 
5.4284 LearningRate 0.0112 Epoch: 26 Global Step: 151210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:02,294-Speed 11376.13 samples/sec Loss 5.3246 LearningRate 0.0112 Epoch: 26 Global Step: 151220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:03,211-Speed 11171.44 samples/sec Loss 5.2999 LearningRate 0.0112 Epoch: 26 Global Step: 151230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:04,114-Speed 11344.87 samples/sec Loss 5.3283 LearningRate 0.0112 Epoch: 26 Global Step: 151240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:05,013-Speed 11403.40 samples/sec Loss 5.3897 LearningRate 0.0112 Epoch: 26 Global Step: 151250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:05,961-Speed 10808.79 samples/sec Loss 5.5297 LearningRate 0.0112 Epoch: 26 Global Step: 151260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:06,869-Speed 11292.67 samples/sec Loss 5.3939 LearningRate 0.0112 Epoch: 26 Global Step: 151270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:07,770-Speed 11365.06 samples/sec Loss 5.3787 LearningRate 0.0112 Epoch: 26 Global Step: 151280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:08,651-Speed 11634.37 samples/sec Loss 5.3414 LearningRate 0.0112 Epoch: 26 Global Step: 151290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:09,548-Speed 11422.51 samples/sec Loss 5.4008 LearningRate 0.0112 Epoch: 26 Global Step: 151300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:10,422-Speed 11728.97 samples/sec Loss 5.3554 LearningRate 0.0112 Epoch: 26 Global Step: 151310 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:36:11,299-Speed 11685.98 samples/sec Loss 5.4770 LearningRate 0.0112 Epoch: 26 Global Step: 151320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:12,177-Speed 11662.27 samples/sec Loss 5.5380 LearningRate 0.0112 
Epoch: 26 Global Step: 151330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:13,072-Speed 11454.86 samples/sec Loss 5.4122 LearningRate 0.0112 Epoch: 26 Global Step: 151340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:13,986-Speed 11207.51 samples/sec Loss 5.3726 LearningRate 0.0112 Epoch: 26 Global Step: 151350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:14,916-Speed 11016.01 samples/sec Loss 5.3459 LearningRate 0.0112 Epoch: 26 Global Step: 151360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:15,822-Speed 11314.14 samples/sec Loss 5.3235 LearningRate 0.0112 Epoch: 26 Global Step: 151370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:16,737-Speed 11198.30 samples/sec Loss 5.2445 LearningRate 0.0112 Epoch: 26 Global Step: 151380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:17,656-Speed 11150.68 samples/sec Loss 5.3490 LearningRate 0.0112 Epoch: 26 Global Step: 151390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:18,564-Speed 11296.06 samples/sec Loss 5.4897 LearningRate 0.0112 Epoch: 26 Global Step: 151400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:19,477-Speed 11222.73 samples/sec Loss 5.3967 LearningRate 0.0112 Epoch: 26 Global Step: 151410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:20,397-Speed 11139.20 samples/sec Loss 5.3804 LearningRate 0.0112 Epoch: 26 Global Step: 151420 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:36:21,295-Speed 11413.70 samples/sec Loss 5.4424 LearningRate 0.0112 Epoch: 26 Global Step: 151430 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:36:22,190-Speed 11438.33 samples/sec Loss 5.4572 LearningRate 0.0112 Epoch: 26 Global Step: 151440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:23,101-Speed 11247.37 samples/sec Loss 5.3511 LearningRate 0.0112 Epoch: 26 Global Step: 151450 
Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:24,023-Speed 11117.46 samples/sec Loss 5.2308 LearningRate 0.0112 Epoch: 26 Global Step: 151460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:24,983-Speed 10674.34 samples/sec Loss 5.4533 LearningRate 0.0112 Epoch: 26 Global Step: 151470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:25,932-Speed 10799.15 samples/sec Loss 5.3960 LearningRate 0.0112 Epoch: 26 Global Step: 151480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:26,869-Speed 10934.84 samples/sec Loss 5.3634 LearningRate 0.0112 Epoch: 26 Global Step: 151490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:27,813-Speed 10854.55 samples/sec Loss 5.4186 LearningRate 0.0111 Epoch: 26 Global Step: 151500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:28,771-Speed 10696.10 samples/sec Loss 5.4830 LearningRate 0.0111 Epoch: 26 Global Step: 151510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:29,692-Speed 11123.96 samples/sec Loss 5.3581 LearningRate 0.0111 Epoch: 26 Global Step: 151520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:30,609-Speed 11177.07 samples/sec Loss 5.3665 LearningRate 0.0111 Epoch: 26 Global Step: 151530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:31,501-Speed 11496.16 samples/sec Loss 5.3635 LearningRate 0.0111 Epoch: 26 Global Step: 151540 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:36:32,430-Speed 11033.23 samples/sec Loss 5.4181 LearningRate 0.0111 Epoch: 26 Global Step: 151550 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:36:33,312-Speed 11612.28 samples/sec Loss 5.3158 LearningRate 0.0111 Epoch: 26 Global Step: 151560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:34,211-Speed 11405.31 samples/sec Loss 5.3924 LearningRate 0.0111 Epoch: 26 Global Step: 151570 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-04-11 04:36:35,115-Speed 11331.46 samples/sec Loss 5.4112 LearningRate 0.0111 Epoch: 26 Global Step: 151580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:36,010-Speed 11456.08 samples/sec Loss 5.3893 LearningRate 0.0111 Epoch: 26 Global Step: 151590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:36,953-Speed 10865.54 samples/sec Loss 5.3765 LearningRate 0.0111 Epoch: 26 Global Step: 151600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:37,894-Speed 10878.49 samples/sec Loss 5.2995 LearningRate 0.0111 Epoch: 26 Global Step: 151610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:38,791-Speed 11425.91 samples/sec Loss 5.2958 LearningRate 0.0111 Epoch: 26 Global Step: 151620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:39,738-Speed 10825.57 samples/sec Loss 5.3690 LearningRate 0.0111 Epoch: 26 Global Step: 151630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:40,658-Speed 11132.63 samples/sec Loss 5.3865 LearningRate 0.0111 Epoch: 26 Global Step: 151640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:41,555-Speed 11424.06 samples/sec Loss 5.5153 LearningRate 0.0111 Epoch: 26 Global Step: 151650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:42,486-Speed 11015.87 samples/sec Loss 5.3587 LearningRate 0.0111 Epoch: 26 Global Step: 151660 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:36:43,416-Speed 11006.64 samples/sec Loss 5.3496 LearningRate 0.0111 Epoch: 26 Global Step: 151670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:44,308-Speed 11498.98 samples/sec Loss 5.5106 LearningRate 0.0111 Epoch: 26 Global Step: 151680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:45,194-Speed 11569.28 samples/sec Loss 5.4596 LearningRate 0.0111 Epoch: 26 Global Step: 151690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 04:36:46,103-Speed 11276.71 samples/sec Loss 5.4237 LearningRate 0.0111 Epoch: 26 Global Step: 151700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:47,053-Speed 10776.55 samples/sec Loss 5.3130 LearningRate 0.0111 Epoch: 26 Global Step: 151710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:47,990-Speed 10946.88 samples/sec Loss 5.4668 LearningRate 0.0111 Epoch: 26 Global Step: 151720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:48,921-Speed 11003.15 samples/sec Loss 5.2840 LearningRate 0.0111 Epoch: 26 Global Step: 151730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:49,869-Speed 10804.27 samples/sec Loss 5.4092 LearningRate 0.0111 Epoch: 26 Global Step: 151740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:50,780-Speed 11254.79 samples/sec Loss 5.5137 LearningRate 0.0111 Epoch: 26 Global Step: 151750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:51,723-Speed 10860.92 samples/sec Loss 5.5597 LearningRate 0.0111 Epoch: 26 Global Step: 151760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:52,651-Speed 11040.43 samples/sec Loss 5.3410 LearningRate 0.0111 Epoch: 26 Global Step: 151770 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:36:53,597-Speed 10837.03 samples/sec Loss 5.4349 LearningRate 0.0111 Epoch: 26 Global Step: 151780 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:36:54,574-Speed 10491.03 samples/sec Loss 5.3948 LearningRate 0.0111 Epoch: 26 Global Step: 151790 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:36:55,453-Speed 11651.32 samples/sec Loss 5.4372 LearningRate 0.0111 Epoch: 26 Global Step: 151800 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:36:56,376-Speed 11104.22 samples/sec Loss 5.4058 LearningRate 0.0111 Epoch: 26 Global Step: 151810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
04:36:57,302-Speed 11071.14 samples/sec Loss 5.4036 LearningRate 0.0111 Epoch: 26 Global Step: 151820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:58,225-Speed 11097.59 samples/sec Loss 5.3715 LearningRate 0.0111 Epoch: 26 Global Step: 151830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:36:59,129-Speed 11337.24 samples/sec Loss 5.3282 LearningRate 0.0110 Epoch: 26 Global Step: 151840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:37:00,061-Speed 10996.76 samples/sec Loss 5.3132 LearningRate 0.0110 Epoch: 26 Global Step: 151850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:37:00,998-Speed 10939.85 samples/sec Loss 5.2621 LearningRate 0.0110 Epoch: 26 Global Step: 151860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:37:01,906-Speed 11274.82 samples/sec Loss 5.3914 LearningRate 0.0110 Epoch: 26 Global Step: 151870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:37:02,869-Speed 10649.53 samples/sec Loss 5.5558 LearningRate 0.0110 Epoch: 26 Global Step: 151880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:37:03,832-Speed 10639.07 samples/sec Loss 5.3586 LearningRate 0.0110 Epoch: 26 Global Step: 151890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:37:04,734-Speed 11353.90 samples/sec Loss 5.3886 LearningRate 0.0110 Epoch: 26 Global Step: 151900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:37:05,648-Speed 11216.58 samples/sec Loss 5.3342 LearningRate 0.0110 Epoch: 26 Global Step: 151910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:37:06,542-Speed 11464.52 samples/sec Loss 5.4205 LearningRate 0.0110 Epoch: 26 Global Step: 151920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:37:07,468-Speed 11072.89 samples/sec Loss 5.4812 LearningRate 0.0110 Epoch: 26 Global Step: 151930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:37:08,403-Speed 10958.11 
samples/sec Loss 5.3874 LearningRate 0.0110 Epoch: 26 Global Step: 151940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:37:09,302-Speed 11396.17 samples/sec Loss 5.4575 LearningRate 0.0110 Epoch: 26 Global Step: 151950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:37:10,204-Speed 11355.67 samples/sec Loss 5.4914 LearningRate 0.0110 Epoch: 26 Global Step: 151960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:37:11,100-Speed 11438.65 samples/sec Loss 5.2897 LearningRate 0.0110 Epoch: 26 Global Step: 151970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:37:11,984-Speed 11597.67 samples/sec Loss 5.4262 LearningRate 0.0110 Epoch: 26 Global Step: 151980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:37:12,869-Speed 11569.20 samples/sec Loss 5.3551 LearningRate 0.0110 Epoch: 26 Global Step: 151990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:37:13,773-Speed 11338.66 samples/sec Loss 5.2651 LearningRate 0.0110 Epoch: 26 Global Step: 152000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:37:35,066-[lfw][152000]XNorm: 8.813320 Training: 2022-04-11 04:37:35,066-[lfw][152000]Accuracy-Flip: 0.99600+-0.00271 Training: 2022-04-11 04:37:35,067-[lfw][152000]Accuracy-Highest: 0.99733 Training: 2022-04-11 04:37:59,773-[cfp_fp][152000]XNorm: 7.487304 Training: 2022-04-11 04:37:59,774-[cfp_fp][152000]Accuracy-Flip: 0.94714+-0.01207 Training: 2022-04-11 04:37:59,774-[cfp_fp][152000]Accuracy-Highest: 0.95200 Training: 2022-04-11 04:38:21,157-[agedb_30][152000]XNorm: 8.655928 Training: 2022-04-11 04:38:21,158-[agedb_30][152000]Accuracy-Flip: 0.96733+-0.00616 Training: 2022-04-11 04:38:21,159-[agedb_30][152000]Accuracy-Highest: 0.96883 Training: 2022-04-11 04:38:22,064-Speed 149.95 samples/sec Loss 5.2964 LearningRate 0.0110 Epoch: 26 Global Step: 152010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:38:22,960-Speed 11444.88 samples/sec Loss 
5.3807 LearningRate 0.0110 Epoch: 26 Global Step: 152020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:38:23,884-Speed 11090.65 samples/sec Loss 5.4799 LearningRate 0.0110 Epoch: 26 Global Step: 152030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:38:24,793-Speed 11272.94 samples/sec Loss 5.3858 LearningRate 0.0110 Epoch: 26 Global Step: 152040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:38:25,751-Speed 10698.73 samples/sec Loss 5.4780 LearningRate 0.0110 Epoch: 26 Global Step: 152050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:38:26,699-Speed 10802.86 samples/sec Loss 5.4780 LearningRate 0.0110 Epoch: 26 Global Step: 152060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:38:27,668-Speed 10582.37 samples/sec Loss 5.3152 LearningRate 0.0110 Epoch: 26 Global Step: 152070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:38:28,585-Speed 11169.25 samples/sec Loss 5.4556 LearningRate 0.0110 Epoch: 26 Global Step: 152080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:38:29,485-Speed 11388.76 samples/sec Loss 5.2889 LearningRate 0.0110 Epoch: 26 Global Step: 152090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:38:30,399-Speed 11209.37 samples/sec Loss 5.4045 LearningRate 0.0110 Epoch: 26 Global Step: 152100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:38:31,285-Speed 11573.74 samples/sec Loss 5.4482 LearningRate 0.0110 Epoch: 26 Global Step: 152110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:38:32,208-Speed 11092.54 samples/sec Loss 5.4852 LearningRate 0.0110 Epoch: 26 Global Step: 152120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:38:33,091-Speed 11610.48 samples/sec Loss 5.3577 LearningRate 0.0110 Epoch: 26 Global Step: 152130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:38:33,968-Speed 11676.95 samples/sec Loss 5.3464 LearningRate 0.0110 
Epoch: 26 Global Step: 152140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:38:34,846-Speed 11682.06 samples/sec Loss 5.2561 LearningRate 0.0110 Epoch: 26 Global Step: 152150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:38:35,728-Speed 11627.25 samples/sec Loss 5.4769 LearningRate 0.0110 Epoch: 26 Global Step: 152160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:38:36,656-Speed 11047.57 samples/sec Loss 5.4296 LearningRate 0.0110 Epoch: 26 Global Step: 152170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:38:37,540-Speed 11591.29 samples/sec Loss 5.5271 LearningRate 0.0109 Epoch: 26 Global Step: 152180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:38:38,465-Speed 11081.79 samples/sec Loss 5.4858 LearningRate 0.0109 Epoch: 26 Global Step: 152190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:38:39,407-Speed 10871.35 samples/sec Loss 5.4769 LearningRate 0.0109 Epoch: 26 Global Step: 152200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:38:40,316-Speed 11278.44 samples/sec Loss 5.4440 LearningRate 0.0109 Epoch: 26 Global Step: 152210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:38:41,194-Speed 11667.44 samples/sec Loss 5.4267 LearningRate 0.0109 Epoch: 26 Global Step: 152220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:38:42,124-Speed 11015.34 samples/sec Loss 5.5669 LearningRate 0.0109 Epoch: 26 Global Step: 152230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:38:43,059-Speed 10960.95 samples/sec Loss 5.4875 LearningRate 0.0109 Epoch: 26 Global Step: 152240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:38:43,943-Speed 11598.84 samples/sec Loss 5.3988 LearningRate 0.0109 Epoch: 26 Global Step: 152250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:38:44,838-Speed 11450.26 samples/sec Loss 5.3375 LearningRate 0.0109 Epoch: 26 Global Step: 152260 
Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:38:45,737-Speed 11402.28 samples/sec Loss 5.4088 LearningRate 0.0109 Epoch: 26 Global Step: 152270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:38:46,623-Speed 11559.84 samples/sec Loss 5.3899 LearningRate 0.0109 Epoch: 26 Global Step: 152280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:38:47,515-Speed 11480.99 samples/sec Loss 5.2916 LearningRate 0.0109 Epoch: 26 Global Step: 152290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:38:48,434-Speed 11158.88 samples/sec Loss 5.2682 LearningRate 0.0109 Epoch: 26 Global Step: 152300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:38:49,358-Speed 11081.60 samples/sec Loss 5.3741 LearningRate 0.0109 Epoch: 26 Global Step: 152310 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:38:50,233-Speed 11712.17 samples/sec Loss 5.4879 LearningRate 0.0109 Epoch: 26 Global Step: 152320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:38:51,118-Speed 11586.77 samples/sec Loss 5.3372 LearningRate 0.0109 Epoch: 26 Global Step: 152330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:38:52,047-Speed 11029.65 samples/sec Loss 5.4070 LearningRate 0.0109 Epoch: 26 Global Step: 152340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:38:52,967-Speed 11135.96 samples/sec Loss 5.3450 LearningRate 0.0109 Epoch: 26 Global Step: 152350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:38:53,903-Speed 10951.50 samples/sec Loss 5.3495 LearningRate 0.0109 Epoch: 26 Global Step: 152360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:38:54,836-Speed 10977.35 samples/sec Loss 5.3255 LearningRate 0.0109 Epoch: 26 Global Step: 152370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:38:55,752-Speed 11196.46 samples/sec Loss 5.2699 LearningRate 0.0109 Epoch: 26 Global Step: 152380 Fp16 Grad Scale: 32768 
Required: 3 hours Training: 2022-04-11 04:38:56,698-Speed 10824.69 samples/sec Loss 5.4402 LearningRate 0.0109 Epoch: 26 Global Step: 152390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:38:57,602-Speed 11337.06 samples/sec Loss 5.4279 LearningRate 0.0109 Epoch: 26 Global Step: 152400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:38:58,492-Speed 11510.63 samples/sec Loss 5.3520 LearningRate 0.0109 Epoch: 26 Global Step: 152410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:38:59,407-Speed 11209.51 samples/sec Loss 5.4309 LearningRate 0.0109 Epoch: 26 Global Step: 152420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:00,323-Speed 11185.75 samples/sec Loss 5.3527 LearningRate 0.0109 Epoch: 26 Global Step: 152430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:01,221-Speed 11414.88 samples/sec Loss 5.4193 LearningRate 0.0109 Epoch: 26 Global Step: 152440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:02,152-Speed 10997.44 samples/sec Loss 5.4144 LearningRate 0.0109 Epoch: 26 Global Step: 152450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:03,054-Speed 11357.24 samples/sec Loss 5.4085 LearningRate 0.0109 Epoch: 26 Global Step: 152460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:03,961-Speed 11297.33 samples/sec Loss 5.3346 LearningRate 0.0109 Epoch: 26 Global Step: 152470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:04,899-Speed 10930.10 samples/sec Loss 5.3421 LearningRate 0.0109 Epoch: 26 Global Step: 152480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:05,812-Speed 11221.58 samples/sec Loss 5.3994 LearningRate 0.0109 Epoch: 26 Global Step: 152490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:06,711-Speed 11398.97 samples/sec Loss 5.3884 LearningRate 0.0109 Epoch: 26 Global Step: 152500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 04:39:07,610-Speed 11397.89 samples/sec Loss 5.5055 LearningRate 0.0109 Epoch: 26 Global Step: 152510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:08,543-Speed 10988.94 samples/sec Loss 5.5072 LearningRate 0.0109 Epoch: 26 Global Step: 152520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:09,451-Speed 11289.64 samples/sec Loss 5.4873 LearningRate 0.0108 Epoch: 26 Global Step: 152530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:10,337-Speed 11570.71 samples/sec Loss 5.3235 LearningRate 0.0108 Epoch: 26 Global Step: 152540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:11,223-Speed 11554.10 samples/sec Loss 5.4058 LearningRate 0.0108 Epoch: 26 Global Step: 152550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:12,144-Speed 11136.01 samples/sec Loss 5.4574 LearningRate 0.0108 Epoch: 26 Global Step: 152560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:13,042-Speed 11411.20 samples/sec Loss 5.2796 LearningRate 0.0108 Epoch: 26 Global Step: 152570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:13,976-Speed 10969.12 samples/sec Loss 5.4276 LearningRate 0.0108 Epoch: 26 Global Step: 152580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:14,878-Speed 11362.47 samples/sec Loss 5.3065 LearningRate 0.0108 Epoch: 26 Global Step: 152590 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:39:15,752-Speed 11724.43 samples/sec Loss 5.4011 LearningRate 0.0108 Epoch: 26 Global Step: 152600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:16,636-Speed 11594.13 samples/sec Loss 5.3960 LearningRate 0.0108 Epoch: 26 Global Step: 152610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:17,523-Speed 11553.61 samples/sec Loss 5.4850 LearningRate 0.0108 Epoch: 26 Global Step: 152620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:18,429-Speed 
11304.46 samples/sec Loss 5.3040 LearningRate 0.0108 Epoch: 26 Global Step: 152630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:19,347-Speed 11159.47 samples/sec Loss 5.3456 LearningRate 0.0108 Epoch: 26 Global Step: 152640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:20,311-Speed 10638.45 samples/sec Loss 5.4700 LearningRate 0.0108 Epoch: 26 Global Step: 152650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:21,209-Speed 11405.62 samples/sec Loss 5.2889 LearningRate 0.0108 Epoch: 26 Global Step: 152660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:22,114-Speed 11322.15 samples/sec Loss 5.4417 LearningRate 0.0108 Epoch: 26 Global Step: 152670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:23,013-Speed 11395.17 samples/sec Loss 5.4289 LearningRate 0.0108 Epoch: 26 Global Step: 152680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:23,896-Speed 11617.00 samples/sec Loss 5.5554 LearningRate 0.0108 Epoch: 26 Global Step: 152690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:24,798-Speed 11352.56 samples/sec Loss 5.5494 LearningRate 0.0108 Epoch: 26 Global Step: 152700 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:39:25,709-Speed 11249.41 samples/sec Loss 5.3354 LearningRate 0.0108 Epoch: 26 Global Step: 152710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:26,630-Speed 11137.65 samples/sec Loss 5.4341 LearningRate 0.0108 Epoch: 26 Global Step: 152720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:27,584-Speed 10734.99 samples/sec Loss 5.4621 LearningRate 0.0108 Epoch: 26 Global Step: 152730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:28,517-Speed 10980.10 samples/sec Loss 5.4280 LearningRate 0.0108 Epoch: 26 Global Step: 152740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:29,474-Speed 10705.88 samples/sec Loss 
5.3671 LearningRate 0.0108 Epoch: 26 Global Step: 152750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:30,368-Speed 11470.80 samples/sec Loss 5.3878 LearningRate 0.0108 Epoch: 26 Global Step: 152760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:31,262-Speed 11455.58 samples/sec Loss 5.3994 LearningRate 0.0108 Epoch: 26 Global Step: 152770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:32,152-Speed 11519.15 samples/sec Loss 5.5021 LearningRate 0.0108 Epoch: 26 Global Step: 152780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:33,024-Speed 11748.66 samples/sec Loss 5.3879 LearningRate 0.0108 Epoch: 26 Global Step: 152790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:33,912-Speed 11538.65 samples/sec Loss 5.4818 LearningRate 0.0108 Epoch: 26 Global Step: 152800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:34,799-Speed 11554.60 samples/sec Loss 5.4131 LearningRate 0.0108 Epoch: 26 Global Step: 152810 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:39:35,706-Speed 11299.87 samples/sec Loss 5.3681 LearningRate 0.0108 Epoch: 26 Global Step: 152820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:36,586-Speed 11641.52 samples/sec Loss 5.3609 LearningRate 0.0108 Epoch: 26 Global Step: 152830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:37,499-Speed 11222.58 samples/sec Loss 5.4107 LearningRate 0.0108 Epoch: 26 Global Step: 152840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:38,394-Speed 11460.79 samples/sec Loss 5.3099 LearningRate 0.0108 Epoch: 26 Global Step: 152850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:39,327-Speed 10978.74 samples/sec Loss 5.4519 LearningRate 0.0108 Epoch: 26 Global Step: 152860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:40,289-Speed 10648.12 samples/sec Loss 5.5635 LearningRate 0.0107 
Epoch: 26 Global Step: 152870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:41,266-Speed 10492.53 samples/sec Loss 5.5670 LearningRate 0.0107 Epoch: 26 Global Step: 152880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:42,179-Speed 11225.93 samples/sec Loss 5.5188 LearningRate 0.0107 Epoch: 26 Global Step: 152890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:43,110-Speed 11002.94 samples/sec Loss 5.4891 LearningRate 0.0107 Epoch: 26 Global Step: 152900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:44,011-Speed 11369.75 samples/sec Loss 5.5163 LearningRate 0.0107 Epoch: 26 Global Step: 152910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:44,902-Speed 11510.74 samples/sec Loss 5.4626 LearningRate 0.0107 Epoch: 26 Global Step: 152920 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:39:45,784-Speed 11616.48 samples/sec Loss 5.2751 LearningRate 0.0107 Epoch: 26 Global Step: 152930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:46,665-Speed 11630.51 samples/sec Loss 5.2697 LearningRate 0.0107 Epoch: 26 Global Step: 152940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:39:47,570-Speed 11327.93 samples/sec Loss 5.4498 LearningRate 0.0107 Epoch: 26 Global Step: 152950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:39:48,489-Speed 11151.14 samples/sec Loss 5.3413 LearningRate 0.0107 Epoch: 26 Global Step: 152960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:39:49,389-Speed 11374.10 samples/sec Loss 5.3492 LearningRate 0.0107 Epoch: 26 Global Step: 152970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:39:50,310-Speed 11132.28 samples/sec Loss 5.4404 LearningRate 0.0107 Epoch: 26 Global Step: 152980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:39:51,224-Speed 11206.17 samples/sec Loss 5.3869 LearningRate 0.0107 Epoch: 26 Global Step: 152990 
Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:39:52,140-Speed 11195.36 samples/sec Loss 5.3549 LearningRate 0.0107 Epoch: 26 Global Step: 153000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:39:53,071-Speed 11004.64 samples/sec Loss 5.4644 LearningRate 0.0107 Epoch: 26 Global Step: 153010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:39:53,997-Speed 11069.73 samples/sec Loss 5.4978 LearningRate 0.0107 Epoch: 26 Global Step: 153020 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:39:54,940-Speed 10861.74 samples/sec Loss 5.4330 LearningRate 0.0107 Epoch: 26 Global Step: 153030 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:39:55,873-Speed 10990.49 samples/sec Loss 5.4536 LearningRate 0.0107 Epoch: 26 Global Step: 153040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:56,781-Speed 11292.13 samples/sec Loss 5.4965 LearningRate 0.0107 Epoch: 26 Global Step: 153050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:57,655-Speed 11720.08 samples/sec Loss 5.5388 LearningRate 0.0107 Epoch: 26 Global Step: 153060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:39:58,542-Speed 11550.16 samples/sec Loss 5.4020 LearningRate 0.0107 Epoch: 26 Global Step: 153070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:39:59,451-Speed 11275.21 samples/sec Loss 5.4630 LearningRate 0.0107 Epoch: 26 Global Step: 153080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:40:00,348-Speed 11425.87 samples/sec Loss 5.3994 LearningRate 0.0107 Epoch: 26 Global Step: 153090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:40:01,284-Speed 10941.62 samples/sec Loss 5.2804 LearningRate 0.0107 Epoch: 26 Global Step: 153100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:40:02,226-Speed 10887.14 samples/sec Loss 5.4488 LearningRate 0.0107 Epoch: 26 Global Step: 153110 Fp16 Grad Scale: 32768 
Required: 3 hours Training: 2022-04-11 04:40:03,138-Speed 11228.40 samples/sec Loss 5.3686 LearningRate 0.0107 Epoch: 26 Global Step: 153120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:40:04,030-Speed 11494.82 samples/sec Loss 5.4294 LearningRate 0.0107 Epoch: 26 Global Step: 153130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:40:04,903-Speed 11730.62 samples/sec Loss 5.4397 LearningRate 0.0107 Epoch: 26 Global Step: 153140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:40:05,802-Speed 11403.59 samples/sec Loss 5.4727 LearningRate 0.0107 Epoch: 26 Global Step: 153150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:40:06,707-Speed 11319.04 samples/sec Loss 5.3787 LearningRate 0.0107 Epoch: 26 Global Step: 153160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:40:07,665-Speed 10701.84 samples/sec Loss 5.4552 LearningRate 0.0107 Epoch: 26 Global Step: 153170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:40:08,535-Speed 11778.73 samples/sec Loss 5.4073 LearningRate 0.0107 Epoch: 26 Global Step: 153180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:40:09,411-Speed 11690.46 samples/sec Loss 5.4005 LearningRate 0.0107 Epoch: 26 Global Step: 153190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:40:10,307-Speed 11444.38 samples/sec Loss 5.3779 LearningRate 0.0107 Epoch: 26 Global Step: 153200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:40:11,191-Speed 11594.85 samples/sec Loss 5.4313 LearningRate 0.0107 Epoch: 26 Global Step: 153210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:40:12,089-Speed 11404.23 samples/sec Loss 5.3191 LearningRate 0.0106 Epoch: 26 Global Step: 153220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:40:12,991-Speed 11362.62 samples/sec Loss 5.4690 LearningRate 0.0106 Epoch: 26 Global Step: 153230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 
2022-04-11 04:40:13,901-Speed 11263.74 samples/sec Loss 5.3989 LearningRate 0.0106 Epoch: 26 Global Step: 153240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:40:14,781-Speed 11641.28 samples/sec Loss 5.3114 LearningRate 0.0106 Epoch: 26 Global Step: 153250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:40:15,704-Speed 11095.72 samples/sec Loss 5.4187 LearningRate 0.0106 Epoch: 26 Global Step: 153260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:40:16,647-Speed 10870.12 samples/sec Loss 5.4700 LearningRate 0.0106 Epoch: 26 Global Step: 153270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:40:17,553-Speed 11313.33 samples/sec Loss 5.3656 LearningRate 0.0106 Epoch: 26 Global Step: 153280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:40:18,453-Speed 11378.82 samples/sec Loss 5.5384 LearningRate 0.0106 Epoch: 26 Global Step: 153290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:40:19,351-Speed 11420.95 samples/sec Loss 5.4569 LearningRate 0.0106 Epoch: 26 Global Step: 153300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:40:20,255-Speed 11335.39 samples/sec Loss 5.3016 LearningRate 0.0106 Epoch: 26 Global Step: 153310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:40:21,148-Speed 11464.96 samples/sec Loss 5.3501 LearningRate 0.0106 Epoch: 26 Global Step: 153320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:40:22,042-Speed 11467.84 samples/sec Loss 5.4434 LearningRate 0.0106 Epoch: 26 Global Step: 153330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:40:22,959-Speed 11174.12 samples/sec Loss 5.2194 LearningRate 0.0106 Epoch: 26 Global Step: 153340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:40:23,894-Speed 10963.54 samples/sec Loss 5.5117 LearningRate 0.0106 Epoch: 26 Global Step: 153350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:40:24,838-Speed 
10848.94 samples/sec Loss 5.4580 LearningRate 0.0106 Epoch: 26 Global Step: 153360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:40:25,760-Speed 11117.62 samples/sec Loss 5.4563 LearningRate 0.0106 Epoch: 26 Global Step: 153370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:40:26,664-Speed 11327.58 samples/sec Loss 5.4524 LearningRate 0.0106 Epoch: 26 Global Step: 153380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:40:27,592-Speed 11050.98 samples/sec Loss 5.3468 LearningRate 0.0106 Epoch: 26 Global Step: 153390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:40:28,519-Speed 11044.55 samples/sec Loss 5.4803 LearningRate 0.0106 Epoch: 26 Global Step: 153400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:40:29,452-Speed 10995.89 samples/sec Loss 5.3438 LearningRate 0.0106 Epoch: 26 Global Step: 153410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:40:30,434-Speed 10435.95 samples/sec Loss 5.3594 LearningRate 0.0106 Epoch: 26 Global Step: 153420 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:40:31,371-Speed 10939.15 samples/sec Loss 5.4133 LearningRate 0.0106 Epoch: 26 Global Step: 153430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:40:32,285-Speed 11208.62 samples/sec Loss 5.4584 LearningRate 0.0106 Epoch: 26 Global Step: 153440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:40:33,180-Speed 11440.81 samples/sec Loss 5.3986 LearningRate 0.0106 Epoch: 26 Global Step: 153450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:40:34,086-Speed 11319.73 samples/sec Loss 5.4746 LearningRate 0.0106 Epoch: 26 Global Step: 153460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:40:34,957-Speed 11759.75 samples/sec Loss 5.3938 LearningRate 0.0106 Epoch: 26 Global Step: 153470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:40:35,845-Speed 11587.38 samples/sec Loss 
5.4841 LearningRate 0.0106 Epoch: 26 Global Step: 153480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:40:36,755-Speed 11259.02 samples/sec Loss 5.3056 LearningRate 0.0106 Epoch: 26 Global Step: 153490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:40:37,692-Speed 10931.21 samples/sec Loss 5.3996 LearningRate 0.0106 Epoch: 26 Global Step: 153500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:40:38,658-Speed 10615.67 samples/sec Loss 5.4688 LearningRate 0.0106 Epoch: 26 Global Step: 153510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:40:39,677-Speed 10050.98 samples/sec Loss 5.4382 LearningRate 0.0106 Epoch: 26 Global Step: 153520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:40:50,173-Speed 975.69 samples/sec Loss 4.9098 LearningRate 0.0106 Epoch: 27 Global Step: 153530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:40:51,120-Speed 10831.04 samples/sec Loss 4.8120 LearningRate 0.0106 Epoch: 27 Global Step: 153540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:40:52,059-Speed 10913.45 samples/sec Loss 4.7987 LearningRate 0.0106 Epoch: 27 Global Step: 153550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:40:53,021-Speed 10646.76 samples/sec Loss 4.8437 LearningRate 0.0106 Epoch: 27 Global Step: 153560 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:40:53,976-Speed 10733.48 samples/sec Loss 4.7118 LearningRate 0.0105 Epoch: 27 Global Step: 153570 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:40:54,870-Speed 11458.02 samples/sec Loss 4.7513 LearningRate 0.0105 Epoch: 27 Global Step: 153580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:40:55,792-Speed 11123.36 samples/sec Loss 4.9266 LearningRate 0.0105 Epoch: 27 Global Step: 153590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:40:56,735-Speed 10862.28 samples/sec Loss 4.7606 LearningRate 0.0105 
Epoch: 27 Global Step: 153600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:40:57,649-Speed 11211.50 samples/sec Loss 4.7792 LearningRate 0.0105 Epoch: 27 Global Step: 153610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:40:58,566-Speed 11177.73 samples/sec Loss 4.8267 LearningRate 0.0105 Epoch: 27 Global Step: 153620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:40:59,514-Speed 10806.96 samples/sec Loss 4.8748 LearningRate 0.0105 Epoch: 27 Global Step: 153630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:41:00,431-Speed 11196.26 samples/sec Loss 4.8804 LearningRate 0.0105 Epoch: 27 Global Step: 153640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:41:01,343-Speed 11240.60 samples/sec Loss 4.7833 LearningRate 0.0105 Epoch: 27 Global Step: 153650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:41:02,265-Speed 11104.91 samples/sec Loss 4.9525 LearningRate 0.0105 Epoch: 27 Global Step: 153660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:41:03,173-Speed 11290.13 samples/sec Loss 4.8129 LearningRate 0.0105 Epoch: 27 Global Step: 153670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:41:04,067-Speed 11460.71 samples/sec Loss 4.8758 LearningRate 0.0105 Epoch: 27 Global Step: 153680 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:41:04,985-Speed 11161.41 samples/sec Loss 4.9285 LearningRate 0.0105 Epoch: 27 Global Step: 153690 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:41:05,897-Speed 11232.23 samples/sec Loss 4.7965 LearningRate 0.0105 Epoch: 27 Global Step: 153700 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:41:06,941-Speed 9813.96 samples/sec Loss 4.8886 LearningRate 0.0105 Epoch: 27 Global Step: 153710 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:41:07,885-Speed 10855.66 samples/sec Loss 4.8704 LearningRate 0.0105 Epoch: 27 Global Step: 
153720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:41:08,851-Speed 10606.14 samples/sec Loss 4.8016 LearningRate 0.0105 Epoch: 27 Global Step: 153730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:41:09,788-Speed 10933.43 samples/sec Loss 4.8945 LearningRate 0.0105 Epoch: 27 Global Step: 153740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:41:10,708-Speed 11148.84 samples/sec Loss 4.8611 LearningRate 0.0105 Epoch: 27 Global Step: 153750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:41:11,613-Speed 11319.85 samples/sec Loss 4.7777 LearningRate 0.0105 Epoch: 27 Global Step: 153760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:41:12,586-Speed 10537.28 samples/sec Loss 4.8449 LearningRate 0.0105 Epoch: 27 Global Step: 153770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:41:13,526-Speed 10901.37 samples/sec Loss 4.9818 LearningRate 0.0105 Epoch: 27 Global Step: 153780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:41:14,490-Speed 10632.13 samples/sec Loss 4.7650 LearningRate 0.0105 Epoch: 27 Global Step: 153790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:41:15,423-Speed 10983.51 samples/sec Loss 5.0376 LearningRate 0.0105 Epoch: 27 Global Step: 153800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:41:16,354-Speed 11001.25 samples/sec Loss 4.8493 LearningRate 0.0105 Epoch: 27 Global Step: 153810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:41:17,262-Speed 11291.93 samples/sec Loss 4.9794 LearningRate 0.0105 Epoch: 27 Global Step: 153820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:41:18,190-Speed 11041.29 samples/sec Loss 4.9487 LearningRate 0.0105 Epoch: 27 Global Step: 153830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:41:19,230-Speed 9851.19 samples/sec Loss 4.9265 LearningRate 0.0105 Epoch: 27 Global Step: 153840 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-04-11 04:41:20,135-Speed 11328.31 samples/sec Loss 4.9192 LearningRate 0.0105 Epoch: 27 Global Step: 153850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:41:21,040-Speed 11320.81 samples/sec Loss 4.8824 LearningRate 0.0105 Epoch: 27 Global Step: 153860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:41:21,938-Speed 11422.64 samples/sec Loss 4.8307 LearningRate 0.0105 Epoch: 27 Global Step: 153870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:41:22,832-Speed 11464.95 samples/sec Loss 4.8836 LearningRate 0.0105 Epoch: 27 Global Step: 153880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:41:23,763-Speed 11005.73 samples/sec Loss 4.9792 LearningRate 0.0105 Epoch: 27 Global Step: 153890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:41:24,653-Speed 11510.10 samples/sec Loss 5.0283 LearningRate 0.0105 Epoch: 27 Global Step: 153900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:41:25,657-Speed 10215.04 samples/sec Loss 4.9401 LearningRate 0.0105 Epoch: 27 Global Step: 153910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:41:26,616-Speed 10690.03 samples/sec Loss 4.9242 LearningRate 0.0104 Epoch: 27 Global Step: 153920 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:41:27,624-Speed 10154.65 samples/sec Loss 4.9144 LearningRate 0.0104 Epoch: 27 Global Step: 153930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:41:28,518-Speed 11467.39 samples/sec Loss 4.9221 LearningRate 0.0104 Epoch: 27 Global Step: 153940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:41:29,520-Speed 10226.98 samples/sec Loss 4.7848 LearningRate 0.0104 Epoch: 27 Global Step: 153950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:41:30,453-Speed 10987.46 samples/sec Loss 4.8508 LearningRate 0.0104 Epoch: 27 Global Step: 153960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 04:41:31,501-Speed 9778.26 samples/sec Loss 4.8996 LearningRate 0.0104 Epoch: 27 Global Step: 153970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:41:32,429-Speed 11044.45 samples/sec Loss 4.7988 LearningRate 0.0104 Epoch: 27 Global Step: 153980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:41:33,346-Speed 11180.56 samples/sec Loss 4.8253 LearningRate 0.0104 Epoch: 27 Global Step: 153990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:41:34,235-Speed 11528.15 samples/sec Loss 4.9486 LearningRate 0.0104 Epoch: 27 Global Step: 154000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:41:55,435-[lfw][154000]XNorm: 8.755030 Training: 2022-04-11 04:41:55,435-[lfw][154000]Accuracy-Flip: 0.99633+-0.00332 Training: 2022-04-11 04:41:55,436-[lfw][154000]Accuracy-Highest: 0.99733 Training: 2022-04-11 04:42:19,955-[cfp_fp][154000]XNorm: 7.401275 Training: 2022-04-11 04:42:19,955-[cfp_fp][154000]Accuracy-Flip: 0.95429+-0.00963 Training: 2022-04-11 04:42:19,955-[cfp_fp][154000]Accuracy-Highest: 0.95429 Training: 2022-04-11 04:42:41,238-[agedb_30][154000]XNorm: 8.600263 Training: 2022-04-11 04:42:41,239-[agedb_30][154000]Accuracy-Flip: 0.96700+-0.00726 Training: 2022-04-11 04:42:41,239-[agedb_30][154000]Accuracy-Highest: 0.96883 Training: 2022-04-11 04:42:42,167-Speed 150.74 samples/sec Loss 4.8082 LearningRate 0.0104 Epoch: 27 Global Step: 154010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:42:43,053-Speed 11569.76 samples/sec Loss 4.8890 LearningRate 0.0104 Epoch: 27 Global Step: 154020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:42:43,948-Speed 11451.49 samples/sec Loss 5.0209 LearningRate 0.0104 Epoch: 27 Global Step: 154030 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:42:44,829-Speed 11624.91 samples/sec Loss 4.8455 LearningRate 0.0104 Epoch: 27 Global Step: 154040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
04:42:45,705-Speed 11701.41 samples/sec Loss 4.9135 LearningRate 0.0104 Epoch: 27 Global Step: 154050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:42:46,594-Speed 11530.69 samples/sec Loss 4.8885 LearningRate 0.0104 Epoch: 27 Global Step: 154060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:42:47,476-Speed 11609.06 samples/sec Loss 4.7678 LearningRate 0.0104 Epoch: 27 Global Step: 154070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:42:48,369-Speed 11483.82 samples/sec Loss 4.9515 LearningRate 0.0104 Epoch: 27 Global Step: 154080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:42:49,275-Speed 11306.90 samples/sec Loss 4.9289 LearningRate 0.0104 Epoch: 27 Global Step: 154090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:42:50,206-Speed 11011.79 samples/sec Loss 5.0976 LearningRate 0.0104 Epoch: 27 Global Step: 154100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:42:51,126-Speed 11133.37 samples/sec Loss 4.8529 LearningRate 0.0104 Epoch: 27 Global Step: 154110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:42:52,030-Speed 11335.49 samples/sec Loss 4.8393 LearningRate 0.0104 Epoch: 27 Global Step: 154120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:42:53,000-Speed 10564.27 samples/sec Loss 4.9268 LearningRate 0.0104 Epoch: 27 Global Step: 154130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:42:53,939-Speed 10914.46 samples/sec Loss 4.8705 LearningRate 0.0104 Epoch: 27 Global Step: 154140 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:42:54,892-Speed 10746.46 samples/sec Loss 4.9593 LearningRate 0.0104 Epoch: 27 Global Step: 154150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:42:55,790-Speed 11409.74 samples/sec Loss 4.8505 LearningRate 0.0104 Epoch: 27 Global Step: 154160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:42:56,719-Speed 11036.48 
samples/sec Loss 4.9351 LearningRate 0.0104 Epoch: 27 Global Step: 154170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:42:57,637-Speed 11154.93 samples/sec Loss 4.8584 LearningRate 0.0104 Epoch: 27 Global Step: 154180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:42:58,525-Speed 11548.61 samples/sec Loss 4.9361 LearningRate 0.0104 Epoch: 27 Global Step: 154190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:42:59,444-Speed 11149.42 samples/sec Loss 4.8100 LearningRate 0.0104 Epoch: 27 Global Step: 154200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:43:00,358-Speed 11212.70 samples/sec Loss 5.0196 LearningRate 0.0104 Epoch: 27 Global Step: 154210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:43:01,246-Speed 11539.52 samples/sec Loss 4.9044 LearningRate 0.0104 Epoch: 27 Global Step: 154220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:43:02,134-Speed 11533.59 samples/sec Loss 4.9271 LearningRate 0.0104 Epoch: 27 Global Step: 154230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:43:03,043-Speed 11272.24 samples/sec Loss 4.9839 LearningRate 0.0104 Epoch: 27 Global Step: 154240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:43:03,957-Speed 11217.52 samples/sec Loss 5.0058 LearningRate 0.0104 Epoch: 27 Global Step: 154250 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:43:04,866-Speed 11272.46 samples/sec Loss 5.1566 LearningRate 0.0104 Epoch: 27 Global Step: 154260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:43:05,808-Speed 10881.29 samples/sec Loss 4.9687 LearningRate 0.0103 Epoch: 27 Global Step: 154270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:43:06,697-Speed 11530.29 samples/sec Loss 4.9376 LearningRate 0.0103 Epoch: 27 Global Step: 154280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:43:07,624-Speed 11054.63 samples/sec Loss 5.0465 
LearningRate 0.0103 Epoch: 27 Global Step: 154290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:43:08,515-Speed 11504.85 samples/sec Loss 5.0578 LearningRate 0.0103 Epoch: 27 Global Step: 154300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:43:09,406-Speed 11495.17 samples/sec Loss 4.9537 LearningRate 0.0103 Epoch: 27 Global Step: 154310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:43:10,291-Speed 11590.06 samples/sec Loss 4.9322 LearningRate 0.0103 Epoch: 27 Global Step: 154320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:43:11,193-Speed 11356.27 samples/sec Loss 5.0260 LearningRate 0.0103 Epoch: 27 Global Step: 154330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:43:12,092-Speed 11403.30 samples/sec Loss 4.9629 LearningRate 0.0103 Epoch: 27 Global Step: 154340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:43:12,999-Speed 11293.08 samples/sec Loss 5.0680 LearningRate 0.0103 Epoch: 27 Global Step: 154350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:43:13,929-Speed 11027.65 samples/sec Loss 4.9237 LearningRate 0.0103 Epoch: 27 Global Step: 154360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:43:14,869-Speed 10908.77 samples/sec Loss 5.0031 LearningRate 0.0103 Epoch: 27 Global Step: 154370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:43:15,835-Speed 10608.92 samples/sec Loss 4.9340 LearningRate 0.0103 Epoch: 27 Global Step: 154380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:43:16,787-Speed 10764.26 samples/sec Loss 4.9076 LearningRate 0.0103 Epoch: 27 Global Step: 154390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:43:17,714-Speed 11053.97 samples/sec Loss 4.8939 LearningRate 0.0103 Epoch: 27 Global Step: 154400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:43:18,642-Speed 11034.36 samples/sec Loss 5.0732 LearningRate 0.0103 Epoch: 27 
Global Step: 154410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:43:19,551-Speed 11270.39 samples/sec Loss 5.0206 LearningRate 0.0103 Epoch: 27 Global Step: 154420 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:43:20,464-Speed 11227.75 samples/sec Loss 5.1607 LearningRate 0.0103 Epoch: 27 Global Step: 154430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:43:21,361-Speed 11429.41 samples/sec Loss 5.0625 LearningRate 0.0103 Epoch: 27 Global Step: 154440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:43:22,275-Speed 11203.84 samples/sec Loss 4.9103 LearningRate 0.0103 Epoch: 27 Global Step: 154450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:43:23,204-Speed 11031.20 samples/sec Loss 5.0740 LearningRate 0.0103 Epoch: 27 Global Step: 154460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:43:24,116-Speed 11240.48 samples/sec Loss 5.0296 LearningRate 0.0103 Epoch: 27 Global Step: 154470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:43:25,041-Speed 11067.73 samples/sec Loss 5.0107 LearningRate 0.0103 Epoch: 27 Global Step: 154480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:43:25,958-Speed 11182.51 samples/sec Loss 4.9315 LearningRate 0.0103 Epoch: 27 Global Step: 154490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:43:26,883-Speed 11078.23 samples/sec Loss 4.9926 LearningRate 0.0103 Epoch: 27 Global Step: 154500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:43:27,759-Speed 11700.50 samples/sec Loss 5.0763 LearningRate 0.0103 Epoch: 27 Global Step: 154510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:43:28,690-Speed 11011.94 samples/sec Loss 4.9346 LearningRate 0.0103 Epoch: 27 Global Step: 154520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:43:29,625-Speed 10958.10 samples/sec Loss 4.9936 LearningRate 0.0103 Epoch: 27 Global Step: 154530 Fp16 Grad 
Scale: 32768 Required: 3 hours Training: 2022-04-11 04:43:30,517-Speed 11486.34 samples/sec Loss 5.2574 LearningRate 0.0103 Epoch: 27 Global Step: 154540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:43:31,437-Speed 11142.28 samples/sec Loss 4.9770 LearningRate 0.0103 Epoch: 27 Global Step: 154550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:43:32,334-Speed 11419.25 samples/sec Loss 5.1010 LearningRate 0.0103 Epoch: 27 Global Step: 154560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:43:33,227-Speed 11493.19 samples/sec Loss 5.0168 LearningRate 0.0103 Epoch: 27 Global Step: 154570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:43:34,117-Speed 11506.22 samples/sec Loss 5.0494 LearningRate 0.0103 Epoch: 27 Global Step: 154580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:43:35,021-Speed 11343.89 samples/sec Loss 5.2100 LearningRate 0.0103 Epoch: 27 Global Step: 154590 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:43:35,929-Speed 11293.07 samples/sec Loss 4.9998 LearningRate 0.0103 Epoch: 27 Global Step: 154600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:43:36,830-Speed 11376.32 samples/sec Loss 5.0143 LearningRate 0.0103 Epoch: 27 Global Step: 154610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:43:37,750-Speed 11137.64 samples/sec Loss 4.9323 LearningRate 0.0103 Epoch: 27 Global Step: 154620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:43:38,674-Speed 11091.34 samples/sec Loss 4.9464 LearningRate 0.0102 Epoch: 27 Global Step: 154630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:43:39,600-Speed 11058.77 samples/sec Loss 5.0263 LearningRate 0.0102 Epoch: 27 Global Step: 154640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:43:40,498-Speed 11420.11 samples/sec Loss 5.0850 LearningRate 0.0102 Epoch: 27 Global Step: 154650 Fp16 Grad Scale: 32768 Required: 3 hours 
Training: 2022-04-11 04:43:41,417-Speed 11142.45 samples/sec Loss 4.9694 LearningRate 0.0102 Epoch: 27 Global Step: 154660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:43:42,328-Speed 11257.48 samples/sec Loss 4.8945 LearningRate 0.0102 Epoch: 27 Global Step: 154670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:43:43,239-Speed 11248.75 samples/sec Loss 4.9636 LearningRate 0.0102 Epoch: 27 Global Step: 154680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:43:44,159-Speed 11138.87 samples/sec Loss 4.9502 LearningRate 0.0102 Epoch: 27 Global Step: 154690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:43:45,050-Speed 11490.77 samples/sec Loss 5.0704 LearningRate 0.0102 Epoch: 27 Global Step: 154700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:43:45,939-Speed 11528.95 samples/sec Loss 5.1231 LearningRate 0.0102 Epoch: 27 Global Step: 154710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:43:46,822-Speed 11611.22 samples/sec Loss 5.0203 LearningRate 0.0102 Epoch: 27 Global Step: 154720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:43:47,708-Speed 11565.66 samples/sec Loss 5.0528 LearningRate 0.0102 Epoch: 27 Global Step: 154730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:43:48,654-Speed 10836.49 samples/sec Loss 5.0724 LearningRate 0.0102 Epoch: 27 Global Step: 154740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:43:49,579-Speed 11071.85 samples/sec Loss 5.0098 LearningRate 0.0102 Epoch: 27 Global Step: 154750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:43:50,510-Speed 11009.57 samples/sec Loss 4.9407 LearningRate 0.0102 Epoch: 27 Global Step: 154760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:43:51,428-Speed 11163.16 samples/sec Loss 5.0297 LearningRate 0.0102 Epoch: 27 Global Step: 154770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
04:43:52,374-Speed 10837.41 samples/sec Loss 5.0648 LearningRate 0.0102 Epoch: 27 Global Step: 154780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:43:53,310-Speed 10952.69 samples/sec Loss 5.0013 LearningRate 0.0102 Epoch: 27 Global Step: 154790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:43:54,215-Speed 11310.22 samples/sec Loss 5.0921 LearningRate 0.0102 Epoch: 27 Global Step: 154800 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:43:55,199-Speed 10415.90 samples/sec Loss 5.0577 LearningRate 0.0102 Epoch: 27 Global Step: 154810 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:43:56,153-Speed 10741.89 samples/sec Loss 5.0991 LearningRate 0.0102 Epoch: 27 Global Step: 154820 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:43:57,025-Speed 11758.33 samples/sec Loss 5.0454 LearningRate 0.0102 Epoch: 27 Global Step: 154830 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:43:57,903-Speed 11678.96 samples/sec Loss 5.0862 LearningRate 0.0102 Epoch: 27 Global Step: 154840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:43:58,780-Speed 11681.73 samples/sec Loss 5.0848 LearningRate 0.0102 Epoch: 27 Global Step: 154850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:43:59,671-Speed 11492.88 samples/sec Loss 5.0768 LearningRate 0.0102 Epoch: 27 Global Step: 154860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:00,572-Speed 11382.47 samples/sec Loss 5.0633 LearningRate 0.0102 Epoch: 27 Global Step: 154870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:01,469-Speed 11423.34 samples/sec Loss 5.1153 LearningRate 0.0102 Epoch: 27 Global Step: 154880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:02,395-Speed 11064.26 samples/sec Loss 5.1188 LearningRate 0.0102 Epoch: 27 Global Step: 154890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:03,302-Speed 11299.38 
samples/sec Loss 5.0513 LearningRate 0.0102 Epoch: 27 Global Step: 154900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:04,192-Speed 11514.31 samples/sec Loss 5.0287 LearningRate 0.0102 Epoch: 27 Global Step: 154910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:05,113-Speed 11126.36 samples/sec Loss 5.0227 LearningRate 0.0102 Epoch: 27 Global Step: 154920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:06,030-Speed 11174.85 samples/sec Loss 5.0500 LearningRate 0.0102 Epoch: 27 Global Step: 154930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:06,919-Speed 11524.95 samples/sec Loss 5.1533 LearningRate 0.0102 Epoch: 27 Global Step: 154940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:07,825-Speed 11309.25 samples/sec Loss 5.0807 LearningRate 0.0102 Epoch: 27 Global Step: 154950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:08,739-Speed 11217.60 samples/sec Loss 5.1559 LearningRate 0.0102 Epoch: 27 Global Step: 154960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:09,624-Speed 11579.51 samples/sec Loss 5.1130 LearningRate 0.0102 Epoch: 27 Global Step: 154970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:10,510-Speed 11565.05 samples/sec Loss 5.1487 LearningRate 0.0101 Epoch: 27 Global Step: 154980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:11,407-Speed 11430.38 samples/sec Loss 5.0878 LearningRate 0.0101 Epoch: 27 Global Step: 154990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:12,332-Speed 11069.49 samples/sec Loss 5.0258 LearningRate 0.0101 Epoch: 27 Global Step: 155000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:44:13,244-Speed 11239.94 samples/sec Loss 5.1641 LearningRate 0.0101 Epoch: 27 Global Step: 155010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:44:14,158-Speed 11207.06 samples/sec Loss 5.1640 
LearningRate 0.0101 Epoch: 27 Global Step: 155020 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:44:15,100-Speed 10882.15 samples/sec Loss 5.1771 LearningRate 0.0101 Epoch: 27 Global Step: 155030 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:44:15,993-Speed 11470.83 samples/sec Loss 5.0906 LearningRate 0.0101 Epoch: 27 Global Step: 155040 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:44:16,903-Speed 11267.23 samples/sec Loss 5.1513 LearningRate 0.0101 Epoch: 27 Global Step: 155050 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:44:17,796-Speed 11477.06 samples/sec Loss 5.0860 LearningRate 0.0101 Epoch: 27 Global Step: 155060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:44:18,718-Speed 11109.73 samples/sec Loss 5.0579 LearningRate 0.0101 Epoch: 27 Global Step: 155070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:44:19,694-Speed 10495.53 samples/sec Loss 5.0780 LearningRate 0.0101 Epoch: 27 Global Step: 155080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:44:20,591-Speed 11437.25 samples/sec Loss 5.1960 LearningRate 0.0101 Epoch: 27 Global Step: 155090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:44:21,484-Speed 11465.93 samples/sec Loss 5.2050 LearningRate 0.0101 Epoch: 27 Global Step: 155100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:22,375-Speed 11510.21 samples/sec Loss 5.0305 LearningRate 0.0101 Epoch: 27 Global Step: 155110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:23,260-Speed 11569.50 samples/sec Loss 5.0943 LearningRate 0.0101 Epoch: 27 Global Step: 155120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:24,202-Speed 10884.98 samples/sec Loss 4.9927 LearningRate 0.0101 Epoch: 27 Global Step: 155130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:44:25,092-Speed 11514.55 samples/sec Loss 5.0229 LearningRate 0.0101 Epoch: 27 
Global Step: 155140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:44:26,045-Speed 10752.45 samples/sec Loss 5.0735 LearningRate 0.0101 Epoch: 27 Global Step: 155150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:44:26,957-Speed 11230.56 samples/sec Loss 5.0405 LearningRate 0.0101 Epoch: 27 Global Step: 155160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:44:27,873-Speed 11186.58 samples/sec Loss 5.1407 LearningRate 0.0101 Epoch: 27 Global Step: 155170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:44:28,777-Speed 11333.30 samples/sec Loss 4.9144 LearningRate 0.0101 Epoch: 27 Global Step: 155180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:44:29,669-Speed 11490.94 samples/sec Loss 5.1823 LearningRate 0.0101 Epoch: 27 Global Step: 155190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:44:30,551-Speed 11617.01 samples/sec Loss 5.1092 LearningRate 0.0101 Epoch: 27 Global Step: 155200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:44:31,470-Speed 11155.28 samples/sec Loss 5.1270 LearningRate 0.0101 Epoch: 27 Global Step: 155210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:44:32,395-Speed 11076.69 samples/sec Loss 5.0778 LearningRate 0.0101 Epoch: 27 Global Step: 155220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:44:33,292-Speed 11426.96 samples/sec Loss 5.1079 LearningRate 0.0101 Epoch: 27 Global Step: 155230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:34,196-Speed 11335.27 samples/sec Loss 5.1888 LearningRate 0.0101 Epoch: 27 Global Step: 155240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:35,112-Speed 11193.14 samples/sec Loss 5.0492 LearningRate 0.0101 Epoch: 27 Global Step: 155250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:36,028-Speed 11190.46 samples/sec Loss 5.1720 LearningRate 0.0101 Epoch: 27 Global Step: 155260 Fp16 Grad 
Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:36,937-Speed 11265.61 samples/sec Loss 5.1172 LearningRate 0.0101 Epoch: 27 Global Step: 155270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:37,842-Speed 11325.02 samples/sec Loss 5.1030 LearningRate 0.0101 Epoch: 27 Global Step: 155280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:38,749-Speed 11306.90 samples/sec Loss 5.2395 LearningRate 0.0101 Epoch: 27 Global Step: 155290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:39,646-Speed 11417.38 samples/sec Loss 5.1251 LearningRate 0.0101 Epoch: 27 Global Step: 155300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:40,598-Speed 10770.99 samples/sec Loss 5.0933 LearningRate 0.0101 Epoch: 27 Global Step: 155310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:41,504-Speed 11301.72 samples/sec Loss 5.0629 LearningRate 0.0101 Epoch: 27 Global Step: 155320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:42,418-Speed 11219.28 samples/sec Loss 5.2115 LearningRate 0.0101 Epoch: 27 Global Step: 155330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:43,311-Speed 11465.37 samples/sec Loss 5.1409 LearningRate 0.0100 Epoch: 27 Global Step: 155340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:44,255-Speed 10862.18 samples/sec Loss 5.0795 LearningRate 0.0100 Epoch: 27 Global Step: 155350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:45,131-Speed 11690.94 samples/sec Loss 5.1172 LearningRate 0.0100 Epoch: 27 Global Step: 155360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:46,029-Speed 11414.95 samples/sec Loss 5.1243 LearningRate 0.0100 Epoch: 27 Global Step: 155370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:46,918-Speed 11532.49 samples/sec Loss 5.1080 LearningRate 0.0100 Epoch: 27 Global Step: 155380 Fp16 Grad Scale: 65536 Required: 3 hours 
Training: 2022-04-11 04:44:47,835-Speed 11164.04 samples/sec Loss 5.0771 LearningRate 0.0100 Epoch: 27 Global Step: 155390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:48,735-Speed 11388.76 samples/sec Loss 5.1682 LearningRate 0.0100 Epoch: 27 Global Step: 155400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:49,680-Speed 10850.20 samples/sec Loss 5.1186 LearningRate 0.0100 Epoch: 27 Global Step: 155410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:50,584-Speed 11333.14 samples/sec Loss 5.0332 LearningRate 0.0100 Epoch: 27 Global Step: 155420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:51,498-Speed 11211.82 samples/sec Loss 5.1902 LearningRate 0.0100 Epoch: 27 Global Step: 155430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:52,438-Speed 10901.36 samples/sec Loss 5.1190 LearningRate 0.0100 Epoch: 27 Global Step: 155440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:53,352-Speed 11215.81 samples/sec Loss 5.2375 LearningRate 0.0100 Epoch: 27 Global Step: 155450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:54,250-Speed 11401.04 samples/sec Loss 5.1504 LearningRate 0.0100 Epoch: 27 Global Step: 155460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:55,137-Speed 11553.58 samples/sec Loss 4.9736 LearningRate 0.0100 Epoch: 27 Global Step: 155470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:56,054-Speed 11180.40 samples/sec Loss 5.2126 LearningRate 0.0100 Epoch: 27 Global Step: 155480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:57,035-Speed 10444.82 samples/sec Loss 5.1450 LearningRate 0.0100 Epoch: 27 Global Step: 155490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:57,934-Speed 11398.54 samples/sec Loss 5.1044 LearningRate 0.0100 Epoch: 27 Global Step: 155500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
04:44:58,872-Speed 10928.36 samples/sec Loss 5.0292 LearningRate 0.0100 Epoch: 27 Global Step: 155510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:44:59,823-Speed 10772.00 samples/sec Loss 5.1191 LearningRate 0.0100 Epoch: 27 Global Step: 155520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:45:00,732-Speed 11278.61 samples/sec Loss 5.1713 LearningRate 0.0100 Epoch: 27 Global Step: 155530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:45:01,663-Speed 11002.52 samples/sec Loss 5.1459 LearningRate 0.0100 Epoch: 27 Global Step: 155540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:45:02,553-Speed 11521.45 samples/sec Loss 5.1504 LearningRate 0.0100 Epoch: 27 Global Step: 155550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:45:03,498-Speed 10837.59 samples/sec Loss 5.2334 LearningRate 0.0100 Epoch: 27 Global Step: 155560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:45:04,394-Speed 11442.88 samples/sec Loss 5.0273 LearningRate 0.0100 Epoch: 27 Global Step: 155570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:45:05,284-Speed 11510.38 samples/sec Loss 5.0634 LearningRate 0.0100 Epoch: 27 Global Step: 155580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:45:06,202-Speed 11164.97 samples/sec Loss 5.0752 LearningRate 0.0100 Epoch: 27 Global Step: 155590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:45:07,133-Speed 11007.26 samples/sec Loss 5.1454 LearningRate 0.0100 Epoch: 27 Global Step: 155600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:45:08,068-Speed 10957.27 samples/sec Loss 5.0753 LearningRate 0.0100 Epoch: 27 Global Step: 155610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:45:08,963-Speed 11444.01 samples/sec Loss 5.2504 LearningRate 0.0100 Epoch: 27 Global Step: 155620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:45:09,862-Speed 11403.28 
samples/sec Loss 5.1567 LearningRate 0.0100 Epoch: 27 Global Step: 155630 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:45:10,750-Speed 11540.75 samples/sec Loss 5.1820 LearningRate 0.0100 Epoch: 27 Global Step: 155640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:45:11,663-Speed 11223.21 samples/sec Loss 5.0187 LearningRate 0.0100 Epoch: 27 Global Step: 155650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:45:12,540-Speed 11682.70 samples/sec Loss 5.1685 LearningRate 0.0100 Epoch: 27 Global Step: 155660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:45:13,450-Speed 11259.78 samples/sec Loss 5.1869 LearningRate 0.0100 Epoch: 27 Global Step: 155670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:45:14,350-Speed 11392.54 samples/sec Loss 5.2450 LearningRate 0.0100 Epoch: 27 Global Step: 155680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:45:15,274-Speed 11089.11 samples/sec Loss 5.2166 LearningRate 0.0100 Epoch: 27 Global Step: 155690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:45:16,182-Speed 11282.58 samples/sec Loss 5.2121 LearningRate 0.0099 Epoch: 27 Global Step: 155700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:45:17,091-Speed 11269.94 samples/sec Loss 5.1901 LearningRate 0.0099 Epoch: 27 Global Step: 155710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:45:18,010-Speed 11152.35 samples/sec Loss 5.2465 LearningRate 0.0099 Epoch: 27 Global Step: 155720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:45:18,918-Speed 11293.57 samples/sec Loss 5.2187 LearningRate 0.0099 Epoch: 27 Global Step: 155730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:45:19,818-Speed 11376.75 samples/sec Loss 5.1459 LearningRate 0.0099 Epoch: 27 Global Step: 155740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:45:20,725-Speed 11306.34 samples/sec Loss 5.1897 
LearningRate 0.0099 Epoch: 27 Global Step: 155750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:45:21,602-Speed 11682.73 samples/sec Loss 5.1300 LearningRate 0.0099 Epoch: 27 Global Step: 155760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:45:22,486-Speed 11603.38 samples/sec Loss 5.1108 LearningRate 0.0099 Epoch: 27 Global Step: 155770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:45:23,386-Speed 11379.52 samples/sec Loss 5.0779 LearningRate 0.0099 Epoch: 27 Global Step: 155780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:45:24,297-Speed 11248.23 samples/sec Loss 5.1237 LearningRate 0.0099 Epoch: 27 Global Step: 155790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:45:25,220-Speed 11099.01 samples/sec Loss 5.1748 LearningRate 0.0099 Epoch: 27 Global Step: 155800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:45:26,120-Speed 11399.10 samples/sec Loss 5.2441 LearningRate 0.0099 Epoch: 27 Global Step: 155810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:45:27,017-Speed 11414.80 samples/sec Loss 5.1751 LearningRate 0.0099 Epoch: 27 Global Step: 155820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:45:27,944-Speed 11060.23 samples/sec Loss 5.0961 LearningRate 0.0099 Epoch: 27 Global Step: 155830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:45:28,869-Speed 11069.80 samples/sec Loss 5.1576 LearningRate 0.0099 Epoch: 27 Global Step: 155840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:45:29,854-Speed 10405.94 samples/sec Loss 5.2803 LearningRate 0.0099 Epoch: 27 Global Step: 155850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:45:30,758-Speed 11336.19 samples/sec Loss 5.1974 LearningRate 0.0099 Epoch: 27 Global Step: 155860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:45:31,644-Speed 11576.02 samples/sec Loss 5.2357 LearningRate 0.0099 Epoch: 27 
Global Step: 155870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:45:32,536-Speed 11484.13 samples/sec Loss 5.2006 LearningRate 0.0099 Epoch: 27 Global Step: 155880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:45:33,427-Speed 11504.15 samples/sec Loss 5.1779 LearningRate 0.0099 Epoch: 27 Global Step: 155890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:45:34,310-Speed 11608.01 samples/sec Loss 5.2347 LearningRate 0.0099 Epoch: 27 Global Step: 155900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:45:35,249-Speed 10908.90 samples/sec Loss 5.1665 LearningRate 0.0099 Epoch: 27 Global Step: 155910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:45:36,162-Speed 11221.17 samples/sec Loss 5.0447 LearningRate 0.0099 Epoch: 27 Global Step: 155920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:45:37,059-Speed 11420.47 samples/sec Loss 5.1579 LearningRate 0.0099 Epoch: 27 Global Step: 155930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:45:38,009-Speed 10798.49 samples/sec Loss 5.1955 LearningRate 0.0099 Epoch: 27 Global Step: 155940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:45:38,955-Speed 10831.08 samples/sec Loss 5.0745 LearningRate 0.0099 Epoch: 27 Global Step: 155950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:45:39,898-Speed 10858.95 samples/sec Loss 5.2877 LearningRate 0.0099 Epoch: 27 Global Step: 155960 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:45:40,845-Speed 10832.48 samples/sec Loss 5.2140 LearningRate 0.0099 Epoch: 27 Global Step: 155970 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:45:41,776-Speed 10999.69 samples/sec Loss 5.1027 LearningRate 0.0099 Epoch: 27 Global Step: 155980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:45:42,717-Speed 10887.38 samples/sec Loss 5.2996 LearningRate 0.0099 Epoch: 27 Global Step: 155990 Fp16 Grad 
Scale: 32768 Required: 3 hours Training: 2022-04-11 04:45:43,620-Speed 11346.86 samples/sec Loss 5.2629 LearningRate 0.0099 Epoch: 27 Global Step: 156000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:46:04,860-[lfw][156000]XNorm: 8.897849 Training: 2022-04-11 04:46:04,861-[lfw][156000]Accuracy-Flip: 0.99700+-0.00296 Training: 2022-04-11 04:46:04,861-[lfw][156000]Accuracy-Highest: 0.99733 Training: 2022-04-11 04:46:29,615-[cfp_fp][156000]XNorm: 7.553217 Training: 2022-04-11 04:46:29,615-[cfp_fp][156000]Accuracy-Flip: 0.95086+-0.01044 Training: 2022-04-11 04:46:29,615-[cfp_fp][156000]Accuracy-Highest: 0.95429 Training: 2022-04-11 04:46:51,122-[agedb_30][156000]XNorm: 8.751674 Training: 2022-04-11 04:46:51,122-[agedb_30][156000]Accuracy-Flip: 0.96900+-0.00779 Training: 2022-04-11 04:46:51,123-[agedb_30][156000]Accuracy-Highest: 0.96900 Training: 2022-04-11 04:46:52,026-Speed 149.70 samples/sec Loss 5.0648 LearningRate 0.0099 Epoch: 27 Global Step: 156010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:46:52,918-Speed 11487.65 samples/sec Loss 5.2117 LearningRate 0.0099 Epoch: 27 Global Step: 156020 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:46:53,814-Speed 11441.56 samples/sec Loss 5.2474 LearningRate 0.0099 Epoch: 27 Global Step: 156030 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:46:54,743-Speed 11026.44 samples/sec Loss 5.1249 LearningRate 0.0099 Epoch: 27 Global Step: 156040 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:46:55,700-Speed 10707.78 samples/sec Loss 5.1299 LearningRate 0.0099 Epoch: 27 Global Step: 156050 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:46:56,619-Speed 11153.29 samples/sec Loss 5.2147 LearningRate 0.0098 Epoch: 27 Global Step: 156060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:46:57,531-Speed 11227.35 samples/sec Loss 4.9909 LearningRate 0.0098 Epoch: 27 Global Step: 156070 Fp16 Grad Scale: 32768 
Required: 3 hours Training: 2022-04-11 04:46:58,438-Speed 11296.42 samples/sec Loss 5.2772 LearningRate 0.0098 Epoch: 27 Global Step: 156080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:46:59,351-Speed 11224.91 samples/sec Loss 5.1420 LearningRate 0.0098 Epoch: 27 Global Step: 156090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:47:00,246-Speed 11460.01 samples/sec Loss 5.1726 LearningRate 0.0098 Epoch: 27 Global Step: 156100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:47:01,157-Speed 11241.37 samples/sec Loss 5.0872 LearningRate 0.0098 Epoch: 27 Global Step: 156110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:47:02,057-Speed 11390.50 samples/sec Loss 5.1686 LearningRate 0.0098 Epoch: 27 Global Step: 156120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:47:02,959-Speed 11362.68 samples/sec Loss 5.2177 LearningRate 0.0098 Epoch: 27 Global Step: 156130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:47:03,856-Speed 11418.08 samples/sec Loss 5.2985 LearningRate 0.0098 Epoch: 27 Global Step: 156140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:47:04,763-Speed 11303.11 samples/sec Loss 5.1256 LearningRate 0.0098 Epoch: 27 Global Step: 156150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:47:05,694-Speed 11009.96 samples/sec Loss 5.3111 LearningRate 0.0098 Epoch: 27 Global Step: 156160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:47:06,596-Speed 11355.88 samples/sec Loss 5.0687 LearningRate 0.0098 Epoch: 27 Global Step: 156170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:47:07,519-Speed 11105.25 samples/sec Loss 5.2443 LearningRate 0.0098 Epoch: 27 Global Step: 156180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:47:08,419-Speed 11380.94 samples/sec Loss 5.2585 LearningRate 0.0098 Epoch: 27 Global Step: 156190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 
2022-04-11 04:47:09,377-Speed 10699.25 samples/sec Loss 5.2498 LearningRate 0.0098 Epoch: 27 Global Step: 156200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:47:10,311-Speed 10970.04 samples/sec Loss 5.2447 LearningRate 0.0098 Epoch: 27 Global Step: 156210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:47:11,210-Speed 11400.88 samples/sec Loss 5.1899 LearningRate 0.0098 Epoch: 27 Global Step: 156220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:47:12,111-Speed 11375.64 samples/sec Loss 5.1764 LearningRate 0.0098 Epoch: 27 Global Step: 156230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:47:13,021-Speed 11260.33 samples/sec Loss 5.0516 LearningRate 0.0098 Epoch: 27 Global Step: 156240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:47:13,921-Speed 11392.78 samples/sec Loss 5.1439 LearningRate 0.0098 Epoch: 27 Global Step: 156250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:47:14,814-Speed 11466.09 samples/sec Loss 5.3502 LearningRate 0.0098 Epoch: 27 Global Step: 156260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:47:15,765-Speed 10785.51 samples/sec Loss 5.2832 LearningRate 0.0098 Epoch: 27 Global Step: 156270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:47:16,711-Speed 10824.26 samples/sec Loss 5.2646 LearningRate 0.0098 Epoch: 27 Global Step: 156280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:47:17,609-Speed 11412.44 samples/sec Loss 5.2906 LearningRate 0.0098 Epoch: 27 Global Step: 156290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:47:18,542-Speed 10987.47 samples/sec Loss 5.3187 LearningRate 0.0098 Epoch: 27 Global Step: 156300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:47:19,503-Speed 10656.56 samples/sec Loss 5.1432 LearningRate 0.0098 Epoch: 27 Global Step: 156310 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:47:20,410-Speed 
11307.73 samples/sec Loss 5.1312 LearningRate 0.0098 Epoch: 27 Global Step: 156320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:47:21,282-Speed 11749.02 samples/sec Loss 5.2501 LearningRate 0.0098 Epoch: 27 Global Step: 156330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:47:22,206-Speed 11089.78 samples/sec Loss 5.1375 LearningRate 0.0098 Epoch: 27 Global Step: 156340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:47:23,099-Speed 11475.13 samples/sec Loss 5.1997 LearningRate 0.0098 Epoch: 27 Global Step: 156350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:47:24,004-Speed 11320.24 samples/sec Loss 5.3356 LearningRate 0.0098 Epoch: 27 Global Step: 156360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:47:24,901-Speed 11427.72 samples/sec Loss 5.1233 LearningRate 0.0098 Epoch: 27 Global Step: 156370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:47:25,800-Speed 11392.41 samples/sec Loss 5.2714 LearningRate 0.0098 Epoch: 27 Global Step: 156380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:47:26,713-Speed 11229.45 samples/sec Loss 5.1241 LearningRate 0.0098 Epoch: 27 Global Step: 156390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:47:27,605-Speed 11490.69 samples/sec Loss 5.3814 LearningRate 0.0098 Epoch: 27 Global Step: 156400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:47:28,515-Speed 11257.24 samples/sec Loss 5.1539 LearningRate 0.0098 Epoch: 27 Global Step: 156410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:47:29,465-Speed 10784.65 samples/sec Loss 5.1873 LearningRate 0.0098 Epoch: 27 Global Step: 156420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:47:30,401-Speed 10944.97 samples/sec Loss 5.2842 LearningRate 0.0097 Epoch: 27 Global Step: 156430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:47:31,301-Speed 11393.25 samples/sec Loss 
5.1897 LearningRate 0.0097 Epoch: 27 Global Step: 156440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:47:32,227-Speed 11058.83 samples/sec Loss 5.1166 LearningRate 0.0097 Epoch: 27 Global Step: 156450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:47:33,134-Speed 11300.19 samples/sec Loss 5.3052 LearningRate 0.0097 Epoch: 27 Global Step: 156460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:47:34,051-Speed 11172.06 samples/sec Loss 5.2166 LearningRate 0.0097 Epoch: 27 Global Step: 156470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:47:34,947-Speed 11447.03 samples/sec Loss 5.1122 LearningRate 0.0097 Epoch: 27 Global Step: 156480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:47:35,862-Speed 11201.85 samples/sec Loss 5.1687 LearningRate 0.0097 Epoch: 27 Global Step: 156490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:47:36,742-Speed 11640.07 samples/sec Loss 5.2585 LearningRate 0.0097 Epoch: 27 Global Step: 156500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:47:37,623-Speed 11637.56 samples/sec Loss 5.3021 LearningRate 0.0097 Epoch: 27 Global Step: 156510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:47:38,537-Speed 11204.36 samples/sec Loss 5.1877 LearningRate 0.0097 Epoch: 27 Global Step: 156520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:47:39,468-Speed 11003.75 samples/sec Loss 5.2198 LearningRate 0.0097 Epoch: 27 Global Step: 156530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:47:40,397-Speed 11042.54 samples/sec Loss 5.1181 LearningRate 0.0097 Epoch: 27 Global Step: 156540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:47:41,326-Speed 11026.31 samples/sec Loss 5.0682 LearningRate 0.0097 Epoch: 27 Global Step: 156550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:47:42,221-Speed 11454.54 samples/sec Loss 5.1471 LearningRate 0.0097 
Epoch: 27 Global Step: 156560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:47:43,153-Speed 10993.97 samples/sec Loss 5.1871 LearningRate 0.0097 Epoch: 27 Global Step: 156570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:47:44,069-Speed 11176.55 samples/sec Loss 5.1863 LearningRate 0.0097 Epoch: 27 Global Step: 156580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:47:44,992-Speed 11110.83 samples/sec Loss 5.1284 LearningRate 0.0097 Epoch: 27 Global Step: 156590 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:47:45,875-Speed 11605.01 samples/sec Loss 5.1724 LearningRate 0.0097 Epoch: 27 Global Step: 156600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:47:46,807-Speed 10994.78 samples/sec Loss 5.3026 LearningRate 0.0097 Epoch: 27 Global Step: 156610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:47:47,711-Speed 11332.58 samples/sec Loss 5.1978 LearningRate 0.0097 Epoch: 27 Global Step: 156620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:47:48,642-Speed 11008.11 samples/sec Loss 5.1729 LearningRate 0.0097 Epoch: 27 Global Step: 156630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:47:49,542-Speed 11385.75 samples/sec Loss 5.2011 LearningRate 0.0097 Epoch: 27 Global Step: 156640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:47:50,436-Speed 11466.63 samples/sec Loss 5.2728 LearningRate 0.0097 Epoch: 27 Global Step: 156650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:47:51,341-Speed 11320.56 samples/sec Loss 5.2144 LearningRate 0.0097 Epoch: 27 Global Step: 156660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:47:52,278-Speed 10932.38 samples/sec Loss 5.1236 LearningRate 0.0097 Epoch: 27 Global Step: 156670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:47:53,166-Speed 11547.87 samples/sec Loss 5.1954 LearningRate 0.0097 Epoch: 27 Global Step: 156680 
Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:47:54,099-Speed 10972.40 samples/sec Loss 5.3266 LearningRate 0.0097 Epoch: 27 Global Step: 156690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:47:55,046-Speed 10832.70 samples/sec Loss 5.2139 LearningRate 0.0097 Epoch: 27 Global Step: 156700 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 04:47:55,959-Speed 11229.48 samples/sec Loss 5.2072 LearningRate 0.0097 Epoch: 27 Global Step: 156710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:47:56,872-Speed 11218.79 samples/sec Loss 5.3292 LearningRate 0.0097 Epoch: 27 Global Step: 156720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:47:57,802-Speed 11026.15 samples/sec Loss 5.2128 LearningRate 0.0097 Epoch: 27 Global Step: 156730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 04:47:58,730-Speed 11031.60 samples/sec Loss 5.2918 LearningRate 0.0097 Epoch: 27 Global Step: 156740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:47:59,633-Speed 11354.31 samples/sec Loss 5.2699 LearningRate 0.0097 Epoch: 27 Global Step: 156750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:48:00,533-Speed 11389.56 samples/sec Loss 5.1522 LearningRate 0.0097 Epoch: 27 Global Step: 156760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 04:48:01,453-Speed 11132.34 samples/sec Loss 5.3354 LearningRate 0.0097 Epoch: 27 Global Step: 156770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:48:02,354-Speed 11375.10 samples/sec Loss 5.2484 LearningRate 0.0097 Epoch: 27 Global Step: 156780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:48:03,266-Speed 11230.88 samples/sec Loss 5.1124 LearningRate 0.0096 Epoch: 27 Global Step: 156790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:48:04,215-Speed 10800.66 samples/sec Loss 5.1330 LearningRate 0.0096 Epoch: 27 Global Step: 156800 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-04-11 04:48:05,135-Speed 11135.77 samples/sec Loss 5.2637 LearningRate 0.0096 Epoch: 27 Global Step: 156810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:48:06,075-Speed 10901.77 samples/sec Loss 5.2724 LearningRate 0.0096 Epoch: 27 Global Step: 156820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:48:06,999-Speed 11096.47 samples/sec Loss 5.2064 LearningRate 0.0096 Epoch: 27 Global Step: 156830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:48:07,872-Speed 11729.07 samples/sec Loss 5.3126 LearningRate 0.0096 Epoch: 27 Global Step: 156840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:48:08,821-Speed 10801.88 samples/sec Loss 5.2595 LearningRate 0.0096 Epoch: 27 Global Step: 156850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:48:09,757-Speed 10943.76 samples/sec Loss 5.1702 LearningRate 0.0096 Epoch: 27 Global Step: 156860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:48:10,689-Speed 10996.36 samples/sec Loss 5.2264 LearningRate 0.0096 Epoch: 27 Global Step: 156870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:48:11,640-Speed 10775.69 samples/sec Loss 5.2326 LearningRate 0.0096 Epoch: 27 Global Step: 156880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:48:12,580-Speed 10907.39 samples/sec Loss 5.2799 LearningRate 0.0096 Epoch: 27 Global Step: 156890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:48:13,525-Speed 10839.97 samples/sec Loss 5.2965 LearningRate 0.0096 Epoch: 27 Global Step: 156900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:48:14,485-Speed 10678.31 samples/sec Loss 5.1295 LearningRate 0.0096 Epoch: 27 Global Step: 156910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:48:15,419-Speed 10987.43 samples/sec Loss 5.2713 LearningRate 0.0096 Epoch: 27 Global Step: 156920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 
2022-04-11 04:48:16,364-Speed 10840.53 samples/sec Loss 5.1855 LearningRate 0.0096 Epoch: 27 Global Step: 156930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:48:17,289-Speed 11077.91 samples/sec Loss 5.3886 LearningRate 0.0096 Epoch: 27 Global Step: 156940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:48:18,246-Speed 10701.06 samples/sec Loss 5.2379 LearningRate 0.0096 Epoch: 27 Global Step: 156950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:48:19,168-Speed 11113.09 samples/sec Loss 5.0992 LearningRate 0.0096 Epoch: 27 Global Step: 156960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:48:20,081-Speed 11231.42 samples/sec Loss 5.2659 LearningRate 0.0096 Epoch: 27 Global Step: 156970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:48:20,988-Speed 11298.25 samples/sec Loss 5.2616 LearningRate 0.0096 Epoch: 27 Global Step: 156980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:48:21,905-Speed 11167.37 samples/sec Loss 5.3661 LearningRate 0.0096 Epoch: 27 Global Step: 156990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:48:22,819-Speed 11218.01 samples/sec Loss 5.2505 LearningRate 0.0096 Epoch: 27 Global Step: 157000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:48:23,735-Speed 11190.88 samples/sec Loss 5.2669 LearningRate 0.0096 Epoch: 27 Global Step: 157010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:48:24,623-Speed 11536.95 samples/sec Loss 5.2420 LearningRate 0.0096 Epoch: 27 Global Step: 157020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:48:25,557-Speed 10974.65 samples/sec Loss 5.2998 LearningRate 0.0096 Epoch: 27 Global Step: 157030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:48:26,475-Speed 11160.61 samples/sec Loss 5.0982 LearningRate 0.0096 Epoch: 27 Global Step: 157040 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 04:48:27,420-Speed 
10840.20 samples/sec Loss 5.2782 LearningRate 0.0096 Epoch: 27 Global Step: 157050 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 04:48:28,309-Speed 11532.35 samples/sec Loss 5.3831 LearningRate 0.0096 Epoch: 27 Global Step: 157060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:48:29,236-Speed 11059.68 samples/sec Loss 5.2131 LearningRate 0.0096 Epoch: 27 Global Step: 157070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:48:30,123-Speed 11548.01 samples/sec Loss 5.2657 LearningRate 0.0096 Epoch: 27 Global Step: 157080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:48:31,019-Speed 11431.41 samples/sec Loss 5.2141 LearningRate 0.0096 Epoch: 27 Global Step: 157090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:48:31,948-Speed 11033.79 samples/sec Loss 5.1777 LearningRate 0.0096 Epoch: 27 Global Step: 157100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:48:32,830-Speed 11620.23 samples/sec Loss 5.2552 LearningRate 0.0096 Epoch: 27 Global Step: 157110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:48:33,754-Speed 11087.49 samples/sec Loss 5.2129 LearningRate 0.0096 Epoch: 27 Global Step: 157120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:48:34,661-Speed 11298.25 samples/sec Loss 5.2800 LearningRate 0.0096 Epoch: 27 Global Step: 157130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:48:35,558-Speed 11429.34 samples/sec Loss 5.2268 LearningRate 0.0096 Epoch: 27 Global Step: 157140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:48:36,447-Speed 11524.54 samples/sec Loss 5.1863 LearningRate 0.0096 Epoch: 27 Global Step: 157150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:48:37,354-Speed 11296.15 samples/sec Loss 5.2753 LearningRate 0.0095 Epoch: 27 Global Step: 157160 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 04:48:38,240-Speed 11566.01 samples/sec Loss 
5.1357 LearningRate 0.0095 Epoch: 27 Global Step: 157170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:48:39,182-Speed 10884.92 samples/sec Loss 5.1982 LearningRate 0.0095 Epoch: 27 Global Step: 157180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:48:40,097-Speed 11189.98 samples/sec Loss 5.3056 LearningRate 0.0095 Epoch: 27 Global Step: 157190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:48:40,992-Speed 11462.01 samples/sec Loss 5.3589 LearningRate 0.0095 Epoch: 27 Global Step: 157200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:48:41,906-Speed 11199.36 samples/sec Loss 5.2697 LearningRate 0.0095 Epoch: 27 Global Step: 157210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:48:42,832-Speed 11073.31 samples/sec Loss 5.5011 LearningRate 0.0095 Epoch: 27 Global Step: 157220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:48:43,712-Speed 11653.36 samples/sec Loss 5.2940 LearningRate 0.0095 Epoch: 27 Global Step: 157230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:48:44,605-Speed 11465.25 samples/sec Loss 5.3652 LearningRate 0.0095 Epoch: 27 Global Step: 157240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:48:45,500-Speed 11451.45 samples/sec Loss 5.2296 LearningRate 0.0095 Epoch: 27 Global Step: 157250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:48:46,402-Speed 11362.35 samples/sec Loss 5.1707 LearningRate 0.0095 Epoch: 27 Global Step: 157260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:48:47,310-Speed 11293.29 samples/sec Loss 5.2791 LearningRate 0.0095 Epoch: 27 Global Step: 157270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:48:48,220-Speed 11260.15 samples/sec Loss 5.1456 LearningRate 0.0095 Epoch: 27 Global Step: 157280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:48:49,159-Speed 10913.97 samples/sec Loss 5.2053 LearningRate 0.0095 
Epoch: 27 Global Step: 157290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:48:50,076-Speed 11164.84 samples/sec Loss 5.2431 LearningRate 0.0095 Epoch: 27 Global Step: 157300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:48:51,029-Speed 10757.49 samples/sec Loss 5.0842 LearningRate 0.0095 Epoch: 27 Global Step: 157310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:48:51,943-Speed 11206.52 samples/sec Loss 5.3349 LearningRate 0.0095 Epoch: 27 Global Step: 157320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:48:52,930-Speed 10385.84 samples/sec Loss 5.2976 LearningRate 0.0095 Epoch: 27 Global Step: 157330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:48:53,838-Speed 11286.71 samples/sec Loss 5.3189 LearningRate 0.0095 Epoch: 27 Global Step: 157340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:48:54,719-Speed 11629.28 samples/sec Loss 5.2328 LearningRate 0.0095 Epoch: 27 Global Step: 157350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:48:55,659-Speed 10909.56 samples/sec Loss 5.2987 LearningRate 0.0095 Epoch: 27 Global Step: 157360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:48:56,631-Speed 10535.40 samples/sec Loss 5.2783 LearningRate 0.0095 Epoch: 27 Global Step: 157370 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 04:48:57,546-Speed 11205.28 samples/sec Loss 5.2739 LearningRate 0.0095 Epoch: 27 Global Step: 157380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:48:58,494-Speed 10807.30 samples/sec Loss 5.2054 LearningRate 0.0095 Epoch: 27 Global Step: 157390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:48:59,439-Speed 10839.37 samples/sec Loss 5.3051 LearningRate 0.0095 Epoch: 27 Global Step: 157400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:49:00,331-Speed 11489.48 samples/sec Loss 5.2742 LearningRate 0.0095 Epoch: 27 Global Step: 157410 
Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:49:01,252-Speed 11131.84 samples/sec Loss 5.2246 LearningRate 0.0095 Epoch: 27 Global Step: 157420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:49:02,163-Speed 11249.63 samples/sec Loss 5.2757 LearningRate 0.0095 Epoch: 27 Global Step: 157430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:49:03,067-Speed 11335.03 samples/sec Loss 5.2255 LearningRate 0.0095 Epoch: 27 Global Step: 157440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:49:03,970-Speed 11342.42 samples/sec Loss 5.3271 LearningRate 0.0095 Epoch: 27 Global Step: 157450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:49:04,881-Speed 11250.80 samples/sec Loss 5.2096 LearningRate 0.0095 Epoch: 27 Global Step: 157460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:49:05,807-Speed 11069.72 samples/sec Loss 5.2874 LearningRate 0.0095 Epoch: 27 Global Step: 157470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:49:06,713-Speed 11309.52 samples/sec Loss 5.0705 LearningRate 0.0095 Epoch: 27 Global Step: 157480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:49:07,667-Speed 10735.32 samples/sec Loss 5.1922 LearningRate 0.0095 Epoch: 27 Global Step: 157490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:49:08,594-Speed 11057.85 samples/sec Loss 5.3580 LearningRate 0.0095 Epoch: 27 Global Step: 157500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:49:09,493-Speed 11395.92 samples/sec Loss 5.1991 LearningRate 0.0095 Epoch: 27 Global Step: 157510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:49:10,422-Speed 11031.00 samples/sec Loss 5.3673 LearningRate 0.0095 Epoch: 27 Global Step: 157520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:49:11,335-Speed 11233.31 samples/sec Loss 5.4282 LearningRate 0.0094 Epoch: 27 Global Step: 157530 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-04-11 04:49:12,241-Speed 11303.78 samples/sec Loss 5.2302 LearningRate 0.0094 Epoch: 27 Global Step: 157540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:49:13,156-Speed 11198.63 samples/sec Loss 5.1590 LearningRate 0.0094 Epoch: 27 Global Step: 157550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:49:14,107-Speed 10775.94 samples/sec Loss 5.2242 LearningRate 0.0094 Epoch: 27 Global Step: 157560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:49:15,035-Speed 11044.91 samples/sec Loss 5.1332 LearningRate 0.0094 Epoch: 27 Global Step: 157570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:49:15,939-Speed 11327.79 samples/sec Loss 5.2466 LearningRate 0.0094 Epoch: 27 Global Step: 157580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:49:16,886-Speed 10824.93 samples/sec Loss 5.3872 LearningRate 0.0094 Epoch: 27 Global Step: 157590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:49:17,770-Speed 11589.10 samples/sec Loss 5.2396 LearningRate 0.0094 Epoch: 27 Global Step: 157600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:49:18,688-Speed 11159.78 samples/sec Loss 5.1663 LearningRate 0.0094 Epoch: 27 Global Step: 157610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:49:19,647-Speed 10681.93 samples/sec Loss 5.3895 LearningRate 0.0094 Epoch: 27 Global Step: 157620 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 04:49:20,535-Speed 11543.20 samples/sec Loss 5.3180 LearningRate 0.0094 Epoch: 27 Global Step: 157630 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 04:49:21,452-Speed 11181.65 samples/sec Loss 5.2906 LearningRate 0.0094 Epoch: 27 Global Step: 157640 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 04:49:22,335-Speed 11602.00 samples/sec Loss 5.2401 LearningRate 0.0094 Epoch: 27 Global Step: 157650 Fp16 Grad Scale: 131072 Required: 2 hours Training: 
2022-04-11 04:49:23,222-Speed 11547.45 samples/sec Loss 5.2588 LearningRate 0.0094 Epoch: 27 Global Step: 157660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:49:24,170-Speed 10806.79 samples/sec Loss 5.2683 LearningRate 0.0094 Epoch: 27 Global Step: 157670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:49:25,075-Speed 11330.70 samples/sec Loss 5.3003 LearningRate 0.0094 Epoch: 27 Global Step: 157680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:49:25,982-Speed 11325.38 samples/sec Loss 5.3727 LearningRate 0.0094 Epoch: 27 Global Step: 157690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:49:26,863-Speed 11630.37 samples/sec Loss 5.2201 LearningRate 0.0094 Epoch: 27 Global Step: 157700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:49:27,744-Speed 11635.62 samples/sec Loss 5.2067 LearningRate 0.0094 Epoch: 27 Global Step: 157710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:49:28,685-Speed 10879.68 samples/sec Loss 5.2988 LearningRate 0.0094 Epoch: 27 Global Step: 157720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:49:29,616-Speed 11015.80 samples/sec Loss 5.3826 LearningRate 0.0094 Epoch: 27 Global Step: 157730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:49:30,518-Speed 11354.98 samples/sec Loss 5.2287 LearningRate 0.0094 Epoch: 27 Global Step: 157740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:49:31,437-Speed 11152.26 samples/sec Loss 5.3618 LearningRate 0.0094 Epoch: 27 Global Step: 157750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:49:32,356-Speed 11154.73 samples/sec Loss 5.3261 LearningRate 0.0094 Epoch: 27 Global Step: 157760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:49:33,278-Speed 11112.51 samples/sec Loss 5.1887 LearningRate 0.0094 Epoch: 27 Global Step: 157770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:49:34,200-Speed 
11109.24 samples/sec Loss 5.2228 LearningRate 0.0094 Epoch: 27 Global Step: 157780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:49:35,090-Speed 11519.40 samples/sec Loss 5.2319 LearningRate 0.0094 Epoch: 27 Global Step: 157790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:49:36,043-Speed 10755.11 samples/sec Loss 5.1907 LearningRate 0.0094 Epoch: 27 Global Step: 157800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:49:36,968-Speed 11077.88 samples/sec Loss 5.1764 LearningRate 0.0094 Epoch: 27 Global Step: 157810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:49:37,861-Speed 11465.28 samples/sec Loss 5.1543 LearningRate 0.0094 Epoch: 27 Global Step: 157820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:49:38,786-Speed 11078.90 samples/sec Loss 5.2619 LearningRate 0.0094 Epoch: 27 Global Step: 157830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:49:39,761-Speed 10511.97 samples/sec Loss 5.1884 LearningRate 0.0094 Epoch: 27 Global Step: 157840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:49:40,717-Speed 10718.72 samples/sec Loss 5.3478 LearningRate 0.0094 Epoch: 27 Global Step: 157850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:49:41,657-Speed 10901.16 samples/sec Loss 5.1142 LearningRate 0.0094 Epoch: 27 Global Step: 157860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:49:42,579-Speed 11109.65 samples/sec Loss 5.3870 LearningRate 0.0094 Epoch: 27 Global Step: 157870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:49:43,507-Speed 11044.57 samples/sec Loss 5.2650 LearningRate 0.0094 Epoch: 27 Global Step: 157880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:49:44,390-Speed 11615.50 samples/sec Loss 5.4421 LearningRate 0.0094 Epoch: 27 Global Step: 157890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:49:45,309-Speed 11157.69 samples/sec Loss 
5.2962 LearningRate 0.0093 Epoch: 27 Global Step: 157900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:49:46,262-Speed 10746.81 samples/sec Loss 5.2853 LearningRate 0.0093 Epoch: 27 Global Step: 157910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:49:47,201-Speed 10917.22 samples/sec Loss 5.1832 LearningRate 0.0093 Epoch: 27 Global Step: 157920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:49:48,097-Speed 11431.99 samples/sec Loss 5.3850 LearningRate 0.0093 Epoch: 27 Global Step: 157930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:49:48,978-Speed 11638.46 samples/sec Loss 5.2305 LearningRate 0.0093 Epoch: 27 Global Step: 157940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:49:49,874-Speed 11433.34 samples/sec Loss 5.2156 LearningRate 0.0093 Epoch: 27 Global Step: 157950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:49:50,797-Speed 11110.13 samples/sec Loss 5.2664 LearningRate 0.0093 Epoch: 27 Global Step: 157960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:49:51,744-Speed 10820.56 samples/sec Loss 5.4162 LearningRate 0.0093 Epoch: 27 Global Step: 157970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:49:52,666-Speed 11116.31 samples/sec Loss 5.2752 LearningRate 0.0093 Epoch: 27 Global Step: 157980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:49:53,592-Speed 11060.78 samples/sec Loss 5.4261 LearningRate 0.0093 Epoch: 27 Global Step: 157990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:49:54,525-Speed 10979.81 samples/sec Loss 5.3032 LearningRate 0.0093 Epoch: 27 Global Step: 158000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:50:15,709-[lfw][158000]XNorm: 8.675658 Training: 2022-04-11 04:50:15,710-[lfw][158000]Accuracy-Flip: 0.99650+-0.00337 Training: 2022-04-11 04:50:15,710-[lfw][158000]Accuracy-Highest: 0.99733 Training: 2022-04-11 
04:50:40,250-[cfp_fp][158000]XNorm: 7.374592 Training: 2022-04-11 04:50:40,250-[cfp_fp][158000]Accuracy-Flip: 0.94986+-0.00916 Training: 2022-04-11 04:50:40,251-[cfp_fp][158000]Accuracy-Highest: 0.95429 Training: 2022-04-11 04:51:01,439-[agedb_30][158000]XNorm: 8.523225 Training: 2022-04-11 04:51:01,440-[agedb_30][158000]Accuracy-Flip: 0.96533+-0.00878 Training: 2022-04-11 04:51:01,440-[agedb_30][158000]Accuracy-Highest: 0.96900 Training: 2022-04-11 04:51:02,350-Speed 150.98 samples/sec Loss 5.2688 LearningRate 0.0093 Epoch: 27 Global Step: 158010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:51:03,257-Speed 11288.88 samples/sec Loss 5.2154 LearningRate 0.0093 Epoch: 27 Global Step: 158020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:51:04,191-Speed 10972.09 samples/sec Loss 5.1945 LearningRate 0.0093 Epoch: 27 Global Step: 158030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:51:05,100-Speed 11267.84 samples/sec Loss 5.2512 LearningRate 0.0093 Epoch: 27 Global Step: 158040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:51:05,996-Speed 11445.18 samples/sec Loss 5.2885 LearningRate 0.0093 Epoch: 27 Global Step: 158050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:51:06,913-Speed 11175.93 samples/sec Loss 5.1608 LearningRate 0.0093 Epoch: 27 Global Step: 158060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:51:07,864-Speed 10771.90 samples/sec Loss 5.2261 LearningRate 0.0093 Epoch: 27 Global Step: 158070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:51:08,763-Speed 11402.91 samples/sec Loss 5.2590 LearningRate 0.0093 Epoch: 27 Global Step: 158080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:51:09,639-Speed 11691.26 samples/sec Loss 5.2202 LearningRate 0.0093 Epoch: 27 Global Step: 158090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:51:10,525-Speed 11572.63 samples/sec Loss 5.3159 LearningRate 0.0093 
Epoch: 27 Global Step: 158100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:51:11,406-Speed 11629.91 samples/sec Loss 5.2185 LearningRate 0.0093 Epoch: 27 Global Step: 158110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:51:12,328-Speed 11107.05 samples/sec Loss 5.2362 LearningRate 0.0093 Epoch: 27 Global Step: 158120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:51:13,268-Speed 10905.50 samples/sec Loss 5.3099 LearningRate 0.0093 Epoch: 27 Global Step: 158130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:51:14,174-Speed 11311.22 samples/sec Loss 5.2656 LearningRate 0.0093 Epoch: 27 Global Step: 158140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:51:15,097-Speed 11100.26 samples/sec Loss 5.1232 LearningRate 0.0093 Epoch: 27 Global Step: 158150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:51:16,010-Speed 11225.20 samples/sec Loss 5.2797 LearningRate 0.0093 Epoch: 27 Global Step: 158160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:51:16,897-Speed 11561.32 samples/sec Loss 5.2833 LearningRate 0.0093 Epoch: 27 Global Step: 158170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:51:17,825-Speed 11041.47 samples/sec Loss 5.2725 LearningRate 0.0093 Epoch: 27 Global Step: 158180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:51:18,737-Speed 11226.54 samples/sec Loss 5.3695 LearningRate 0.0093 Epoch: 27 Global Step: 158190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:51:19,653-Speed 11198.88 samples/sec Loss 5.2690 LearningRate 0.0093 Epoch: 27 Global Step: 158200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:51:20,551-Speed 11410.12 samples/sec Loss 5.2552 LearningRate 0.0093 Epoch: 27 Global Step: 158210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:51:21,447-Speed 11434.27 samples/sec Loss 5.2856 LearningRate 0.0093 Epoch: 27 Global Step: 158220 
Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:51:22,372-Speed 11086.37 samples/sec Loss 5.3508 LearningRate 0.0093 Epoch: 27 Global Step: 158230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:51:23,271-Speed 11394.53 samples/sec Loss 5.2597 LearningRate 0.0093 Epoch: 27 Global Step: 158240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:51:24,189-Speed 11161.49 samples/sec Loss 5.2855 LearningRate 0.0093 Epoch: 27 Global Step: 158250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:51:25,073-Speed 11584.91 samples/sec Loss 5.2428 LearningRate 0.0093 Epoch: 27 Global Step: 158260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:51:26,004-Speed 11015.26 samples/sec Loss 5.3144 LearningRate 0.0092 Epoch: 27 Global Step: 158270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:51:26,940-Speed 10953.90 samples/sec Loss 5.1046 LearningRate 0.0092 Epoch: 27 Global Step: 158280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:51:27,847-Speed 11297.82 samples/sec Loss 5.3144 LearningRate 0.0092 Epoch: 27 Global Step: 158290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:51:28,761-Speed 11205.80 samples/sec Loss 5.4060 LearningRate 0.0092 Epoch: 27 Global Step: 158300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:51:29,723-Speed 10650.30 samples/sec Loss 5.2616 LearningRate 0.0092 Epoch: 27 Global Step: 158310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:51:30,646-Speed 11106.20 samples/sec Loss 5.2037 LearningRate 0.0092 Epoch: 27 Global Step: 158320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:51:31,567-Speed 11129.84 samples/sec Loss 5.3547 LearningRate 0.0092 Epoch: 27 Global Step: 158330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:51:32,458-Speed 11498.59 samples/sec Loss 5.3980 LearningRate 0.0092 Epoch: 27 Global Step: 158340 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-04-11 04:51:33,341-Speed 11605.47 samples/sec Loss 5.2828 LearningRate 0.0092 Epoch: 27 Global Step: 158350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:51:34,221-Speed 11643.48 samples/sec Loss 5.3163 LearningRate 0.0092 Epoch: 27 Global Step: 158360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:51:35,114-Speed 11487.21 samples/sec Loss 5.2528 LearningRate 0.0092 Epoch: 27 Global Step: 158370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:51:36,017-Speed 11344.33 samples/sec Loss 5.2906 LearningRate 0.0092 Epoch: 27 Global Step: 158380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:51:36,960-Speed 10870.69 samples/sec Loss 5.2146 LearningRate 0.0092 Epoch: 27 Global Step: 158390 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 04:51:37,921-Speed 10664.03 samples/sec Loss 5.4697 LearningRate 0.0092 Epoch: 27 Global Step: 158400 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 04:51:38,838-Speed 11173.53 samples/sec Loss 5.0383 LearningRate 0.0092 Epoch: 27 Global Step: 158410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:51:39,742-Speed 11331.14 samples/sec Loss 5.3070 LearningRate 0.0092 Epoch: 27 Global Step: 158420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:51:40,660-Speed 11166.28 samples/sec Loss 5.3200 LearningRate 0.0092 Epoch: 27 Global Step: 158430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:51:41,606-Speed 10835.74 samples/sec Loss 5.3849 LearningRate 0.0092 Epoch: 27 Global Step: 158440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:51:42,507-Speed 11376.59 samples/sec Loss 5.2410 LearningRate 0.0092 Epoch: 27 Global Step: 158450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:51:43,413-Speed 11305.63 samples/sec Loss 5.2758 LearningRate 0.0092 Epoch: 27 Global Step: 158460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-11 04:51:44,318-Speed 11324.68 samples/sec Loss 5.1702 LearningRate 0.0092 Epoch: 27 Global Step: 158470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:51:45,208-Speed 11511.39 samples/sec Loss 5.3066 LearningRate 0.0092 Epoch: 27 Global Step: 158480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:51:46,093-Speed 11574.07 samples/sec Loss 5.4338 LearningRate 0.0092 Epoch: 27 Global Step: 158490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:51:46,984-Speed 11514.98 samples/sec Loss 5.2480 LearningRate 0.0092 Epoch: 27 Global Step: 158500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:51:47,930-Speed 10831.99 samples/sec Loss 5.3585 LearningRate 0.0092 Epoch: 27 Global Step: 158510 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 04:51:48,843-Speed 11226.55 samples/sec Loss 5.1834 LearningRate 0.0092 Epoch: 27 Global Step: 158520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:51:49,802-Speed 10688.23 samples/sec Loss 5.0974 LearningRate 0.0092 Epoch: 27 Global Step: 158530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:51:50,725-Speed 11106.34 samples/sec Loss 5.3064 LearningRate 0.0092 Epoch: 27 Global Step: 158540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:51:51,631-Speed 11309.48 samples/sec Loss 5.2028 LearningRate 0.0092 Epoch: 27 Global Step: 158550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:51:52,542-Speed 11249.46 samples/sec Loss 5.2922 LearningRate 0.0092 Epoch: 27 Global Step: 158560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:51:53,446-Speed 11339.42 samples/sec Loss 5.2850 LearningRate 0.0092 Epoch: 27 Global Step: 158570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:51:54,365-Speed 11142.75 samples/sec Loss 5.2973 LearningRate 0.0092 Epoch: 27 Global Step: 158580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:51:55,269-Speed 
11343.32 samples/sec Loss 5.4167 LearningRate 0.0092 Epoch: 27 Global Step: 158590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:51:56,170-Speed 11376.68 samples/sec Loss 5.3130 LearningRate 0.0092 Epoch: 27 Global Step: 158600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:51:57,061-Speed 11500.42 samples/sec Loss 5.2597 LearningRate 0.0092 Epoch: 27 Global Step: 158610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:51:57,955-Speed 11458.12 samples/sec Loss 5.3659 LearningRate 0.0092 Epoch: 27 Global Step: 158620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:51:58,852-Speed 11418.08 samples/sec Loss 5.1952 LearningRate 0.0092 Epoch: 27 Global Step: 158630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:51:59,756-Speed 11333.06 samples/sec Loss 5.2596 LearningRate 0.0092 Epoch: 27 Global Step: 158640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:52:00,734-Speed 10486.85 samples/sec Loss 5.3700 LearningRate 0.0091 Epoch: 27 Global Step: 158650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:52:01,659-Speed 11074.72 samples/sec Loss 5.3258 LearningRate 0.0091 Epoch: 27 Global Step: 158660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:52:02,572-Speed 11228.88 samples/sec Loss 5.2290 LearningRate 0.0091 Epoch: 27 Global Step: 158670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:52:03,492-Speed 11137.82 samples/sec Loss 5.4416 LearningRate 0.0091 Epoch: 27 Global Step: 158680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:52:04,413-Speed 11122.08 samples/sec Loss 5.0997 LearningRate 0.0091 Epoch: 27 Global Step: 158690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:52:05,362-Speed 10795.81 samples/sec Loss 5.2825 LearningRate 0.0091 Epoch: 27 Global Step: 158700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:52:06,286-Speed 11095.25 samples/sec Loss 
5.3087 LearningRate 0.0091 Epoch: 27 Global Step: 158710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:52:07,194-Speed 11281.32 samples/sec Loss 5.3381 LearningRate 0.0091 Epoch: 27 Global Step: 158720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:52:08,095-Speed 11381.09 samples/sec Loss 5.3422 LearningRate 0.0091 Epoch: 27 Global Step: 158730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:52:09,012-Speed 11164.78 samples/sec Loss 5.2532 LearningRate 0.0091 Epoch: 27 Global Step: 158740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:52:09,912-Speed 11396.47 samples/sec Loss 5.1625 LearningRate 0.0091 Epoch: 27 Global Step: 158750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:52:10,808-Speed 11438.62 samples/sec Loss 5.2793 LearningRate 0.0091 Epoch: 27 Global Step: 158760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:52:11,721-Speed 11220.35 samples/sec Loss 5.2316 LearningRate 0.0091 Epoch: 27 Global Step: 158770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:52:12,603-Speed 11616.98 samples/sec Loss 5.3618 LearningRate 0.0091 Epoch: 27 Global Step: 158780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:52:13,514-Speed 11250.80 samples/sec Loss 5.3093 LearningRate 0.0091 Epoch: 27 Global Step: 158790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:52:14,433-Speed 11153.28 samples/sec Loss 5.3511 LearningRate 0.0091 Epoch: 27 Global Step: 158800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:52:15,321-Speed 11534.21 samples/sec Loss 5.2666 LearningRate 0.0091 Epoch: 27 Global Step: 158810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:52:16,248-Speed 11060.61 samples/sec Loss 5.1376 LearningRate 0.0091 Epoch: 27 Global Step: 158820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:52:17,178-Speed 11023.68 samples/sec Loss 5.2955 LearningRate 0.0091 
Epoch: 27 Global Step: 158830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:52:18,128-Speed 10777.75 samples/sec Loss 5.3130 LearningRate 0.0091 Epoch: 27 Global Step: 158840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:52:19,020-Speed 11488.62 samples/sec Loss 5.2832 LearningRate 0.0091 Epoch: 27 Global Step: 158850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:52:19,950-Speed 11020.26 samples/sec Loss 5.2283 LearningRate 0.0091 Epoch: 27 Global Step: 158860 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 04:52:20,854-Speed 11334.69 samples/sec Loss 5.2423 LearningRate 0.0091 Epoch: 27 Global Step: 158870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:52:21,748-Speed 11472.78 samples/sec Loss 5.3243 LearningRate 0.0091 Epoch: 27 Global Step: 158880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:52:22,629-Speed 11628.66 samples/sec Loss 5.3111 LearningRate 0.0091 Epoch: 27 Global Step: 158890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:52:23,561-Speed 10993.80 samples/sec Loss 5.3501 LearningRate 0.0091 Epoch: 27 Global Step: 158900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:52:24,467-Speed 11304.51 samples/sec Loss 5.2090 LearningRate 0.0091 Epoch: 27 Global Step: 158910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:52:25,373-Speed 11317.60 samples/sec Loss 5.2335 LearningRate 0.0091 Epoch: 27 Global Step: 158920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:52:26,297-Speed 11094.47 samples/sec Loss 5.2590 LearningRate 0.0091 Epoch: 27 Global Step: 158930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:52:27,207-Speed 11259.12 samples/sec Loss 5.3121 LearningRate 0.0091 Epoch: 27 Global Step: 158940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:52:28,088-Speed 11620.33 samples/sec Loss 5.3231 LearningRate 0.0091 Epoch: 27 Global Step: 158950 
Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:52:29,023-Speed 10962.22 samples/sec Loss 5.2185 LearningRate 0.0091 Epoch: 27 Global Step: 158960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:52:29,956-Speed 10988.80 samples/sec Loss 5.3063 LearningRate 0.0091 Epoch: 27 Global Step: 158970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:52:30,856-Speed 11386.14 samples/sec Loss 5.2898 LearningRate 0.0091 Epoch: 27 Global Step: 158980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:52:31,793-Speed 10928.40 samples/sec Loss 5.2672 LearningRate 0.0091 Epoch: 27 Global Step: 158990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:52:32,724-Speed 11012.11 samples/sec Loss 5.2029 LearningRate 0.0091 Epoch: 27 Global Step: 159000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:52:33,605-Speed 11636.33 samples/sec Loss 5.2400 LearningRate 0.0091 Epoch: 27 Global Step: 159010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:52:34,491-Speed 11563.43 samples/sec Loss 5.3920 LearningRate 0.0090 Epoch: 27 Global Step: 159020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:52:35,386-Speed 11451.30 samples/sec Loss 5.2230 LearningRate 0.0090 Epoch: 27 Global Step: 159030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:52:36,289-Speed 11349.25 samples/sec Loss 5.2992 LearningRate 0.0090 Epoch: 27 Global Step: 159040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:52:37,224-Speed 10963.55 samples/sec Loss 5.2482 LearningRate 0.0090 Epoch: 27 Global Step: 159050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:52:38,097-Speed 11740.04 samples/sec Loss 5.2514 LearningRate 0.0090 Epoch: 27 Global Step: 159060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:52:39,043-Speed 10820.63 samples/sec Loss 5.3163 LearningRate 0.0090 Epoch: 27 Global Step: 159070 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-04-11 04:52:39,973-Speed 11017.49 samples/sec Loss 5.2669 LearningRate 0.0090 Epoch: 27 Global Step: 159080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:52:40,908-Speed 10971.52 samples/sec Loss 5.3412 LearningRate 0.0090 Epoch: 27 Global Step: 159090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:52:41,866-Speed 10686.87 samples/sec Loss 5.3034 LearningRate 0.0090 Epoch: 27 Global Step: 159100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:52:42,778-Speed 11238.39 samples/sec Loss 5.3646 LearningRate 0.0090 Epoch: 27 Global Step: 159110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:52:43,715-Speed 10943.02 samples/sec Loss 5.2470 LearningRate 0.0090 Epoch: 27 Global Step: 159120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:52:44,594-Speed 11654.36 samples/sec Loss 5.3265 LearningRate 0.0090 Epoch: 27 Global Step: 159130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:52:45,496-Speed 11356.80 samples/sec Loss 5.3872 LearningRate 0.0090 Epoch: 27 Global Step: 159140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:52:46,391-Speed 11448.35 samples/sec Loss 5.3091 LearningRate 0.0090 Epoch: 27 Global Step: 159150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:52:47,268-Speed 11689.66 samples/sec Loss 5.3036 LearningRate 0.0090 Epoch: 27 Global Step: 159160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:52:48,188-Speed 11134.01 samples/sec Loss 5.2027 LearningRate 0.0090 Epoch: 27 Global Step: 159170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:52:49,117-Speed 11030.10 samples/sec Loss 5.3480 LearningRate 0.0090 Epoch: 27 Global Step: 159180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:52:50,025-Speed 11288.34 samples/sec Loss 5.3064 LearningRate 0.0090 Epoch: 27 Global Step: 159190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-11 04:52:50,980-Speed 10731.31 samples/sec Loss 5.3167 LearningRate 0.0090 Epoch: 27 Global Step: 159200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:53:02,305-Speed 904.31 samples/sec Loss 5.3141 LearningRate 0.0090 Epoch: 28 Global Step: 159210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:53:03,519-Speed 8437.69 samples/sec Loss 4.7436 LearningRate 0.0090 Epoch: 28 Global Step: 159220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:53:04,515-Speed 10294.81 samples/sec Loss 4.6462 LearningRate 0.0090 Epoch: 28 Global Step: 159230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:53:05,455-Speed 10897.40 samples/sec Loss 4.7328 LearningRate 0.0090 Epoch: 28 Global Step: 159240 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 04:53:06,438-Speed 10428.43 samples/sec Loss 4.5109 LearningRate 0.0090 Epoch: 28 Global Step: 159250 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 04:53:07,348-Speed 11262.56 samples/sec Loss 4.6171 LearningRate 0.0090 Epoch: 28 Global Step: 159260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:53:08,326-Speed 10471.29 samples/sec Loss 4.7824 LearningRate 0.0090 Epoch: 28 Global Step: 159270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:53:09,295-Speed 10582.77 samples/sec Loss 4.8233 LearningRate 0.0090 Epoch: 28 Global Step: 159280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:53:10,209-Speed 11213.21 samples/sec Loss 4.5298 LearningRate 0.0090 Epoch: 28 Global Step: 159290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:53:11,119-Speed 11260.40 samples/sec Loss 4.6217 LearningRate 0.0090 Epoch: 28 Global Step: 159300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:53:12,082-Speed 10634.34 samples/sec Loss 4.6807 LearningRate 0.0090 Epoch: 28 Global Step: 159310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:53:12,992-Speed 
11258.98 samples/sec Loss 4.6814 LearningRate 0.0090 Epoch: 28 Global Step: 159320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:53:13,935-Speed 10871.96 samples/sec Loss 4.8416 LearningRate 0.0090 Epoch: 28 Global Step: 159330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:53:14,927-Speed 10338.02 samples/sec Loss 4.7285 LearningRate 0.0090 Epoch: 28 Global Step: 159340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:53:15,859-Speed 10994.26 samples/sec Loss 4.7795 LearningRate 0.0090 Epoch: 28 Global Step: 159350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:53:16,787-Speed 11040.03 samples/sec Loss 4.7176 LearningRate 0.0090 Epoch: 28 Global Step: 159360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:53:17,730-Speed 10865.71 samples/sec Loss 4.5985 LearningRate 0.0090 Epoch: 28 Global Step: 159370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:53:18,714-Speed 10418.48 samples/sec Loss 4.7629 LearningRate 0.0090 Epoch: 28 Global Step: 159380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:53:19,677-Speed 10637.75 samples/sec Loss 4.7012 LearningRate 0.0090 Epoch: 28 Global Step: 159390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:53:20,612-Speed 10953.54 samples/sec Loss 4.7258 LearningRate 0.0089 Epoch: 28 Global Step: 159400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:53:21,508-Speed 11445.04 samples/sec Loss 4.7732 LearningRate 0.0089 Epoch: 28 Global Step: 159410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:53:22,417-Speed 11272.38 samples/sec Loss 4.7161 LearningRate 0.0089 Epoch: 28 Global Step: 159420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:53:23,334-Speed 11172.51 samples/sec Loss 4.7931 LearningRate 0.0089 Epoch: 28 Global Step: 159430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:53:24,264-Speed 11023.98 samples/sec Loss 
4.7097 LearningRate 0.0089 Epoch: 28 Global Step: 159440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:53:25,203-Speed 10917.48 samples/sec Loss 4.7997 LearningRate 0.0089 Epoch: 28 Global Step: 159450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:53:26,126-Speed 11096.20 samples/sec Loss 4.7222 LearningRate 0.0089 Epoch: 28 Global Step: 159460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:53:27,070-Speed 10857.46 samples/sec Loss 4.6572 LearningRate 0.0089 Epoch: 28 Global Step: 159470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:53:28,033-Speed 10633.20 samples/sec Loss 4.8009 LearningRate 0.0089 Epoch: 28 Global Step: 159480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:53:28,990-Speed 10715.51 samples/sec Loss 4.8561 LearningRate 0.0089 Epoch: 28 Global Step: 159490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:53:29,979-Speed 10359.06 samples/sec Loss 4.7640 LearningRate 0.0089 Epoch: 28 Global Step: 159500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:53:30,926-Speed 10820.09 samples/sec Loss 4.5562 LearningRate 0.0089 Epoch: 28 Global Step: 159510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:53:31,846-Speed 11132.59 samples/sec Loss 4.7461 LearningRate 0.0089 Epoch: 28 Global Step: 159520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:53:32,747-Speed 11396.55 samples/sec Loss 4.6779 LearningRate 0.0089 Epoch: 28 Global Step: 159530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:53:33,634-Speed 11539.70 samples/sec Loss 4.7002 LearningRate 0.0089 Epoch: 28 Global Step: 159540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:53:34,526-Speed 11492.46 samples/sec Loss 4.7723 LearningRate 0.0089 Epoch: 28 Global Step: 159550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:53:35,424-Speed 11414.69 samples/sec Loss 4.7988 LearningRate 0.0089 
Epoch: 28 Global Step: 159560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:53:36,328-Speed 11331.75 samples/sec Loss 4.7519 LearningRate 0.0089 Epoch: 28 Global Step: 159570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:53:37,231-Speed 11346.90 samples/sec Loss 4.7152 LearningRate 0.0089 Epoch: 28 Global Step: 159580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:53:38,124-Speed 11482.23 samples/sec Loss 4.7360 LearningRate 0.0089 Epoch: 28 Global Step: 159590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:53:39,082-Speed 10698.54 samples/sec Loss 4.6773 LearningRate 0.0089 Epoch: 28 Global Step: 159600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:53:40,013-Speed 10999.92 samples/sec Loss 4.7967 LearningRate 0.0089 Epoch: 28 Global Step: 159610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:53:40,951-Speed 10925.11 samples/sec Loss 4.7549 LearningRate 0.0089 Epoch: 28 Global Step: 159620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:53:41,863-Speed 11244.79 samples/sec Loss 4.7007 LearningRate 0.0089 Epoch: 28 Global Step: 159630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:53:42,761-Speed 11403.83 samples/sec Loss 4.8174 LearningRate 0.0089 Epoch: 28 Global Step: 159640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:53:43,703-Speed 10877.17 samples/sec Loss 4.6403 LearningRate 0.0089 Epoch: 28 Global Step: 159650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:53:44,612-Speed 11278.87 samples/sec Loss 4.7066 LearningRate 0.0089 Epoch: 28 Global Step: 159660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:53:45,506-Speed 11460.34 samples/sec Loss 4.8649 LearningRate 0.0089 Epoch: 28 Global Step: 159670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:53:46,415-Speed 11275.14 samples/sec Loss 4.7336 LearningRate 0.0089 Epoch: 28 Global Step: 159680 
Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:53:47,301-Speed 11566.86 samples/sec Loss 4.8585 LearningRate 0.0089 Epoch: 28 Global Step: 159690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:53:48,202-Speed 11376.95 samples/sec Loss 4.8602 LearningRate 0.0089 Epoch: 28 Global Step: 159700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:53:49,131-Speed 11027.41 samples/sec Loss 4.7446 LearningRate 0.0089 Epoch: 28 Global Step: 159710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:53:50,064-Speed 10979.21 samples/sec Loss 4.6871 LearningRate 0.0089 Epoch: 28 Global Step: 159720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:53:50,990-Speed 11063.50 samples/sec Loss 4.8009 LearningRate 0.0089 Epoch: 28 Global Step: 159730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:53:51,934-Speed 10856.74 samples/sec Loss 4.6315 LearningRate 0.0089 Epoch: 28 Global Step: 159740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:53:52,833-Speed 11402.11 samples/sec Loss 4.8138 LearningRate 0.0089 Epoch: 28 Global Step: 159750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:53:53,772-Speed 10907.94 samples/sec Loss 4.7988 LearningRate 0.0089 Epoch: 28 Global Step: 159760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:53:54,692-Speed 11141.94 samples/sec Loss 4.8664 LearningRate 0.0089 Epoch: 28 Global Step: 159770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:53:55,670-Speed 10478.01 samples/sec Loss 4.8338 LearningRate 0.0088 Epoch: 28 Global Step: 159780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:53:56,602-Speed 11001.97 samples/sec Loss 4.9105 LearningRate 0.0088 Epoch: 28 Global Step: 159790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:53:57,506-Speed 11326.79 samples/sec Loss 4.8488 LearningRate 0.0088 Epoch: 28 Global Step: 159800 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-04-11 04:53:58,409-Speed 11350.44 samples/sec Loss 4.7461 LearningRate 0.0088 Epoch: 28 Global Step: 159810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:53:59,323-Speed 11211.56 samples/sec Loss 4.8049 LearningRate 0.0088 Epoch: 28 Global Step: 159820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:54:00,266-Speed 10863.10 samples/sec Loss 4.8716 LearningRate 0.0088 Epoch: 28 Global Step: 159830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:54:01,201-Speed 10968.48 samples/sec Loss 4.7527 LearningRate 0.0088 Epoch: 28 Global Step: 159840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:54:02,137-Speed 10952.61 samples/sec Loss 4.8506 LearningRate 0.0088 Epoch: 28 Global Step: 159850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:54:03,035-Speed 11406.47 samples/sec Loss 4.8347 LearningRate 0.0088 Epoch: 28 Global Step: 159860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:54:03,946-Speed 11244.49 samples/sec Loss 4.7323 LearningRate 0.0088 Epoch: 28 Global Step: 159870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:54:04,877-Speed 11008.07 samples/sec Loss 4.8854 LearningRate 0.0088 Epoch: 28 Global Step: 159880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:54:05,808-Speed 11010.40 samples/sec Loss 4.8233 LearningRate 0.0088 Epoch: 28 Global Step: 159890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:54:06,734-Speed 11071.52 samples/sec Loss 4.7666 LearningRate 0.0088 Epoch: 28 Global Step: 159900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:54:07,666-Speed 10990.70 samples/sec Loss 4.8421 LearningRate 0.0088 Epoch: 28 Global Step: 159910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:54:08,616-Speed 10786.16 samples/sec Loss 4.9164 LearningRate 0.0088 Epoch: 28 Global Step: 159920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 
2022-04-11 04:54:09,525-Speed 11270.48 samples/sec Loss 4.8136 LearningRate 0.0088 Epoch: 28 Global Step: 159930 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-11 04:54:10,403-Speed 11671.44 samples/sec Loss 4.7216 LearningRate 0.0088 Epoch: 28 Global Step: 159940 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-11 04:54:11,303-Speed 11388.83 samples/sec Loss 4.8038 LearningRate 0.0088 Epoch: 28 Global Step: 159950 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-11 04:54:12,263-Speed 10675.80 samples/sec Loss 4.8822 LearningRate 0.0088 Epoch: 28 Global Step: 159960 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-11 04:54:13,220-Speed 10705.88 samples/sec Loss 4.8193 LearningRate 0.0088 Epoch: 28 Global Step: 159970 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-11 04:54:14,162-Speed 10871.97 samples/sec Loss 4.7833 LearningRate 0.0088 Epoch: 28 Global Step: 159980 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-11 04:54:15,109-Speed 10821.60 samples/sec Loss 4.9940 LearningRate 0.0088 Epoch: 28 Global Step: 159990 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-11 04:54:16,073-Speed 10627.58 samples/sec Loss 4.8315 LearningRate 0.0088 Epoch: 28 Global Step: 160000 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-11 04:54:37,297-[lfw][160000]XNorm: 8.673240
Training: 2022-04-11 04:54:37,297-[lfw][160000]Accuracy-Flip: 0.99683+-0.00302
Training: 2022-04-11 04:54:37,298-[lfw][160000]Accuracy-Highest: 0.99733
Training: 2022-04-11 04:55:01,930-[cfp_fp][160000]XNorm: 7.339604
Training: 2022-04-11 04:55:01,931-[cfp_fp][160000]Accuracy-Flip: 0.95114+-0.01110
Training: 2022-04-11 04:55:01,931-[cfp_fp][160000]Accuracy-Highest: 0.95429
Training: 2022-04-11 04:55:23,187-[agedb_30][160000]XNorm: 8.498991
Training: 2022-04-11 04:55:23,187-[agedb_30][160000]Accuracy-Flip: 0.96650+-0.00589
Training: 2022-04-11 04:55:23,188-[agedb_30][160000]Accuracy-Highest: 0.96900
Training: 2022-04-11
04:55:24,090-Speed 150.55 samples/sec Loss 4.8432 LearningRate 0.0088 Epoch: 28 Global Step: 160010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:55:24,990-Speed 11387.08 samples/sec Loss 4.9354 LearningRate 0.0088 Epoch: 28 Global Step: 160020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:55:25,943-Speed 10748.59 samples/sec Loss 4.9313 LearningRate 0.0088 Epoch: 28 Global Step: 160030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:55:26,899-Speed 10721.79 samples/sec Loss 4.8709 LearningRate 0.0088 Epoch: 28 Global Step: 160040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:55:27,831-Speed 10992.29 samples/sec Loss 4.9694 LearningRate 0.0088 Epoch: 28 Global Step: 160050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:55:28,769-Speed 10921.27 samples/sec Loss 4.9041 LearningRate 0.0088 Epoch: 28 Global Step: 160060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:55:29,705-Speed 10949.40 samples/sec Loss 4.8113 LearningRate 0.0088 Epoch: 28 Global Step: 160070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:55:30,637-Speed 10997.87 samples/sec Loss 4.8647 LearningRate 0.0088 Epoch: 28 Global Step: 160080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:55:31,598-Speed 10665.13 samples/sec Loss 4.8233 LearningRate 0.0088 Epoch: 28 Global Step: 160090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:55:32,517-Speed 11153.12 samples/sec Loss 4.8008 LearningRate 0.0088 Epoch: 28 Global Step: 160100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:55:33,417-Speed 11386.30 samples/sec Loss 4.8148 LearningRate 0.0088 Epoch: 28 Global Step: 160110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:55:34,336-Speed 11150.73 samples/sec Loss 5.0304 LearningRate 0.0088 Epoch: 28 Global Step: 160120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:55:35,260-Speed 11094.46 
samples/sec Loss 4.9661 LearningRate 0.0088 Epoch: 28 Global Step: 160130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:55:36,188-Speed 11040.82 samples/sec Loss 4.8480 LearningRate 0.0088 Epoch: 28 Global Step: 160140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:55:37,116-Speed 11039.20 samples/sec Loss 4.9658 LearningRate 0.0088 Epoch: 28 Global Step: 160150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:55:38,078-Speed 10649.14 samples/sec Loss 4.9269 LearningRate 0.0088 Epoch: 28 Global Step: 160160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:55:39,033-Speed 10734.33 samples/sec Loss 4.7530 LearningRate 0.0087 Epoch: 28 Global Step: 160170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:55:39,999-Speed 10603.59 samples/sec Loss 4.9255 LearningRate 0.0087 Epoch: 28 Global Step: 160180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:55:40,930-Speed 11014.60 samples/sec Loss 4.9594 LearningRate 0.0087 Epoch: 28 Global Step: 160190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:55:41,870-Speed 10894.55 samples/sec Loss 4.9677 LearningRate 0.0087 Epoch: 28 Global Step: 160200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:55:42,806-Speed 10948.32 samples/sec Loss 4.8981 LearningRate 0.0087 Epoch: 28 Global Step: 160210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:55:43,735-Speed 11033.66 samples/sec Loss 4.8696 LearningRate 0.0087 Epoch: 28 Global Step: 160220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:55:44,658-Speed 11097.44 samples/sec Loss 4.8847 LearningRate 0.0087 Epoch: 28 Global Step: 160230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:55:45,561-Speed 11345.62 samples/sec Loss 4.8366 LearningRate 0.0087 Epoch: 28 Global Step: 160240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:55:46,457-Speed 11441.33 samples/sec Loss 4.8334 
LearningRate 0.0087 Epoch: 28 Global Step: 160250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:55:47,388-Speed 11011.57 samples/sec Loss 4.6802 LearningRate 0.0087 Epoch: 28 Global Step: 160260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:55:48,293-Speed 11320.80 samples/sec Loss 4.8456 LearningRate 0.0087 Epoch: 28 Global Step: 160270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:55:49,207-Speed 11211.63 samples/sec Loss 4.8328 LearningRate 0.0087 Epoch: 28 Global Step: 160280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:55:50,139-Speed 10993.19 samples/sec Loss 4.8703 LearningRate 0.0087 Epoch: 28 Global Step: 160290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:55:51,051-Speed 11240.21 samples/sec Loss 4.8776 LearningRate 0.0087 Epoch: 28 Global Step: 160300 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 04:55:51,951-Speed 11381.97 samples/sec Loss 4.8487 LearningRate 0.0087 Epoch: 28 Global Step: 160310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:55:52,869-Speed 11167.88 samples/sec Loss 4.8580 LearningRate 0.0087 Epoch: 28 Global Step: 160320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:55:53,799-Speed 11006.64 samples/sec Loss 4.9483 LearningRate 0.0087 Epoch: 28 Global Step: 160330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:55:54,744-Speed 10852.99 samples/sec Loss 4.8388 LearningRate 0.0087 Epoch: 28 Global Step: 160340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:55:55,676-Speed 10985.15 samples/sec Loss 4.9680 LearningRate 0.0087 Epoch: 28 Global Step: 160350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:55:56,608-Speed 11000.11 samples/sec Loss 5.0939 LearningRate 0.0087 Epoch: 28 Global Step: 160360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:55:57,524-Speed 11193.73 samples/sec Loss 4.8343 LearningRate 0.0087 Epoch: 28 
Global Step: 160370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:55:58,466-Speed 10874.41 samples/sec Loss 4.8592 LearningRate 0.0087 Epoch: 28 Global Step: 160380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:55:59,394-Speed 11044.21 samples/sec Loss 5.0384 LearningRate 0.0087 Epoch: 28 Global Step: 160390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:56:00,308-Speed 11208.79 samples/sec Loss 4.9482 LearningRate 0.0087 Epoch: 28 Global Step: 160400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:56:01,232-Speed 11080.19 samples/sec Loss 4.7978 LearningRate 0.0087 Epoch: 28 Global Step: 160410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:56:02,152-Speed 11145.86 samples/sec Loss 5.0066 LearningRate 0.0087 Epoch: 28 Global Step: 160420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:56:03,092-Speed 10912.27 samples/sec Loss 5.0151 LearningRate 0.0087 Epoch: 28 Global Step: 160430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:56:04,033-Speed 10887.76 samples/sec Loss 4.9166 LearningRate 0.0087 Epoch: 28 Global Step: 160440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:56:04,991-Speed 10692.15 samples/sec Loss 4.8808 LearningRate 0.0087 Epoch: 28 Global Step: 160450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:56:05,939-Speed 10809.94 samples/sec Loss 4.7465 LearningRate 0.0087 Epoch: 28 Global Step: 160460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:56:06,877-Speed 10930.47 samples/sec Loss 5.0419 LearningRate 0.0087 Epoch: 28 Global Step: 160470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:56:07,841-Speed 10628.23 samples/sec Loss 5.0225 LearningRate 0.0087 Epoch: 28 Global Step: 160480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:56:08,800-Speed 10682.95 samples/sec Loss 4.9024 LearningRate 0.0087 Epoch: 28 Global Step: 160490 Fp16 Grad 
Scale: 65536 Required: 2 hours Training: 2022-04-11 04:56:09,712-Speed 11237.01 samples/sec Loss 4.8968 LearningRate 0.0087 Epoch: 28 Global Step: 160500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:56:10,628-Speed 11186.47 samples/sec Loss 4.9910 LearningRate 0.0087 Epoch: 28 Global Step: 160510 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 04:56:11,523-Speed 11455.05 samples/sec Loss 4.8436 LearningRate 0.0087 Epoch: 28 Global Step: 160520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:56:12,470-Speed 10815.55 samples/sec Loss 4.9939 LearningRate 0.0087 Epoch: 28 Global Step: 160530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:56:13,445-Speed 10512.63 samples/sec Loss 4.9866 LearningRate 0.0087 Epoch: 28 Global Step: 160540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:56:14,419-Speed 10524.86 samples/sec Loss 4.9911 LearningRate 0.0086 Epoch: 28 Global Step: 160550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:56:15,356-Speed 10928.11 samples/sec Loss 4.8410 LearningRate 0.0086 Epoch: 28 Global Step: 160560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:56:16,303-Speed 10823.65 samples/sec Loss 4.9460 LearningRate 0.0086 Epoch: 28 Global Step: 160570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:56:17,258-Speed 10731.47 samples/sec Loss 4.9462 LearningRate 0.0086 Epoch: 28 Global Step: 160580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:56:18,199-Speed 10883.19 samples/sec Loss 4.8979 LearningRate 0.0086 Epoch: 28 Global Step: 160590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:56:19,166-Speed 10595.08 samples/sec Loss 4.9965 LearningRate 0.0086 Epoch: 28 Global Step: 160600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:56:20,131-Speed 10619.18 samples/sec Loss 5.0986 LearningRate 0.0086 Epoch: 28 Global Step: 160610 Fp16 Grad Scale: 32768 Required: 2 hours 
Training: 2022-04-11 04:56:21,064-Speed 10985.04 samples/sec Loss 4.9486 LearningRate 0.0086 Epoch: 28 Global Step: 160620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:56:21,986-Speed 11112.30 samples/sec Loss 4.9769 LearningRate 0.0086 Epoch: 28 Global Step: 160630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:56:22,882-Speed 11448.30 samples/sec Loss 4.8282 LearningRate 0.0086 Epoch: 28 Global Step: 160640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:56:23,795-Speed 11226.62 samples/sec Loss 4.9706 LearningRate 0.0086 Epoch: 28 Global Step: 160650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:56:24,780-Speed 10399.24 samples/sec Loss 4.9896 LearningRate 0.0086 Epoch: 28 Global Step: 160660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:56:25,679-Speed 11400.40 samples/sec Loss 4.8974 LearningRate 0.0086 Epoch: 28 Global Step: 160670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:56:26,604-Speed 11073.39 samples/sec Loss 4.9277 LearningRate 0.0086 Epoch: 28 Global Step: 160680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:56:27,535-Speed 11011.67 samples/sec Loss 5.0418 LearningRate 0.0086 Epoch: 28 Global Step: 160690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:56:28,492-Speed 10711.55 samples/sec Loss 5.0469 LearningRate 0.0086 Epoch: 28 Global Step: 160700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:56:29,442-Speed 10781.66 samples/sec Loss 4.9296 LearningRate 0.0086 Epoch: 28 Global Step: 160710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:56:30,407-Speed 10623.49 samples/sec Loss 4.7612 LearningRate 0.0086 Epoch: 28 Global Step: 160720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:56:31,361-Speed 10738.24 samples/sec Loss 4.9887 LearningRate 0.0086 Epoch: 28 Global Step: 160730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 
04:56:32,261-Speed 11378.74 samples/sec Loss 4.9033 LearningRate 0.0086 Epoch: 28 Global Step: 160740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:56:33,173-Speed 11235.05 samples/sec Loss 4.8689 LearningRate 0.0086 Epoch: 28 Global Step: 160750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:56:34,075-Speed 11361.39 samples/sec Loss 5.0197 LearningRate 0.0086 Epoch: 28 Global Step: 160760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:56:34,991-Speed 11192.89 samples/sec Loss 4.8894 LearningRate 0.0086 Epoch: 28 Global Step: 160770 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 04:56:35,886-Speed 11450.99 samples/sec Loss 4.7186 LearningRate 0.0086 Epoch: 28 Global Step: 160780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:56:36,791-Speed 11324.43 samples/sec Loss 4.9943 LearningRate 0.0086 Epoch: 28 Global Step: 160790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:56:37,714-Speed 11105.07 samples/sec Loss 4.9349 LearningRate 0.0086 Epoch: 28 Global Step: 160800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:56:38,633-Speed 11147.84 samples/sec Loss 5.0259 LearningRate 0.0086 Epoch: 28 Global Step: 160810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:56:39,595-Speed 10651.78 samples/sec Loss 5.0044 LearningRate 0.0086 Epoch: 28 Global Step: 160820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:56:40,530-Speed 10954.93 samples/sec Loss 4.9779 LearningRate 0.0086 Epoch: 28 Global Step: 160830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:56:41,497-Speed 10601.37 samples/sec Loss 4.8793 LearningRate 0.0086 Epoch: 28 Global Step: 160840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:56:42,448-Speed 10768.78 samples/sec Loss 4.8736 LearningRate 0.0086 Epoch: 28 Global Step: 160850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:56:43,373-Speed 11088.55 
samples/sec Loss 4.9022 LearningRate 0.0086 Epoch: 28 Global Step: 160860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:56:44,344-Speed 10544.90 samples/sec Loss 5.0389 LearningRate 0.0086 Epoch: 28 Global Step: 160870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:56:45,229-Speed 11577.95 samples/sec Loss 4.9393 LearningRate 0.0086 Epoch: 28 Global Step: 160880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:56:46,113-Speed 11593.63 samples/sec Loss 4.9186 LearningRate 0.0086 Epoch: 28 Global Step: 160890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:56:47,053-Speed 10911.80 samples/sec Loss 5.0875 LearningRate 0.0086 Epoch: 28 Global Step: 160900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:56:47,974-Speed 11129.83 samples/sec Loss 5.0196 LearningRate 0.0086 Epoch: 28 Global Step: 160910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:56:48,887-Speed 11220.28 samples/sec Loss 5.0743 LearningRate 0.0086 Epoch: 28 Global Step: 160920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:56:49,844-Speed 10709.87 samples/sec Loss 5.0391 LearningRate 0.0086 Epoch: 28 Global Step: 160930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:56:50,790-Speed 10832.30 samples/sec Loss 4.9620 LearningRate 0.0085 Epoch: 28 Global Step: 160940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:56:51,727-Speed 10930.58 samples/sec Loss 4.9910 LearningRate 0.0085 Epoch: 28 Global Step: 160950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:56:52,652-Speed 11093.36 samples/sec Loss 4.9152 LearningRate 0.0085 Epoch: 28 Global Step: 160960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:56:53,572-Speed 11146.35 samples/sec Loss 4.8677 LearningRate 0.0085 Epoch: 28 Global Step: 160970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:56:54,525-Speed 10745.66 samples/sec Loss 4.9591 
LearningRate 0.0085 Epoch: 28 Global Step: 160980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:56:55,523-Speed 10275.50 samples/sec Loss 4.9932 LearningRate 0.0085 Epoch: 28 Global Step: 160990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:56:56,455-Speed 10996.50 samples/sec Loss 5.0097 LearningRate 0.0085 Epoch: 28 Global Step: 161000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:56:57,395-Speed 10903.69 samples/sec Loss 5.0468 LearningRate 0.0085 Epoch: 28 Global Step: 161010 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 04:56:58,354-Speed 10680.57 samples/sec Loss 5.0335 LearningRate 0.0085 Epoch: 28 Global Step: 161020 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 04:56:59,308-Speed 10736.25 samples/sec Loss 5.0035 LearningRate 0.0085 Epoch: 28 Global Step: 161030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:57:00,222-Speed 11211.01 samples/sec Loss 5.0185 LearningRate 0.0085 Epoch: 28 Global Step: 161040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:57:01,181-Speed 10689.97 samples/sec Loss 4.9610 LearningRate 0.0085 Epoch: 28 Global Step: 161050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:57:02,113-Speed 10989.27 samples/sec Loss 5.0348 LearningRate 0.0085 Epoch: 28 Global Step: 161060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:57:03,060-Speed 10820.26 samples/sec Loss 5.0129 LearningRate 0.0085 Epoch: 28 Global Step: 161070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:57:04,006-Speed 10837.98 samples/sec Loss 4.8550 LearningRate 0.0085 Epoch: 28 Global Step: 161080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:57:04,925-Speed 11149.23 samples/sec Loss 5.1002 LearningRate 0.0085 Epoch: 28 Global Step: 161090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:57:05,901-Speed 10501.07 samples/sec Loss 5.0645 LearningRate 0.0085 Epoch: 28 
Global Step: 161100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:57:06,843-Speed 10873.77 samples/sec Loss 4.8816 LearningRate 0.0085 Epoch: 28 Global Step: 161110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:57:07,785-Speed 10878.41 samples/sec Loss 5.0868 LearningRate 0.0085 Epoch: 28 Global Step: 161120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:57:08,721-Speed 10954.08 samples/sec Loss 5.0190 LearningRate 0.0085 Epoch: 28 Global Step: 161130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:57:09,632-Speed 11248.12 samples/sec Loss 5.0032 LearningRate 0.0085 Epoch: 28 Global Step: 161140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:57:10,510-Speed 11676.78 samples/sec Loss 4.9672 LearningRate 0.0085 Epoch: 28 Global Step: 161150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:57:11,414-Speed 11331.65 samples/sec Loss 5.0488 LearningRate 0.0085 Epoch: 28 Global Step: 161160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:57:12,354-Speed 10903.20 samples/sec Loss 5.1109 LearningRate 0.0085 Epoch: 28 Global Step: 161170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:57:13,316-Speed 10649.61 samples/sec Loss 4.9692 LearningRate 0.0085 Epoch: 28 Global Step: 161180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:57:14,290-Speed 10519.76 samples/sec Loss 5.0242 LearningRate 0.0085 Epoch: 28 Global Step: 161190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:57:15,219-Speed 11028.15 samples/sec Loss 5.0594 LearningRate 0.0085 Epoch: 28 Global Step: 161200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:57:16,154-Speed 10961.13 samples/sec Loss 4.9610 LearningRate 0.0085 Epoch: 28 Global Step: 161210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:57:17,095-Speed 10887.37 samples/sec Loss 5.0045 LearningRate 0.0085 Epoch: 28 Global Step: 161220 Fp16 Grad 
Scale: 32768 Required: 2 hours Training: 2022-04-11 04:57:18,043-Speed 10813.84 samples/sec Loss 5.0512 LearningRate 0.0085 Epoch: 28 Global Step: 161230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:57:18,992-Speed 10798.03 samples/sec Loss 5.0242 LearningRate 0.0085 Epoch: 28 Global Step: 161240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:57:19,939-Speed 10818.22 samples/sec Loss 4.9803 LearningRate 0.0085 Epoch: 28 Global Step: 161250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:57:20,827-Speed 11533.11 samples/sec Loss 5.0122 LearningRate 0.0085 Epoch: 28 Global Step: 161260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:57:21,725-Speed 11413.98 samples/sec Loss 5.0780 LearningRate 0.0085 Epoch: 28 Global Step: 161270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:57:22,635-Speed 11267.14 samples/sec Loss 5.0018 LearningRate 0.0085 Epoch: 28 Global Step: 161280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:57:23,564-Speed 11021.50 samples/sec Loss 5.0074 LearningRate 0.0085 Epoch: 28 Global Step: 161290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:57:24,519-Speed 10732.25 samples/sec Loss 5.0851 LearningRate 0.0085 Epoch: 28 Global Step: 161300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:57:25,452-Speed 10977.01 samples/sec Loss 4.9694 LearningRate 0.0085 Epoch: 28 Global Step: 161310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:57:26,405-Speed 10752.96 samples/sec Loss 4.9377 LearningRate 0.0085 Epoch: 28 Global Step: 161320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:57:27,332-Speed 11058.90 samples/sec Loss 5.0727 LearningRate 0.0084 Epoch: 28 Global Step: 161330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:57:28,292-Speed 10675.41 samples/sec Loss 4.9790 LearningRate 0.0084 Epoch: 28 Global Step: 161340 Fp16 Grad Scale: 65536 Required: 2 hours 
Training: 2022-04-11 04:57:29,290-Speed 10265.30 samples/sec Loss 5.0683 LearningRate 0.0084 Epoch: 28 Global Step: 161350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:57:30,210-Speed 11142.62 samples/sec Loss 4.9880 LearningRate 0.0084 Epoch: 28 Global Step: 161360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:57:31,162-Speed 10767.99 samples/sec Loss 5.0969 LearningRate 0.0084 Epoch: 28 Global Step: 161370 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 04:57:32,059-Speed 11424.08 samples/sec Loss 5.1811 LearningRate 0.0084 Epoch: 28 Global Step: 161380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:57:32,961-Speed 11365.37 samples/sec Loss 5.0631 LearningRate 0.0084 Epoch: 28 Global Step: 161390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:57:33,870-Speed 11269.82 samples/sec Loss 4.9265 LearningRate 0.0084 Epoch: 28 Global Step: 161400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:57:34,791-Speed 11127.73 samples/sec Loss 4.9119 LearningRate 0.0084 Epoch: 28 Global Step: 161410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:57:35,719-Speed 11043.05 samples/sec Loss 5.0685 LearningRate 0.0084 Epoch: 28 Global Step: 161420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:57:36,660-Speed 10884.67 samples/sec Loss 5.0099 LearningRate 0.0084 Epoch: 28 Global Step: 161430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:57:37,588-Speed 11043.35 samples/sec Loss 4.9926 LearningRate 0.0084 Epoch: 28 Global Step: 161440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:57:38,570-Speed 10435.79 samples/sec Loss 5.0414 LearningRate 0.0084 Epoch: 28 Global Step: 161450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:57:39,580-Speed 10136.72 samples/sec Loss 5.0253 LearningRate 0.0084 Epoch: 28 Global Step: 161460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 
04:57:40,521-Speed 10890.94 samples/sec Loss 4.9949 LearningRate 0.0084 Epoch: 28 Global Step: 161470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:57:41,474-Speed 10754.71 samples/sec Loss 5.1861 LearningRate 0.0084 Epoch: 28 Global Step: 161480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:57:42,442-Speed 10583.08 samples/sec Loss 5.1132 LearningRate 0.0084 Epoch: 28 Global Step: 161490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:57:43,398-Speed 10715.11 samples/sec Loss 5.0351 LearningRate 0.0084 Epoch: 28 Global Step: 161500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:57:44,312-Speed 11218.37 samples/sec Loss 4.9645 LearningRate 0.0084 Epoch: 28 Global Step: 161510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:57:45,220-Speed 11288.19 samples/sec Loss 5.0492 LearningRate 0.0084 Epoch: 28 Global Step: 161520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:57:46,112-Speed 11488.82 samples/sec Loss 5.0130 LearningRate 0.0084 Epoch: 28 Global Step: 161530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:57:47,008-Speed 11426.74 samples/sec Loss 5.1654 LearningRate 0.0084 Epoch: 28 Global Step: 161540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:57:47,926-Speed 11169.41 samples/sec Loss 5.0842 LearningRate 0.0084 Epoch: 28 Global Step: 161550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:57:48,878-Speed 10758.88 samples/sec Loss 5.0256 LearningRate 0.0084 Epoch: 28 Global Step: 161560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:57:49,816-Speed 10929.12 samples/sec Loss 5.1193 LearningRate 0.0084 Epoch: 28 Global Step: 161570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:57:50,740-Speed 11087.04 samples/sec Loss 5.1974 LearningRate 0.0084 Epoch: 28 Global Step: 161580 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 04:57:51,677-Speed 10941.09 
samples/sec Loss 5.0085 LearningRate 0.0084 Epoch: 28 Global Step: 161590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:57:52,613-Speed 10947.77 samples/sec Loss 4.8044 LearningRate 0.0084 Epoch: 28 Global Step: 161600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:57:53,536-Speed 11104.93 samples/sec Loss 4.9176 LearningRate 0.0084 Epoch: 28 Global Step: 161610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:57:54,480-Speed 10847.38 samples/sec Loss 5.1048 LearningRate 0.0084 Epoch: 28 Global Step: 161620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:57:55,411-Speed 11006.06 samples/sec Loss 5.1256 LearningRate 0.0084 Epoch: 28 Global Step: 161630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:57:56,354-Speed 10871.15 samples/sec Loss 5.0600 LearningRate 0.0084 Epoch: 28 Global Step: 161640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:57:57,284-Speed 11020.29 samples/sec Loss 5.0282 LearningRate 0.0084 Epoch: 28 Global Step: 161650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:57:58,215-Speed 11000.45 samples/sec Loss 5.0188 LearningRate 0.0084 Epoch: 28 Global Step: 161660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:57:59,126-Speed 11247.45 samples/sec Loss 5.0539 LearningRate 0.0084 Epoch: 28 Global Step: 161670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:58:00,071-Speed 10847.10 samples/sec Loss 5.0961 LearningRate 0.0084 Epoch: 28 Global Step: 161680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:58:00,997-Speed 11066.30 samples/sec Loss 5.0712 LearningRate 0.0084 Epoch: 28 Global Step: 161690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:58:01,932-Speed 10955.64 samples/sec Loss 5.1531 LearningRate 0.0084 Epoch: 28 Global Step: 161700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:58:02,854-Speed 11118.53 samples/sec Loss 4.9847 
LearningRate 0.0084 Epoch: 28 Global Step: 161710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:58:03,809-Speed 10726.84 samples/sec Loss 5.0340 LearningRate 0.0083 Epoch: 28 Global Step: 161720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:58:04,737-Speed 11038.39 samples/sec Loss 5.0046 LearningRate 0.0083 Epoch: 28 Global Step: 161730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:58:05,687-Speed 10788.26 samples/sec Loss 5.1014 LearningRate 0.0083 Epoch: 28 Global Step: 161740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:58:06,587-Speed 11388.32 samples/sec Loss 5.0543 LearningRate 0.0083 Epoch: 28 Global Step: 161750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:58:07,478-Speed 11493.97 samples/sec Loss 4.9980 LearningRate 0.0083 Epoch: 28 Global Step: 161760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:58:08,413-Speed 10965.78 samples/sec Loss 4.9645 LearningRate 0.0083 Epoch: 28 Global Step: 161770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:58:09,321-Speed 11282.31 samples/sec Loss 5.0858 LearningRate 0.0083 Epoch: 28 Global Step: 161780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:58:10,234-Speed 11217.24 samples/sec Loss 5.1381 LearningRate 0.0083 Epoch: 28 Global Step: 161790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:58:11,164-Speed 11029.03 samples/sec Loss 5.1709 LearningRate 0.0083 Epoch: 28 Global Step: 161800 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 04:58:12,075-Speed 11246.61 samples/sec Loss 4.9720 LearningRate 0.0083 Epoch: 28 Global Step: 161810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:58:13,039-Speed 10635.60 samples/sec Loss 4.9365 LearningRate 0.0083 Epoch: 28 Global Step: 161820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:58:14,016-Speed 10479.56 samples/sec Loss 5.1102 LearningRate 0.0083 Epoch: 28 
Global Step: 161830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:58:14,945-Speed 11038.62 samples/sec Loss 4.9786 LearningRate 0.0083 Epoch: 28 Global Step: 161840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:58:15,869-Speed 11090.75 samples/sec Loss 5.1759 LearningRate 0.0083 Epoch: 28 Global Step: 161850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:58:16,809-Speed 10896.55 samples/sec Loss 5.0759 LearningRate 0.0083 Epoch: 28 Global Step: 161860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:58:17,693-Speed 11595.44 samples/sec Loss 5.1314 LearningRate 0.0083 Epoch: 28 Global Step: 161870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:58:18,663-Speed 10559.24 samples/sec Loss 5.0473 LearningRate 0.0083 Epoch: 28 Global Step: 161880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:58:19,625-Speed 10653.72 samples/sec Loss 5.0527 LearningRate 0.0083 Epoch: 28 Global Step: 161890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:58:20,566-Speed 10884.52 samples/sec Loss 5.0568 LearningRate 0.0083 Epoch: 28 Global Step: 161900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:58:21,475-Speed 11275.75 samples/sec Loss 5.1816 LearningRate 0.0083 Epoch: 28 Global Step: 161910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:58:22,406-Speed 11013.33 samples/sec Loss 5.0296 LearningRate 0.0083 Epoch: 28 Global Step: 161920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:58:23,334-Speed 11037.69 samples/sec Loss 4.9808 LearningRate 0.0083 Epoch: 28 Global Step: 161930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:58:24,261-Speed 11051.31 samples/sec Loss 5.0764 LearningRate 0.0083 Epoch: 28 Global Step: 161940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:58:25,180-Speed 11163.62 samples/sec Loss 5.2037 LearningRate 0.0083 Epoch: 28 Global Step: 161950 Fp16 Grad 
Scale: 32768 Required: 2 hours Training: 2022-04-11 04:58:26,125-Speed 10840.01 samples/sec Loss 5.0128 LearningRate 0.0083 Epoch: 28 Global Step: 161960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:58:27,046-Speed 11130.62 samples/sec Loss 5.1983 LearningRate 0.0083 Epoch: 28 Global Step: 161970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:58:28,007-Speed 10662.44 samples/sec Loss 5.0394 LearningRate 0.0083 Epoch: 28 Global Step: 161980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:58:28,950-Speed 10866.00 samples/sec Loss 5.0937 LearningRate 0.0083 Epoch: 28 Global Step: 161990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:58:29,886-Speed 10941.67 samples/sec Loss 5.0232 LearningRate 0.0083 Epoch: 28 Global Step: 162000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:58:51,148-[lfw][162000]XNorm: 8.551392 Training: 2022-04-11 04:58:51,149-[lfw][162000]Accuracy-Flip: 0.99700+-0.00323 Training: 2022-04-11 04:58:51,149-[lfw][162000]Accuracy-Highest: 0.99733 Training: 2022-04-11 04:59:15,535-[cfp_fp][162000]XNorm: 7.266270 Training: 2022-04-11 04:59:15,535-[cfp_fp][162000]Accuracy-Flip: 0.95029+-0.01018 Training: 2022-04-11 04:59:15,536-[cfp_fp][162000]Accuracy-Highest: 0.95429 Training: 2022-04-11 04:59:36,810-[agedb_30][162000]XNorm: 8.394651 Training: 2022-04-11 04:59:36,811-[agedb_30][162000]Accuracy-Flip: 0.96683+-0.00808 Training: 2022-04-11 04:59:36,811-[agedb_30][162000]Accuracy-Highest: 0.96900 Training: 2022-04-11 04:59:37,744-Speed 150.91 samples/sec Loss 5.0934 LearningRate 0.0083 Epoch: 28 Global Step: 162010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:59:38,618-Speed 11723.22 samples/sec Loss 5.0177 LearningRate 0.0083 Epoch: 28 Global Step: 162020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:59:39,560-Speed 10886.62 samples/sec Loss 5.0733 LearningRate 0.0083 Epoch: 28 Global Step: 162030 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-04-11 04:59:40,529-Speed 10574.59 samples/sec Loss 5.0670 LearningRate 0.0083 Epoch: 28 Global Step: 162040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:59:41,502-Speed 10529.97 samples/sec Loss 4.9832 LearningRate 0.0083 Epoch: 28 Global Step: 162050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:59:42,437-Speed 10960.69 samples/sec Loss 5.0136 LearningRate 0.0083 Epoch: 28 Global Step: 162060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:59:43,367-Speed 11014.83 samples/sec Loss 4.9576 LearningRate 0.0083 Epoch: 28 Global Step: 162070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:59:44,320-Speed 10752.08 samples/sec Loss 5.1423 LearningRate 0.0083 Epoch: 28 Global Step: 162080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:59:45,206-Speed 11570.80 samples/sec Loss 4.9656 LearningRate 0.0083 Epoch: 28 Global Step: 162090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:59:46,100-Speed 11468.94 samples/sec Loss 5.0496 LearningRate 0.0083 Epoch: 28 Global Step: 162100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:59:46,986-Speed 11566.22 samples/sec Loss 5.1153 LearningRate 0.0083 Epoch: 28 Global Step: 162110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:59:47,878-Speed 11480.70 samples/sec Loss 5.0143 LearningRate 0.0082 Epoch: 28 Global Step: 162120 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 04:59:48,810-Speed 11000.11 samples/sec Loss 5.1459 LearningRate 0.0082 Epoch: 28 Global Step: 162130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:59:49,724-Speed 11210.09 samples/sec Loss 5.1586 LearningRate 0.0082 Epoch: 28 Global Step: 162140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:59:50,654-Speed 11018.16 samples/sec Loss 5.0726 LearningRate 0.0082 Epoch: 28 Global Step: 162150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-11 04:59:51,594-Speed 10896.19 samples/sec Loss 5.1400 LearningRate 0.0082 Epoch: 28 Global Step: 162160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:59:52,506-Speed 11240.27 samples/sec Loss 4.9867 LearningRate 0.0082 Epoch: 28 Global Step: 162170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:59:53,410-Speed 11330.21 samples/sec Loss 4.8926 LearningRate 0.0082 Epoch: 28 Global Step: 162180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:59:54,352-Speed 10884.89 samples/sec Loss 5.0209 LearningRate 0.0082 Epoch: 28 Global Step: 162190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:59:55,271-Speed 11148.51 samples/sec Loss 5.1332 LearningRate 0.0082 Epoch: 28 Global Step: 162200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:59:56,251-Speed 10457.17 samples/sec Loss 5.1671 LearningRate 0.0082 Epoch: 28 Global Step: 162210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:59:57,157-Speed 11310.36 samples/sec Loss 4.9072 LearningRate 0.0082 Epoch: 28 Global Step: 162220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 04:59:58,038-Speed 11624.42 samples/sec Loss 5.2070 LearningRate 0.0082 Epoch: 28 Global Step: 162230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:59:58,943-Speed 11324.59 samples/sec Loss 5.0572 LearningRate 0.0082 Epoch: 28 Global Step: 162240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 04:59:59,836-Speed 11476.65 samples/sec Loss 4.9803 LearningRate 0.0082 Epoch: 28 Global Step: 162250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:00:00,756-Speed 11134.33 samples/sec Loss 5.0382 LearningRate 0.0082 Epoch: 28 Global Step: 162260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:00:01,670-Speed 11217.25 samples/sec Loss 5.0851 LearningRate 0.0082 Epoch: 28 Global Step: 162270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:00:02,588-Speed 
11164.17 samples/sec Loss 5.0972 LearningRate 0.0082 Epoch: 28 Global Step: 162280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:00:03,499-Speed 11239.30 samples/sec Loss 4.9545 LearningRate 0.0082 Epoch: 28 Global Step: 162290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:00:04,418-Speed 11153.02 samples/sec Loss 5.0945 LearningRate 0.0082 Epoch: 28 Global Step: 162300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:00:05,344-Speed 11069.10 samples/sec Loss 5.0949 LearningRate 0.0082 Epoch: 28 Global Step: 162310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:00:06,291-Speed 10824.56 samples/sec Loss 5.0157 LearningRate 0.0082 Epoch: 28 Global Step: 162320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:00:07,242-Speed 10769.02 samples/sec Loss 5.0972 LearningRate 0.0082 Epoch: 28 Global Step: 162330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:08,163-Speed 11127.37 samples/sec Loss 5.0832 LearningRate 0.0082 Epoch: 28 Global Step: 162340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:09,086-Speed 11100.88 samples/sec Loss 5.1162 LearningRate 0.0082 Epoch: 28 Global Step: 162350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:09,973-Speed 11549.63 samples/sec Loss 5.0083 LearningRate 0.0082 Epoch: 28 Global Step: 162360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:10,861-Speed 11540.14 samples/sec Loss 5.1154 LearningRate 0.0082 Epoch: 28 Global Step: 162370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:11,814-Speed 10751.48 samples/sec Loss 5.1358 LearningRate 0.0082 Epoch: 28 Global Step: 162380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:12,782-Speed 10584.72 samples/sec Loss 5.0999 LearningRate 0.0082 Epoch: 28 Global Step: 162390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:13,744-Speed 10658.49 samples/sec Loss 
5.0496 LearningRate 0.0082 Epoch: 28 Global Step: 162400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:14,730-Speed 10387.53 samples/sec Loss 5.0988 LearningRate 0.0082 Epoch: 28 Global Step: 162410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:15,647-Speed 11177.64 samples/sec Loss 5.2091 LearningRate 0.0082 Epoch: 28 Global Step: 162420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:16,586-Speed 10905.28 samples/sec Loss 4.9233 LearningRate 0.0082 Epoch: 28 Global Step: 162430 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 05:00:17,530-Speed 10859.07 samples/sec Loss 5.0981 LearningRate 0.0082 Epoch: 28 Global Step: 162440 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 05:00:18,480-Speed 10791.12 samples/sec Loss 5.3043 LearningRate 0.0082 Epoch: 28 Global Step: 162450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:19,394-Speed 11211.34 samples/sec Loss 5.0545 LearningRate 0.0082 Epoch: 28 Global Step: 162460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:20,348-Speed 10744.53 samples/sec Loss 4.9999 LearningRate 0.0082 Epoch: 28 Global Step: 162470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:21,258-Speed 11254.83 samples/sec Loss 5.1458 LearningRate 0.0082 Epoch: 28 Global Step: 162480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:22,161-Speed 11343.91 samples/sec Loss 5.0353 LearningRate 0.0082 Epoch: 28 Global Step: 162490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:23,111-Speed 10785.64 samples/sec Loss 5.0874 LearningRate 0.0082 Epoch: 28 Global Step: 162500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:24,069-Speed 10704.33 samples/sec Loss 5.1616 LearningRate 0.0081 Epoch: 28 Global Step: 162510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:25,007-Speed 10922.74 samples/sec Loss 5.1032 LearningRate 0.0081 
Epoch: 28 Global Step: 162520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:25,945-Speed 10921.69 samples/sec Loss 5.2023 LearningRate 0.0081 Epoch: 28 Global Step: 162530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:26,883-Speed 10920.91 samples/sec Loss 5.1267 LearningRate 0.0081 Epoch: 28 Global Step: 162540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:27,827-Speed 10858.02 samples/sec Loss 5.1889 LearningRate 0.0081 Epoch: 28 Global Step: 162550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:28,777-Speed 10790.07 samples/sec Loss 5.1016 LearningRate 0.0081 Epoch: 28 Global Step: 162560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:29,727-Speed 10787.73 samples/sec Loss 5.0496 LearningRate 0.0081 Epoch: 28 Global Step: 162570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:30,668-Speed 10886.80 samples/sec Loss 5.0995 LearningRate 0.0081 Epoch: 28 Global Step: 162580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:31,574-Speed 11308.34 samples/sec Loss 5.1087 LearningRate 0.0081 Epoch: 28 Global Step: 162590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:32,537-Speed 10637.47 samples/sec Loss 5.1034 LearningRate 0.0081 Epoch: 28 Global Step: 162600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:33,466-Speed 11038.77 samples/sec Loss 5.1343 LearningRate 0.0081 Epoch: 28 Global Step: 162610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:34,369-Speed 11340.71 samples/sec Loss 5.1575 LearningRate 0.0081 Epoch: 28 Global Step: 162620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:35,292-Speed 11104.98 samples/sec Loss 5.0408 LearningRate 0.0081 Epoch: 28 Global Step: 162630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:36,250-Speed 10697.46 samples/sec Loss 5.1966 LearningRate 0.0081 Epoch: 28 Global Step: 162640 
Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:37,203-Speed 10748.73 samples/sec Loss 5.1271 LearningRate 0.0081 Epoch: 28 Global Step: 162650 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 05:00:38,136-Speed 10991.94 samples/sec Loss 5.1246 LearningRate 0.0081 Epoch: 28 Global Step: 162660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:39,060-Speed 11089.22 samples/sec Loss 4.9917 LearningRate 0.0081 Epoch: 28 Global Step: 162670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:40,008-Speed 10809.65 samples/sec Loss 5.2054 LearningRate 0.0081 Epoch: 28 Global Step: 162680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:40,942-Speed 10971.94 samples/sec Loss 5.0430 LearningRate 0.0081 Epoch: 28 Global Step: 162690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:41,906-Speed 10636.60 samples/sec Loss 5.1082 LearningRate 0.0081 Epoch: 28 Global Step: 162700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:42,848-Speed 10879.42 samples/sec Loss 5.0931 LearningRate 0.0081 Epoch: 28 Global Step: 162710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:43,776-Speed 11039.16 samples/sec Loss 5.0359 LearningRate 0.0081 Epoch: 28 Global Step: 162720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:44,744-Speed 10583.66 samples/sec Loss 5.1188 LearningRate 0.0081 Epoch: 28 Global Step: 162730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:45,653-Speed 11273.94 samples/sec Loss 5.1814 LearningRate 0.0081 Epoch: 28 Global Step: 162740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:46,558-Speed 11321.09 samples/sec Loss 5.0613 LearningRate 0.0081 Epoch: 28 Global Step: 162750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:47,463-Speed 11323.29 samples/sec Loss 5.1361 LearningRate 0.0081 Epoch: 28 Global Step: 162760 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-04-11 05:00:48,440-Speed 10489.17 samples/sec Loss 5.2212 LearningRate 0.0081 Epoch: 28 Global Step: 162770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:49,393-Speed 10756.76 samples/sec Loss 5.0400 LearningRate 0.0081 Epoch: 28 Global Step: 162780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:50,346-Speed 10743.13 samples/sec Loss 5.0948 LearningRate 0.0081 Epoch: 28 Global Step: 162790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:51,263-Speed 11182.72 samples/sec Loss 5.0471 LearningRate 0.0081 Epoch: 28 Global Step: 162800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:52,184-Speed 11125.69 samples/sec Loss 5.0318 LearningRate 0.0081 Epoch: 28 Global Step: 162810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:53,110-Speed 11062.93 samples/sec Loss 4.8967 LearningRate 0.0081 Epoch: 28 Global Step: 162820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:54,004-Speed 11457.49 samples/sec Loss 5.1682 LearningRate 0.0081 Epoch: 28 Global Step: 162830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:54,967-Speed 10646.25 samples/sec Loss 5.2086 LearningRate 0.0081 Epoch: 28 Global Step: 162840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:55,914-Speed 10816.85 samples/sec Loss 4.9830 LearningRate 0.0081 Epoch: 28 Global Step: 162850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:56,869-Speed 10732.29 samples/sec Loss 5.1583 LearningRate 0.0081 Epoch: 28 Global Step: 162860 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 05:00:57,787-Speed 11160.40 samples/sec Loss 5.2309 LearningRate 0.0081 Epoch: 28 Global Step: 162870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:00:58,765-Speed 10474.23 samples/sec Loss 5.1166 LearningRate 0.0081 Epoch: 28 Global Step: 162880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-11 05:00:59,696-Speed 11014.60 samples/sec Loss 5.0320 LearningRate 0.0081 Epoch: 28 Global Step: 162890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:01:00,656-Speed 10675.23 samples/sec Loss 5.1500 LearningRate 0.0081 Epoch: 28 Global Step: 162900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:01:01,583-Speed 11052.31 samples/sec Loss 5.0128 LearningRate 0.0080 Epoch: 28 Global Step: 162910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:01:02,514-Speed 11005.90 samples/sec Loss 5.0071 LearningRate 0.0080 Epoch: 28 Global Step: 162920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:01:03,433-Speed 11155.23 samples/sec Loss 4.9852 LearningRate 0.0080 Epoch: 28 Global Step: 162930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:01:04,367-Speed 10975.35 samples/sec Loss 5.1787 LearningRate 0.0080 Epoch: 28 Global Step: 162940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:01:05,290-Speed 11093.99 samples/sec Loss 5.0884 LearningRate 0.0080 Epoch: 28 Global Step: 162950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:01:06,241-Speed 10776.84 samples/sec Loss 5.2098 LearningRate 0.0080 Epoch: 28 Global Step: 162960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:01:07,189-Speed 10811.66 samples/sec Loss 5.1863 LearningRate 0.0080 Epoch: 28 Global Step: 162970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:01:08,142-Speed 10749.90 samples/sec Loss 5.0818 LearningRate 0.0080 Epoch: 28 Global Step: 162980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:01:09,064-Speed 11107.09 samples/sec Loss 5.0245 LearningRate 0.0080 Epoch: 28 Global Step: 162990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:01:09,985-Speed 11134.91 samples/sec Loss 5.0163 LearningRate 0.0080 Epoch: 28 Global Step: 163000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:01:10,891-Speed 
11310.79 samples/sec Loss 5.1355 LearningRate 0.0080 Epoch: 28 Global Step: 163010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:01:11,801-Speed 11253.44 samples/sec Loss 5.0577 LearningRate 0.0080 Epoch: 28 Global Step: 163020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:01:12,739-Speed 10919.82 samples/sec Loss 5.1039 LearningRate 0.0080 Epoch: 28 Global Step: 163030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:01:13,674-Speed 10967.43 samples/sec Loss 5.2036 LearningRate 0.0080 Epoch: 28 Global Step: 163040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:01:14,607-Speed 10983.86 samples/sec Loss 5.0904 LearningRate 0.0080 Epoch: 28 Global Step: 163050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:01:15,516-Speed 11269.58 samples/sec Loss 5.2045 LearningRate 0.0080 Epoch: 28 Global Step: 163060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:01:16,433-Speed 11174.90 samples/sec Loss 5.0480 LearningRate 0.0080 Epoch: 28 Global Step: 163070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:01:17,395-Speed 10649.15 samples/sec Loss 5.1598 LearningRate 0.0080 Epoch: 28 Global Step: 163080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:01:18,363-Speed 10582.03 samples/sec Loss 4.9997 LearningRate 0.0080 Epoch: 28 Global Step: 163090 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 05:01:19,292-Speed 11038.47 samples/sec Loss 5.1807 LearningRate 0.0080 Epoch: 28 Global Step: 163100 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 05:01:20,227-Speed 10953.30 samples/sec Loss 5.1574 LearningRate 0.0080 Epoch: 28 Global Step: 163110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:01:21,128-Speed 11370.41 samples/sec Loss 5.1211 LearningRate 0.0080 Epoch: 28 Global Step: 163120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:01:22,024-Speed 11442.06 samples/sec Loss 
5.0685 LearningRate 0.0080 Epoch: 28 Global Step: 163130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:01:22,926-Speed 11359.85 samples/sec Loss 5.2192 LearningRate 0.0080 Epoch: 28 Global Step: 163140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:01:23,888-Speed 10653.65 samples/sec Loss 5.0251 LearningRate 0.0080 Epoch: 28 Global Step: 163150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:01:24,780-Speed 11487.23 samples/sec Loss 5.1161 LearningRate 0.0080 Epoch: 28 Global Step: 163160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:01:25,712-Speed 10995.41 samples/sec Loss 5.0746 LearningRate 0.0080 Epoch: 28 Global Step: 163170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:01:26,679-Speed 10594.41 samples/sec Loss 5.1498 LearningRate 0.0080 Epoch: 28 Global Step: 163180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:01:27,646-Speed 10600.88 samples/sec Loss 5.0467 LearningRate 0.0080 Epoch: 28 Global Step: 163190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:01:28,631-Speed 10405.03 samples/sec Loss 5.1513 LearningRate 0.0080 Epoch: 28 Global Step: 163200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:01:29,586-Speed 10728.96 samples/sec Loss 5.1372 LearningRate 0.0080 Epoch: 28 Global Step: 163210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:01:30,498-Speed 11227.95 samples/sec Loss 5.0523 LearningRate 0.0080 Epoch: 28 Global Step: 163220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:01:31,463-Speed 10623.51 samples/sec Loss 5.0153 LearningRate 0.0080 Epoch: 28 Global Step: 163230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:01:32,386-Speed 11099.10 samples/sec Loss 5.1436 LearningRate 0.0080 Epoch: 28 Global Step: 163240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:01:33,286-Speed 11393.69 samples/sec Loss 5.1852 LearningRate 0.0080 
Epoch: 28 Global Step: 163250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:01:34,188-Speed 11358.02 samples/sec Loss 5.0827 LearningRate 0.0080 Epoch: 28 Global Step: 163260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:01:35,088-Speed 11382.08 samples/sec Loss 5.0640 LearningRate 0.0080 Epoch: 28 Global Step: 163270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:01:36,064-Speed 10495.84 samples/sec Loss 5.0990 LearningRate 0.0080 Epoch: 28 Global Step: 163280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:01:36,972-Speed 11285.16 samples/sec Loss 5.0255 LearningRate 0.0080 Epoch: 28 Global Step: 163290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:01:37,875-Speed 11356.13 samples/sec Loss 5.1640 LearningRate 0.0080 Epoch: 28 Global Step: 163300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:01:38,796-Speed 11119.33 samples/sec Loss 5.0129 LearningRate 0.0080 Epoch: 28 Global Step: 163310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:01:39,800-Speed 10204.55 samples/sec Loss 5.2961 LearningRate 0.0079 Epoch: 28 Global Step: 163320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:01:40,725-Speed 11075.78 samples/sec Loss 5.2170 LearningRate 0.0079 Epoch: 28 Global Step: 163330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:01:41,664-Speed 10916.35 samples/sec Loss 5.1420 LearningRate 0.0079 Epoch: 28 Global Step: 163340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:01:42,571-Speed 11294.04 samples/sec Loss 5.1256 LearningRate 0.0079 Epoch: 28 Global Step: 163350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:01:43,507-Speed 10945.15 samples/sec Loss 5.1209 LearningRate 0.0079 Epoch: 28 Global Step: 163360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:01:44,403-Speed 11444.30 samples/sec Loss 5.1098 LearningRate 0.0079 Epoch: 28 Global Step: 163370 
Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:01:45,313-Speed 11265.31 samples/sec Loss 5.1529 LearningRate 0.0079 Epoch: 28 Global Step: 163380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:01:46,221-Speed 11284.04 samples/sec Loss 5.0972 LearningRate 0.0079 Epoch: 28 Global Step: 163390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:01:47,126-Speed 11326.64 samples/sec Loss 5.1722 LearningRate 0.0079 Epoch: 28 Global Step: 163400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:01:48,074-Speed 10805.54 samples/sec Loss 5.0619 LearningRate 0.0079 Epoch: 28 Global Step: 163410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:01:49,020-Speed 10831.47 samples/sec Loss 5.0764 LearningRate 0.0079 Epoch: 28 Global Step: 163420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:01:49,970-Speed 10783.05 samples/sec Loss 5.1704 LearningRate 0.0079 Epoch: 28 Global Step: 163430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:01:50,911-Speed 10899.80 samples/sec Loss 5.1840 LearningRate 0.0079 Epoch: 28 Global Step: 163440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:01:51,854-Speed 10858.72 samples/sec Loss 5.1424 LearningRate 0.0079 Epoch: 28 Global Step: 163450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:01:52,774-Speed 11134.53 samples/sec Loss 4.9880 LearningRate 0.0079 Epoch: 28 Global Step: 163460 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 05:01:53,724-Speed 10792.29 samples/sec Loss 5.1064 LearningRate 0.0079 Epoch: 28 Global Step: 163470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:01:54,687-Speed 10639.15 samples/sec Loss 5.1000 LearningRate 0.0079 Epoch: 28 Global Step: 163480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:01:55,638-Speed 10769.32 samples/sec Loss 5.0419 LearningRate 0.0079 Epoch: 28 Global Step: 163490 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-04-11 05:01:56,590-Speed 10768.19 samples/sec Loss 5.0363 LearningRate 0.0079 Epoch: 28 Global Step: 163500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:01:57,522-Speed 10990.99 samples/sec Loss 5.0891 LearningRate 0.0079 Epoch: 28 Global Step: 163510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:01:58,446-Speed 11094.04 samples/sec Loss 5.0048 LearningRate 0.0079 Epoch: 28 Global Step: 163520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:01:59,367-Speed 11119.47 samples/sec Loss 5.0697 LearningRate 0.0079 Epoch: 28 Global Step: 163530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:02:00,275-Speed 11292.99 samples/sec Loss 5.1239 LearningRate 0.0079 Epoch: 28 Global Step: 163540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:02:01,201-Speed 11064.40 samples/sec Loss 5.0689 LearningRate 0.0079 Epoch: 28 Global Step: 163550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:02:02,134-Speed 10977.30 samples/sec Loss 5.1265 LearningRate 0.0079 Epoch: 28 Global Step: 163560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:02:03,078-Speed 10860.03 samples/sec Loss 5.1051 LearningRate 0.0079 Epoch: 28 Global Step: 163570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:02:04,048-Speed 10564.16 samples/sec Loss 5.3071 LearningRate 0.0079 Epoch: 28 Global Step: 163580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:02:04,952-Speed 11337.07 samples/sec Loss 5.1581 LearningRate 0.0079 Epoch: 28 Global Step: 163590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:02:05,862-Speed 11260.12 samples/sec Loss 5.1410 LearningRate 0.0079 Epoch: 28 Global Step: 163600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:02:06,811-Speed 10801.86 samples/sec Loss 5.2262 LearningRate 0.0079 Epoch: 28 Global Step: 163610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 
2022-04-11 05:02:07,813-Speed 10227.58 samples/sec Loss 5.2378 LearningRate 0.0079 Epoch: 28 Global Step: 163620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:02:08,745-Speed 10987.67 samples/sec Loss 5.2155 LearningRate 0.0079 Epoch: 28 Global Step: 163630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:02:09,643-Speed 11417.74 samples/sec Loss 5.0546 LearningRate 0.0079 Epoch: 28 Global Step: 163640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:02:10,530-Speed 11553.09 samples/sec Loss 5.2704 LearningRate 0.0079 Epoch: 28 Global Step: 163650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:02:11,455-Speed 11074.73 samples/sec Loss 5.2660 LearningRate 0.0079 Epoch: 28 Global Step: 163660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:02:12,381-Speed 11065.96 samples/sec Loss 5.1203 LearningRate 0.0079 Epoch: 28 Global Step: 163670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:02:13,290-Speed 11275.47 samples/sec Loss 5.2442 LearningRate 0.0079 Epoch: 28 Global Step: 163680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:02:14,209-Speed 11148.64 samples/sec Loss 5.0544 LearningRate 0.0079 Epoch: 28 Global Step: 163690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:02:15,163-Speed 10737.16 samples/sec Loss 5.1815 LearningRate 0.0079 Epoch: 28 Global Step: 163700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:02:16,113-Speed 10790.54 samples/sec Loss 5.1900 LearningRate 0.0079 Epoch: 28 Global Step: 163710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:02:17,089-Speed 10497.94 samples/sec Loss 5.1852 LearningRate 0.0078 Epoch: 28 Global Step: 163720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:02:18,029-Speed 10893.12 samples/sec Loss 5.1121 LearningRate 0.0078 Epoch: 28 Global Step: 163730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:02:18,962-Speed 
10988.81 samples/sec Loss 5.0822 LearningRate 0.0078 Epoch: 28 Global Step: 163740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:02:19,892-Speed 11017.04 samples/sec Loss 5.1604 LearningRate 0.0078 Epoch: 28 Global Step: 163750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:02:20,798-Speed 11309.52 samples/sec Loss 5.1461 LearningRate 0.0078 Epoch: 28 Global Step: 163760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:02:21,718-Speed 11145.61 samples/sec Loss 5.2108 LearningRate 0.0078 Epoch: 28 Global Step: 163770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:02:22,622-Speed 11330.67 samples/sec Loss 5.3478 LearningRate 0.0078 Epoch: 28 Global Step: 163780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:02:23,587-Speed 10622.08 samples/sec Loss 5.2042 LearningRate 0.0078 Epoch: 28 Global Step: 163790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:02:24,517-Speed 11017.99 samples/sec Loss 5.0487 LearningRate 0.0078 Epoch: 28 Global Step: 163800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:02:25,427-Speed 11251.43 samples/sec Loss 5.0554 LearningRate 0.0078 Epoch: 28 Global Step: 163810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:02:26,352-Speed 11081.62 samples/sec Loss 5.1016 LearningRate 0.0078 Epoch: 28 Global Step: 163820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:02:27,311-Speed 10679.50 samples/sec Loss 4.9497 LearningRate 0.0078 Epoch: 28 Global Step: 163830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:02:28,260-Speed 10808.76 samples/sec Loss 5.1483 LearningRate 0.0078 Epoch: 28 Global Step: 163840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:02:29,251-Speed 10341.08 samples/sec Loss 5.0461 LearningRate 0.0078 Epoch: 28 Global Step: 163850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:02:30,173-Speed 11106.65 samples/sec Loss 
5.1710 LearningRate 0.0078 Epoch: 28 Global Step: 163860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:02:31,111-Speed 10930.54 samples/sec Loss 5.2206 LearningRate 0.0078 Epoch: 28 Global Step: 163870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:02:32,061-Speed 10785.64 samples/sec Loss 5.1956 LearningRate 0.0078 Epoch: 28 Global Step: 163880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:02:32,965-Speed 11338.53 samples/sec Loss 5.0952 LearningRate 0.0078 Epoch: 28 Global Step: 163890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:02:33,877-Speed 11233.47 samples/sec Loss 5.2361 LearningRate 0.0078 Epoch: 28 Global Step: 163900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:02:34,794-Speed 11166.82 samples/sec Loss 5.2107 LearningRate 0.0078 Epoch: 28 Global Step: 163910 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 05:02:35,729-Speed 10963.78 samples/sec Loss 5.1246 LearningRate 0.0078 Epoch: 28 Global Step: 163920 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 05:02:36,647-Speed 11164.55 samples/sec Loss 5.1656 LearningRate 0.0078 Epoch: 28 Global Step: 163930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:02:37,574-Speed 11057.06 samples/sec Loss 5.0857 LearningRate 0.0078 Epoch: 28 Global Step: 163940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:02:38,504-Speed 11013.98 samples/sec Loss 5.1297 LearningRate 0.0078 Epoch: 28 Global Step: 163950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:02:39,449-Speed 10845.40 samples/sec Loss 5.1347 LearningRate 0.0078 Epoch: 28 Global Step: 163960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:02:40,391-Speed 10876.64 samples/sec Loss 5.0441 LearningRate 0.0078 Epoch: 28 Global Step: 163970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:02:41,364-Speed 10528.20 samples/sec Loss 5.0563 LearningRate 0.0078 
Epoch: 28 Global Step: 163980 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-11 05:02:42,271-Speed 11295.94 samples/sec Loss 5.2215 LearningRate 0.0078 Epoch: 28 Global Step: 163990 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-11 05:02:43,192-Speed 11125.19 samples/sec Loss 5.1144 LearningRate 0.0078 Epoch: 28 Global Step: 164000 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-11 05:03:04,477-[lfw][164000]XNorm: 8.553464
Training: 2022-04-11 05:03:04,477-[lfw][164000]Accuracy-Flip: 0.99617+-0.00350
Training: 2022-04-11 05:03:04,478-[lfw][164000]Accuracy-Highest: 0.99733
Training: 2022-04-11 05:03:29,011-[cfp_fp][164000]XNorm: 7.284404
Training: 2022-04-11 05:03:29,011-[cfp_fp][164000]Accuracy-Flip: 0.94986+-0.00889
Training: 2022-04-11 05:03:29,012-[cfp_fp][164000]Accuracy-Highest: 0.95429
Training: 2022-04-11 05:03:50,166-[agedb_30][164000]XNorm: 8.368594
Training: 2022-04-11 05:03:50,166-[agedb_30][164000]Accuracy-Flip: 0.96800+-0.00690
Training: 2022-04-11 05:03:50,166-[agedb_30][164000]Accuracy-Highest: 0.96900
Training: 2022-04-11 05:03:51,129-Speed 150.73 samples/sec Loss 5.0888 LearningRate 0.0078 Epoch: 28 Global Step: 164010 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-11 05:03:52,063-Speed 10966.94 samples/sec Loss 5.1627 LearningRate 0.0078 Epoch: 28 Global Step: 164020 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-11 05:03:52,974-Speed 11252.94 samples/sec Loss 4.9666 LearningRate 0.0078 Epoch: 28 Global Step: 164030 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-11 05:03:53,933-Speed 10684.51 samples/sec Loss 5.0278 LearningRate 0.0078 Epoch: 28 Global Step: 164040 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-11 05:03:54,896-Speed 10641.75 samples/sec Loss 5.1774 LearningRate 0.0078 Epoch: 28 Global Step: 164050 Fp16 Grad Scale: 32768 Required: 2 hours
Training: 2022-04-11 05:03:55,848-Speed 10765.16 samples/sec Loss 5.1420 LearningRate 0.0078 Epoch: 28 Global
Step: 164060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:03:56,750-Speed 11367.01 samples/sec Loss 5.0535 LearningRate 0.0078 Epoch: 28 Global Step: 164070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:03:57,646-Speed 11438.95 samples/sec Loss 5.1029 LearningRate 0.0078 Epoch: 28 Global Step: 164080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:03:58,564-Speed 11153.48 samples/sec Loss 5.1520 LearningRate 0.0078 Epoch: 28 Global Step: 164090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:03:59,503-Speed 10924.56 samples/sec Loss 5.1295 LearningRate 0.0078 Epoch: 28 Global Step: 164100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:04:00,490-Speed 10380.12 samples/sec Loss 5.2343 LearningRate 0.0078 Epoch: 28 Global Step: 164110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:04:01,447-Speed 10708.88 samples/sec Loss 5.1767 LearningRate 0.0078 Epoch: 28 Global Step: 164120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:04:02,411-Speed 10627.83 samples/sec Loss 5.1195 LearningRate 0.0077 Epoch: 28 Global Step: 164130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:04:03,340-Speed 11032.54 samples/sec Loss 5.0903 LearningRate 0.0077 Epoch: 28 Global Step: 164140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:04:04,273-Speed 10978.25 samples/sec Loss 5.0783 LearningRate 0.0077 Epoch: 28 Global Step: 164150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:04:05,238-Speed 10623.73 samples/sec Loss 5.2055 LearningRate 0.0077 Epoch: 28 Global Step: 164160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:04:06,148-Speed 11257.17 samples/sec Loss 5.1596 LearningRate 0.0077 Epoch: 28 Global Step: 164170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:04:07,084-Speed 10946.81 samples/sec Loss 5.2253 LearningRate 0.0077 Epoch: 28 Global Step: 164180 Fp16 Grad Scale: 
32768 Required: 2 hours Training: 2022-04-11 05:04:08,040-Speed 10713.86 samples/sec Loss 5.0884 LearningRate 0.0077 Epoch: 28 Global Step: 164190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:04:08,947-Speed 11302.57 samples/sec Loss 5.0162 LearningRate 0.0077 Epoch: 28 Global Step: 164200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:04:09,850-Speed 11356.20 samples/sec Loss 5.2040 LearningRate 0.0077 Epoch: 28 Global Step: 164210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:04:10,756-Speed 11305.32 samples/sec Loss 5.1166 LearningRate 0.0077 Epoch: 28 Global Step: 164220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:04:11,707-Speed 10776.33 samples/sec Loss 4.9943 LearningRate 0.0077 Epoch: 28 Global Step: 164230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:04:12,685-Speed 10473.52 samples/sec Loss 5.2700 LearningRate 0.0077 Epoch: 28 Global Step: 164240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:04:13,637-Speed 10769.62 samples/sec Loss 4.9892 LearningRate 0.0077 Epoch: 28 Global Step: 164250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:04:14,581-Speed 10854.70 samples/sec Loss 5.1144 LearningRate 0.0077 Epoch: 28 Global Step: 164260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:04:15,536-Speed 10727.02 samples/sec Loss 5.2528 LearningRate 0.0077 Epoch: 28 Global Step: 164270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:04:16,534-Speed 10261.90 samples/sec Loss 5.2419 LearningRate 0.0077 Epoch: 28 Global Step: 164280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:04:17,463-Speed 11029.06 samples/sec Loss 5.1608 LearningRate 0.0077 Epoch: 28 Global Step: 164290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:04:18,411-Speed 10815.74 samples/sec Loss 5.1162 LearningRate 0.0077 Epoch: 28 Global Step: 164300 Fp16 Grad Scale: 32768 Required: 2 hours 
Training: 2022-04-11 05:04:19,353-Speed 10879.56 samples/sec Loss 5.1256 LearningRate 0.0077 Epoch: 28 Global Step: 164310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:04:20,271-Speed 11155.34 samples/sec Loss 5.0974 LearningRate 0.0077 Epoch: 28 Global Step: 164320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:04:21,178-Speed 11302.95 samples/sec Loss 5.2291 LearningRate 0.0077 Epoch: 28 Global Step: 164330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:04:22,085-Speed 11298.53 samples/sec Loss 4.9999 LearningRate 0.0077 Epoch: 28 Global Step: 164340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:04:22,987-Speed 11355.68 samples/sec Loss 5.2175 LearningRate 0.0077 Epoch: 28 Global Step: 164350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:04:23,877-Speed 11508.72 samples/sec Loss 5.1405 LearningRate 0.0077 Epoch: 28 Global Step: 164360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:04:24,812-Speed 10965.16 samples/sec Loss 5.1140 LearningRate 0.0077 Epoch: 28 Global Step: 164370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:04:25,745-Speed 10979.50 samples/sec Loss 5.1245 LearningRate 0.0077 Epoch: 28 Global Step: 164380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:04:26,675-Speed 11014.46 samples/sec Loss 5.2257 LearningRate 0.0077 Epoch: 28 Global Step: 164390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:04:27,641-Speed 10614.16 samples/sec Loss 5.1839 LearningRate 0.0077 Epoch: 28 Global Step: 164400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:04:28,614-Speed 10531.37 samples/sec Loss 5.3639 LearningRate 0.0077 Epoch: 28 Global Step: 164410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:04:29,582-Speed 10579.22 samples/sec Loss 5.1305 LearningRate 0.0077 Epoch: 28 Global Step: 164420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 
05:04:30,537-Speed 10735.87 samples/sec Loss 5.1690 LearningRate 0.0077 Epoch: 28 Global Step: 164430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:04:31,477-Speed 10896.21 samples/sec Loss 5.0775 LearningRate 0.0077 Epoch: 28 Global Step: 164440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:04:32,405-Speed 11040.69 samples/sec Loss 5.2488 LearningRate 0.0077 Epoch: 28 Global Step: 164450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:04:33,320-Speed 11205.68 samples/sec Loss 5.0032 LearningRate 0.0077 Epoch: 28 Global Step: 164460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:04:34,217-Speed 11425.30 samples/sec Loss 5.0490 LearningRate 0.0077 Epoch: 28 Global Step: 164470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:04:35,137-Speed 11134.94 samples/sec Loss 5.1431 LearningRate 0.0077 Epoch: 28 Global Step: 164480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:04:36,079-Speed 10880.96 samples/sec Loss 5.1156 LearningRate 0.0077 Epoch: 28 Global Step: 164490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:04:36,984-Speed 11316.56 samples/sec Loss 5.1036 LearningRate 0.0077 Epoch: 28 Global Step: 164500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:04:37,953-Speed 10579.97 samples/sec Loss 5.2248 LearningRate 0.0077 Epoch: 28 Global Step: 164510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:04:38,906-Speed 10744.27 samples/sec Loss 5.1424 LearningRate 0.0077 Epoch: 28 Global Step: 164520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:04:39,836-Speed 11019.99 samples/sec Loss 5.2074 LearningRate 0.0077 Epoch: 28 Global Step: 164530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:04:40,819-Speed 10431.46 samples/sec Loss 5.1086 LearningRate 0.0076 Epoch: 28 Global Step: 164540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:04:41,765-Speed 10825.66 
samples/sec Loss 5.0136 LearningRate 0.0076 Epoch: 28 Global Step: 164550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:04:42,728-Speed 10640.02 samples/sec Loss 5.0883 LearningRate 0.0076 Epoch: 28 Global Step: 164560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:04:43,618-Speed 11509.78 samples/sec Loss 5.0727 LearningRate 0.0076 Epoch: 28 Global Step: 164570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:04:44,525-Speed 11299.65 samples/sec Loss 5.1458 LearningRate 0.0076 Epoch: 28 Global Step: 164580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:04:45,444-Speed 11155.20 samples/sec Loss 5.2589 LearningRate 0.0076 Epoch: 28 Global Step: 164590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:04:46,352-Speed 11286.68 samples/sec Loss 5.2223 LearningRate 0.0076 Epoch: 28 Global Step: 164600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:04:47,270-Speed 11156.64 samples/sec Loss 5.1033 LearningRate 0.0076 Epoch: 28 Global Step: 164610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:04:48,212-Speed 10876.88 samples/sec Loss 5.2530 LearningRate 0.0076 Epoch: 28 Global Step: 164620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:04:49,142-Speed 11016.74 samples/sec Loss 5.1003 LearningRate 0.0076 Epoch: 28 Global Step: 164630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:04:50,116-Speed 10527.10 samples/sec Loss 5.0124 LearningRate 0.0076 Epoch: 28 Global Step: 164640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:04:51,040-Speed 11093.32 samples/sec Loss 5.0720 LearningRate 0.0076 Epoch: 28 Global Step: 164650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:04:51,968-Speed 11038.51 samples/sec Loss 5.0972 LearningRate 0.0076 Epoch: 28 Global Step: 164660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:04:52,953-Speed 10410.88 samples/sec Loss 5.3137 
LearningRate 0.0076 Epoch: 28 Global Step: 164670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:04:53,911-Speed 10698.55 samples/sec Loss 5.1032 LearningRate 0.0076 Epoch: 28 Global Step: 164680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:04:54,823-Speed 11228.65 samples/sec Loss 5.1026 LearningRate 0.0076 Epoch: 28 Global Step: 164690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:04:55,772-Speed 10806.03 samples/sec Loss 5.3111 LearningRate 0.0076 Epoch: 28 Global Step: 164700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:04:56,676-Speed 11326.76 samples/sec Loss 5.2372 LearningRate 0.0076 Epoch: 28 Global Step: 164710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:04:57,648-Speed 10539.43 samples/sec Loss 5.1808 LearningRate 0.0076 Epoch: 28 Global Step: 164720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:04:58,581-Speed 10988.95 samples/sec Loss 5.1331 LearningRate 0.0076 Epoch: 28 Global Step: 164730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:04:59,513-Speed 10993.19 samples/sec Loss 5.0450 LearningRate 0.0076 Epoch: 28 Global Step: 164740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:05:00,447-Speed 10966.33 samples/sec Loss 5.1319 LearningRate 0.0076 Epoch: 28 Global Step: 164750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:05:01,413-Speed 10613.51 samples/sec Loss 5.2776 LearningRate 0.0076 Epoch: 28 Global Step: 164760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:05:02,368-Speed 10726.77 samples/sec Loss 5.1963 LearningRate 0.0076 Epoch: 28 Global Step: 164770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:05:03,295-Speed 11058.25 samples/sec Loss 5.1620 LearningRate 0.0076 Epoch: 28 Global Step: 164780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:05:04,267-Speed 10534.88 samples/sec Loss 5.1624 LearningRate 0.0076 Epoch: 28 
Global Step: 164790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:05:05,180-Speed 11226.79 samples/sec Loss 5.1766 LearningRate 0.0076 Epoch: 28 Global Step: 164800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:05:06,110-Speed 11023.27 samples/sec Loss 5.0723 LearningRate 0.0076 Epoch: 28 Global Step: 164810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:05:07,069-Speed 10684.75 samples/sec Loss 5.1328 LearningRate 0.0076 Epoch: 28 Global Step: 164820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:05:08,042-Speed 10525.07 samples/sec Loss 5.1014 LearningRate 0.0076 Epoch: 28 Global Step: 164830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:05:08,968-Speed 11062.45 samples/sec Loss 5.1487 LearningRate 0.0076 Epoch: 28 Global Step: 164840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:05:09,870-Speed 11367.63 samples/sec Loss 5.1706 LearningRate 0.0076 Epoch: 28 Global Step: 164850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:05:10,811-Speed 10886.47 samples/sec Loss 5.1568 LearningRate 0.0076 Epoch: 28 Global Step: 164860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:05:11,758-Speed 10819.70 samples/sec Loss 5.0095 LearningRate 0.0076 Epoch: 28 Global Step: 164870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:05:12,704-Speed 10838.30 samples/sec Loss 5.1516 LearningRate 0.0076 Epoch: 28 Global Step: 164880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:05:13,767-Speed 9636.73 samples/sec Loss 5.1417 LearningRate 0.0076 Epoch: 28 Global Step: 164890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:05:24,329-Speed 969.57 samples/sec Loss 4.8330 LearningRate 0.0076 Epoch: 29 Global Step: 164900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:05:25,310-Speed 10458.35 samples/sec Loss 4.6233 LearningRate 0.0076 Epoch: 29 Global Step: 164910 Fp16 Grad 
Scale: 32768 Required: 2 hours Training: 2022-04-11 05:05:26,285-Speed 10501.10 samples/sec Loss 4.4921 LearningRate 0.0076 Epoch: 29 Global Step: 164920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:05:27,230-Speed 10849.11 samples/sec Loss 4.6307 LearningRate 0.0076 Epoch: 29 Global Step: 164930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:05:28,239-Speed 10157.06 samples/sec Loss 4.4916 LearningRate 0.0076 Epoch: 29 Global Step: 164940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:05:29,189-Speed 10786.88 samples/sec Loss 4.4627 LearningRate 0.0075 Epoch: 29 Global Step: 164950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:05:30,143-Speed 10737.85 samples/sec Loss 4.6121 LearningRate 0.0075 Epoch: 29 Global Step: 164960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:05:31,057-Speed 11210.05 samples/sec Loss 4.6396 LearningRate 0.0075 Epoch: 29 Global Step: 164970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:05:32,009-Speed 10775.69 samples/sec Loss 4.5583 LearningRate 0.0075 Epoch: 29 Global Step: 164980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:05:32,937-Speed 11036.35 samples/sec Loss 4.5228 LearningRate 0.0075 Epoch: 29 Global Step: 164990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:05:33,867-Speed 11023.72 samples/sec Loss 4.4913 LearningRate 0.0075 Epoch: 29 Global Step: 165000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:05:34,785-Speed 11170.35 samples/sec Loss 4.5028 LearningRate 0.0075 Epoch: 29 Global Step: 165010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:05:35,733-Speed 10810.82 samples/sec Loss 4.6181 LearningRate 0.0075 Epoch: 29 Global Step: 165020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:05:36,687-Speed 10762.62 samples/sec Loss 4.4718 LearningRate 0.0075 Epoch: 29 Global Step: 165030 Fp16 Grad Scale: 65536 Required: 2 hours 
Training: 2022-04-11 05:05:37,621-Speed 10967.95 samples/sec Loss 4.5472 LearningRate 0.0075 Epoch: 29 Global Step: 165040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:05:38,577-Speed 10725.33 samples/sec Loss 4.6642 LearningRate 0.0075 Epoch: 29 Global Step: 165050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:05:39,517-Speed 10901.24 samples/sec Loss 4.6803 LearningRate 0.0075 Epoch: 29 Global Step: 165060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:05:40,465-Speed 10801.64 samples/sec Loss 4.6457 LearningRate 0.0075 Epoch: 29 Global Step: 165070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:05:41,410-Speed 10844.38 samples/sec Loss 4.6912 LearningRate 0.0075 Epoch: 29 Global Step: 165080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:05:42,357-Speed 10827.20 samples/sec Loss 4.5296 LearningRate 0.0075 Epoch: 29 Global Step: 165090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:05:43,295-Speed 10919.82 samples/sec Loss 4.6778 LearningRate 0.0075 Epoch: 29 Global Step: 165100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:05:44,244-Speed 10801.45 samples/sec Loss 4.5497 LearningRate 0.0075 Epoch: 29 Global Step: 165110 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 05:05:45,125-Speed 11640.48 samples/sec Loss 4.6197 LearningRate 0.0075 Epoch: 29 Global Step: 165120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:05:46,046-Speed 11124.23 samples/sec Loss 4.6239 LearningRate 0.0075 Epoch: 29 Global Step: 165130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:05:46,966-Speed 11132.49 samples/sec Loss 4.7397 LearningRate 0.0075 Epoch: 29 Global Step: 165140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:05:47,900-Speed 10968.93 samples/sec Loss 4.6100 LearningRate 0.0075 Epoch: 29 Global Step: 165150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 
05:05:48,830-Speed 11022.09 samples/sec Loss 4.6012 LearningRate 0.0075 Epoch: 29 Global Step: 165160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:05:49,767-Speed 10937.19 samples/sec Loss 4.6678 LearningRate 0.0075 Epoch: 29 Global Step: 165170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:05:50,748-Speed 10448.60 samples/sec Loss 4.5528 LearningRate 0.0075 Epoch: 29 Global Step: 165180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:05:51,704-Speed 10720.67 samples/sec Loss 4.5705 LearningRate 0.0075 Epoch: 29 Global Step: 165190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:05:52,676-Speed 10535.64 samples/sec Loss 4.6099 LearningRate 0.0075 Epoch: 29 Global Step: 165200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:05:53,688-Speed 10138.61 samples/sec Loss 4.6149 LearningRate 0.0075 Epoch: 29 Global Step: 165210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:05:54,601-Speed 11228.03 samples/sec Loss 4.5606 LearningRate 0.0075 Epoch: 29 Global Step: 165220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:05:55,491-Speed 11507.93 samples/sec Loss 4.6973 LearningRate 0.0075 Epoch: 29 Global Step: 165230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:05:56,459-Speed 10587.38 samples/sec Loss 4.7389 LearningRate 0.0075 Epoch: 29 Global Step: 165240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:05:57,370-Speed 11244.24 samples/sec Loss 4.7479 LearningRate 0.0075 Epoch: 29 Global Step: 165250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:05:58,291-Speed 11137.40 samples/sec Loss 4.6683 LearningRate 0.0075 Epoch: 29 Global Step: 165260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:05:59,205-Speed 11206.71 samples/sec Loss 4.8216 LearningRate 0.0075 Epoch: 29 Global Step: 165270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:00,116-Speed 11249.81 
samples/sec Loss 4.7881 LearningRate 0.0075 Epoch: 29 Global Step: 165280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:01,059-Speed 10863.35 samples/sec Loss 4.5880 LearningRate 0.0075 Epoch: 29 Global Step: 165290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:01,962-Speed 11344.70 samples/sec Loss 4.5514 LearningRate 0.0075 Epoch: 29 Global Step: 165300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:02,909-Speed 10825.21 samples/sec Loss 4.5880 LearningRate 0.0075 Epoch: 29 Global Step: 165310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:03,870-Speed 10674.16 samples/sec Loss 4.6739 LearningRate 0.0075 Epoch: 29 Global Step: 165320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:04,829-Speed 10686.11 samples/sec Loss 4.6915 LearningRate 0.0075 Epoch: 29 Global Step: 165330 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 05:06:05,769-Speed 10895.55 samples/sec Loss 4.4882 LearningRate 0.0075 Epoch: 29 Global Step: 165340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:06,684-Speed 11202.42 samples/sec Loss 4.7851 LearningRate 0.0075 Epoch: 29 Global Step: 165350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:07,615-Speed 11009.60 samples/sec Loss 4.6127 LearningRate 0.0075 Epoch: 29 Global Step: 165360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:08,558-Speed 10865.15 samples/sec Loss 4.5901 LearningRate 0.0074 Epoch: 29 Global Step: 165370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:09,470-Speed 11239.46 samples/sec Loss 4.8487 LearningRate 0.0074 Epoch: 29 Global Step: 165380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:10,355-Speed 11574.97 samples/sec Loss 4.7119 LearningRate 0.0074 Epoch: 29 Global Step: 165390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:11,341-Speed 10390.41 samples/sec Loss 4.7103 
LearningRate 0.0074 Epoch: 29 Global Step: 165400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:12,388-Speed 9788.87 samples/sec Loss 4.6880 LearningRate 0.0074 Epoch: 29 Global Step: 165410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:13,419-Speed 9947.25 samples/sec Loss 4.5916 LearningRate 0.0074 Epoch: 29 Global Step: 165420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:14,332-Speed 11224.58 samples/sec Loss 4.6831 LearningRate 0.0074 Epoch: 29 Global Step: 165430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:15,248-Speed 11180.36 samples/sec Loss 4.6313 LearningRate 0.0074 Epoch: 29 Global Step: 165440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:16,175-Speed 11054.04 samples/sec Loss 4.6139 LearningRate 0.0074 Epoch: 29 Global Step: 165450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:17,171-Speed 10292.08 samples/sec Loss 4.8273 LearningRate 0.0074 Epoch: 29 Global Step: 165460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:18,149-Speed 10477.21 samples/sec Loss 4.7698 LearningRate 0.0074 Epoch: 29 Global Step: 165470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:19,048-Speed 11392.18 samples/sec Loss 4.6813 LearningRate 0.0074 Epoch: 29 Global Step: 165480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:20,008-Speed 10672.60 samples/sec Loss 4.6667 LearningRate 0.0074 Epoch: 29 Global Step: 165490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:20,903-Speed 11454.36 samples/sec Loss 4.7687 LearningRate 0.0074 Epoch: 29 Global Step: 165500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:21,790-Speed 11555.65 samples/sec Loss 4.7113 LearningRate 0.0074 Epoch: 29 Global Step: 165510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:22,685-Speed 11448.21 samples/sec Loss 4.7381 LearningRate 0.0074 Epoch: 29 
Global Step: 165520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:23,605-Speed 11135.02 samples/sec Loss 4.7866 LearningRate 0.0074 Epoch: 29 Global Step: 165530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:24,520-Speed 11195.45 samples/sec Loss 4.7350 LearningRate 0.0074 Epoch: 29 Global Step: 165540 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 05:06:25,431-Speed 11248.89 samples/sec Loss 4.7845 LearningRate 0.0074 Epoch: 29 Global Step: 165550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:26,343-Speed 11242.53 samples/sec Loss 4.7586 LearningRate 0.0074 Epoch: 29 Global Step: 165560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:27,277-Speed 10964.82 samples/sec Loss 4.7041 LearningRate 0.0074 Epoch: 29 Global Step: 165570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:28,244-Speed 10601.97 samples/sec Loss 4.5546 LearningRate 0.0074 Epoch: 29 Global Step: 165580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:29,180-Speed 10955.89 samples/sec Loss 4.6538 LearningRate 0.0074 Epoch: 29 Global Step: 165590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:30,081-Speed 11369.04 samples/sec Loss 4.6470 LearningRate 0.0074 Epoch: 29 Global Step: 165600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:30,975-Speed 11462.75 samples/sec Loss 4.7158 LearningRate 0.0074 Epoch: 29 Global Step: 165610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:31,904-Speed 11028.32 samples/sec Loss 4.6810 LearningRate 0.0074 Epoch: 29 Global Step: 165620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:32,797-Speed 11475.15 samples/sec Loss 4.6953 LearningRate 0.0074 Epoch: 29 Global Step: 165630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:33,690-Speed 11479.56 samples/sec Loss 4.7361 LearningRate 0.0074 Epoch: 29 Global Step: 165640 Fp16 Grad 
Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:34,588-Speed 11414.78 samples/sec Loss 4.7293 LearningRate 0.0074 Epoch: 29 Global Step: 165650 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 05:06:35,473-Speed 11572.04 samples/sec Loss 4.6577 LearningRate 0.0074 Epoch: 29 Global Step: 165660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:36,403-Speed 11016.29 samples/sec Loss 4.6438 LearningRate 0.0074 Epoch: 29 Global Step: 165670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:37,324-Speed 11125.35 samples/sec Loss 4.7677 LearningRate 0.0074 Epoch: 29 Global Step: 165680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:38,287-Speed 10649.14 samples/sec Loss 4.7273 LearningRate 0.0074 Epoch: 29 Global Step: 165690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:39,246-Speed 10681.78 samples/sec Loss 4.6940 LearningRate 0.0074 Epoch: 29 Global Step: 165700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:40,145-Speed 11403.41 samples/sec Loss 4.7127 LearningRate 0.0074 Epoch: 29 Global Step: 165710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:41,076-Speed 10995.02 samples/sec Loss 4.7224 LearningRate 0.0074 Epoch: 29 Global Step: 165720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:41,984-Speed 11288.00 samples/sec Loss 4.7714 LearningRate 0.0074 Epoch: 29 Global Step: 165730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:42,907-Speed 11104.37 samples/sec Loss 4.6991 LearningRate 0.0074 Epoch: 29 Global Step: 165740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:43,845-Speed 10933.31 samples/sec Loss 4.5976 LearningRate 0.0074 Epoch: 29 Global Step: 165750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:44,750-Speed 11321.54 samples/sec Loss 4.7840 LearningRate 0.0074 Epoch: 29 Global Step: 165760 Fp16 Grad Scale: 131072 Required: 2 
hours Training: 2022-04-11 05:06:45,636-Speed 11568.84 samples/sec Loss 4.8094 LearningRate 0.0074 Epoch: 29 Global Step: 165770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:46,549-Speed 11215.24 samples/sec Loss 4.7787 LearningRate 0.0073 Epoch: 29 Global Step: 165780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:47,462-Speed 11227.97 samples/sec Loss 4.7442 LearningRate 0.0073 Epoch: 29 Global Step: 165790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:48,396-Speed 10970.57 samples/sec Loss 4.7621 LearningRate 0.0073 Epoch: 29 Global Step: 165800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:49,350-Speed 10738.61 samples/sec Loss 4.7626 LearningRate 0.0073 Epoch: 29 Global Step: 165810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:50,295-Speed 10850.47 samples/sec Loss 4.6939 LearningRate 0.0073 Epoch: 29 Global Step: 165820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:51,216-Speed 11127.63 samples/sec Loss 4.6092 LearningRate 0.0073 Epoch: 29 Global Step: 165830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:52,144-Speed 11042.54 samples/sec Loss 4.7474 LearningRate 0.0073 Epoch: 29 Global Step: 165840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:53,093-Speed 10789.29 samples/sec Loss 4.7426 LearningRate 0.0073 Epoch: 29 Global Step: 165850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:53,982-Speed 11529.20 samples/sec Loss 4.7655 LearningRate 0.0073 Epoch: 29 Global Step: 165860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:54,889-Speed 11299.49 samples/sec Loss 4.6938 LearningRate 0.0073 Epoch: 29 Global Step: 165870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:55,808-Speed 11149.41 samples/sec Loss 4.7580 LearningRate 0.0073 Epoch: 29 Global Step: 165880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 
05:06:56,734-Speed 11069.01 samples/sec Loss 4.8263 LearningRate 0.0073 Epoch: 29 Global Step: 165890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:06:57,645-Speed 11249.86 samples/sec Loss 4.6217 LearningRate 0.0073 Epoch: 29 Global Step: 165900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:06:58,569-Speed 11093.52 samples/sec Loss 4.7324 LearningRate 0.0073 Epoch: 29 Global Step: 165910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:06:59,496-Speed 11055.20 samples/sec Loss 4.8293 LearningRate 0.0073 Epoch: 29 Global Step: 165920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:07:00,443-Speed 10811.15 samples/sec Loss 4.8334 LearningRate 0.0073 Epoch: 29 Global Step: 165930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:07:01,361-Speed 11163.82 samples/sec Loss 4.8305 LearningRate 0.0073 Epoch: 29 Global Step: 165940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:07:02,296-Speed 10960.63 samples/sec Loss 4.6659 LearningRate 0.0073 Epoch: 29 Global Step: 165950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:07:03,243-Speed 10820.99 samples/sec Loss 4.8181 LearningRate 0.0073 Epoch: 29 Global Step: 165960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:07:04,191-Speed 10813.47 samples/sec Loss 4.7444 LearningRate 0.0073 Epoch: 29 Global Step: 165970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:07:05,121-Speed 11009.62 samples/sec Loss 4.8541 LearningRate 0.0073 Epoch: 29 Global Step: 165980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:07:06,020-Speed 11399.17 samples/sec Loss 4.7400 LearningRate 0.0073 Epoch: 29 Global Step: 165990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:07:06,935-Speed 11200.79 samples/sec Loss 4.7890 LearningRate 0.0073 Epoch: 29 Global Step: 166000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 
05:07:28,238-[lfw][166000]XNorm: 8.433527 Training: 2022-04-11 05:07:28,239-[lfw][166000]Accuracy-Flip: 0.99650+-0.00283 Training: 2022-04-11 05:07:28,239-[lfw][166000]Accuracy-Highest: 0.99733 Training: 2022-04-11 05:07:52,790-[cfp_fp][166000]XNorm: 7.181227 Training: 2022-04-11 05:07:52,791-[cfp_fp][166000]Accuracy-Flip: 0.95171+-0.01174 Training: 2022-04-11 05:07:52,791-[cfp_fp][166000]Accuracy-Highest: 0.95429 Training: 2022-04-11 05:08:14,049-[agedb_30][166000]XNorm: 8.241300 Training: 2022-04-11 05:08:14,050-[agedb_30][166000]Accuracy-Flip: 0.96833+-0.00687 Training: 2022-04-11 05:08:14,050-[agedb_30][166000]Accuracy-Highest: 0.96900 Training: 2022-04-11 05:08:14,966-Speed 150.52 samples/sec Loss 4.7427 LearningRate 0.0073 Epoch: 29 Global Step: 166010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:08:15,891-Speed 11085.49 samples/sec Loss 4.8412 LearningRate 0.0073 Epoch: 29 Global Step: 166020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:08:16,828-Speed 10937.77 samples/sec Loss 4.6927 LearningRate 0.0073 Epoch: 29 Global Step: 166030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:08:17,766-Speed 10918.80 samples/sec Loss 4.9601 LearningRate 0.0073 Epoch: 29 Global Step: 166040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:08:18,693-Speed 11055.60 samples/sec Loss 4.6978 LearningRate 0.0073 Epoch: 29 Global Step: 166050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:08:19,613-Speed 11138.55 samples/sec Loss 4.6639 LearningRate 0.0073 Epoch: 29 Global Step: 166060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:08:20,509-Speed 11427.27 samples/sec Loss 4.8152 LearningRate 0.0073 Epoch: 29 Global Step: 166070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:08:21,422-Speed 11232.85 samples/sec Loss 4.8284 LearningRate 0.0073 Epoch: 29 Global Step: 166080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:08:22,305-Speed 
11609.80 samples/sec Loss 4.8544 LearningRate 0.0073 Epoch: 29 Global Step: 166090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:08:23,190-Speed 11582.97 samples/sec Loss 4.7216 LearningRate 0.0073 Epoch: 29 Global Step: 166100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:08:24,116-Speed 11062.34 samples/sec Loss 4.8911 LearningRate 0.0073 Epoch: 29 Global Step: 166110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:08:25,020-Speed 11333.81 samples/sec Loss 4.8035 LearningRate 0.0073 Epoch: 29 Global Step: 166120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:08:25,921-Speed 11373.39 samples/sec Loss 4.8827 LearningRate 0.0073 Epoch: 29 Global Step: 166130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:08:26,849-Speed 11043.80 samples/sec Loss 4.7343 LearningRate 0.0073 Epoch: 29 Global Step: 166140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:08:27,774-Speed 11073.98 samples/sec Loss 4.7986 LearningRate 0.0073 Epoch: 29 Global Step: 166150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:08:28,698-Speed 11093.41 samples/sec Loss 4.7173 LearningRate 0.0073 Epoch: 29 Global Step: 166160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:08:29,663-Speed 10616.38 samples/sec Loss 4.7956 LearningRate 0.0073 Epoch: 29 Global Step: 166170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:08:30,611-Speed 10814.15 samples/sec Loss 4.8346 LearningRate 0.0073 Epoch: 29 Global Step: 166180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:08:31,561-Speed 10778.05 samples/sec Loss 4.6810 LearningRate 0.0073 Epoch: 29 Global Step: 166190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:08:32,452-Speed 11500.25 samples/sec Loss 4.8175 LearningRate 0.0072 Epoch: 29 Global Step: 166200 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 05:08:33,344-Speed 11491.42 samples/sec Loss 
4.9077 LearningRate 0.0072 Epoch: 29 Global Step: 166210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:08:34,259-Speed 11198.32 samples/sec Loss 4.8072 LearningRate 0.0072 Epoch: 29 Global Step: 166220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:08:35,180-Speed 11123.92 samples/sec Loss 4.7153 LearningRate 0.0072 Epoch: 29 Global Step: 166230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:08:36,069-Speed 11526.96 samples/sec Loss 4.8507 LearningRate 0.0072 Epoch: 29 Global Step: 166240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:08:36,986-Speed 11177.14 samples/sec Loss 4.7073 LearningRate 0.0072 Epoch: 29 Global Step: 166250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:08:37,929-Speed 10860.84 samples/sec Loss 4.8178 LearningRate 0.0072 Epoch: 29 Global Step: 166260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:08:38,872-Speed 10874.34 samples/sec Loss 4.7868 LearningRate 0.0072 Epoch: 29 Global Step: 166270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:08:39,843-Speed 10553.48 samples/sec Loss 4.7881 LearningRate 0.0072 Epoch: 29 Global Step: 166280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:08:40,777-Speed 10976.01 samples/sec Loss 4.7267 LearningRate 0.0072 Epoch: 29 Global Step: 166290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:08:41,707-Speed 11015.51 samples/sec Loss 4.8733 LearningRate 0.0072 Epoch: 29 Global Step: 166300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:08:42,635-Speed 11039.65 samples/sec Loss 4.7698 LearningRate 0.0072 Epoch: 29 Global Step: 166310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:08:43,581-Speed 10835.49 samples/sec Loss 4.7293 LearningRate 0.0072 Epoch: 29 Global Step: 166320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:08:44,500-Speed 11155.04 samples/sec Loss 4.8422 LearningRate 0.0072 
Epoch: 29 Global Step: 166330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:08:45,399-Speed 11386.21 samples/sec Loss 4.7823 LearningRate 0.0072 Epoch: 29 Global Step: 166340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:08:46,294-Speed 11452.90 samples/sec Loss 4.8457 LearningRate 0.0072 Epoch: 29 Global Step: 166350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:08:47,215-Speed 11123.18 samples/sec Loss 4.6263 LearningRate 0.0072 Epoch: 29 Global Step: 166360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:08:48,141-Speed 11070.05 samples/sec Loss 4.7361 LearningRate 0.0072 Epoch: 29 Global Step: 166370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:08:49,111-Speed 10565.22 samples/sec Loss 4.6573 LearningRate 0.0072 Epoch: 29 Global Step: 166380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:08:50,071-Speed 10672.07 samples/sec Loss 4.8292 LearningRate 0.0072 Epoch: 29 Global Step: 166390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:08:51,014-Speed 10873.76 samples/sec Loss 4.8689 LearningRate 0.0072 Epoch: 29 Global Step: 166400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:08:51,970-Speed 10717.51 samples/sec Loss 4.7524 LearningRate 0.0072 Epoch: 29 Global Step: 166410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:08:52,882-Speed 11232.26 samples/sec Loss 4.7852 LearningRate 0.0072 Epoch: 29 Global Step: 166420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:08:53,814-Speed 10993.50 samples/sec Loss 4.7917 LearningRate 0.0072 Epoch: 29 Global Step: 166430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:08:54,767-Speed 10760.74 samples/sec Loss 4.8500 LearningRate 0.0072 Epoch: 29 Global Step: 166440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:08:55,682-Speed 11198.76 samples/sec Loss 4.6405 LearningRate 0.0072 Epoch: 29 Global Step: 166450 
Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:08:56,572-Speed 11509.12 samples/sec Loss 4.8025 LearningRate 0.0072 Epoch: 29 Global Step: 166460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:08:57,517-Speed 10849.52 samples/sec Loss 4.8321 LearningRate 0.0072 Epoch: 29 Global Step: 166470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:08:58,447-Speed 11022.19 samples/sec Loss 4.8927 LearningRate 0.0072 Epoch: 29 Global Step: 166480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:08:59,401-Speed 10734.67 samples/sec Loss 4.8129 LearningRate 0.0072 Epoch: 29 Global Step: 166490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:09:00,292-Speed 11501.45 samples/sec Loss 4.8405 LearningRate 0.0072 Epoch: 29 Global Step: 166500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:09:01,241-Speed 10804.40 samples/sec Loss 4.8625 LearningRate 0.0072 Epoch: 29 Global Step: 166510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:09:02,189-Speed 10819.79 samples/sec Loss 4.9047 LearningRate 0.0072 Epoch: 29 Global Step: 166520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:09:03,107-Speed 11161.56 samples/sec Loss 4.7254 LearningRate 0.0072 Epoch: 29 Global Step: 166530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:09:04,030-Speed 11100.68 samples/sec Loss 4.7348 LearningRate 0.0072 Epoch: 29 Global Step: 166540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:09:04,985-Speed 10734.42 samples/sec Loss 4.8054 LearningRate 0.0072 Epoch: 29 Global Step: 166550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:09:05,935-Speed 10784.93 samples/sec Loss 4.8248 LearningRate 0.0072 Epoch: 29 Global Step: 166560 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 05:09:06,851-Speed 11186.31 samples/sec Loss 4.8973 LearningRate 0.0072 Epoch: 29 Global Step: 166570 Fp16 Grad Scale: 131072 
Required: 2 hours Training: 2022-04-11 05:09:07,777-Speed 11068.59 samples/sec Loss 4.8245 LearningRate 0.0072 Epoch: 29 Global Step: 166580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:09:08,715-Speed 10918.49 samples/sec Loss 4.8866 LearningRate 0.0072 Epoch: 29 Global Step: 166590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:09:09,619-Speed 11340.26 samples/sec Loss 4.8029 LearningRate 0.0072 Epoch: 29 Global Step: 166600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:09:10,504-Speed 11579.62 samples/sec Loss 4.9488 LearningRate 0.0072 Epoch: 29 Global Step: 166610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:09:11,388-Speed 11589.53 samples/sec Loss 4.6758 LearningRate 0.0072 Epoch: 29 Global Step: 166620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:09:12,361-Speed 10534.39 samples/sec Loss 4.6507 LearningRate 0.0071 Epoch: 29 Global Step: 166630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:09:13,250-Speed 11521.42 samples/sec Loss 4.8352 LearningRate 0.0071 Epoch: 29 Global Step: 166640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:09:14,166-Speed 11188.48 samples/sec Loss 4.8761 LearningRate 0.0071 Epoch: 29 Global Step: 166650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:09:15,088-Speed 11113.23 samples/sec Loss 4.6933 LearningRate 0.0071 Epoch: 29 Global Step: 166660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:09:16,050-Speed 10658.19 samples/sec Loss 4.9045 LearningRate 0.0071 Epoch: 29 Global Step: 166670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:09:17,049-Speed 10259.72 samples/sec Loss 4.8567 LearningRate 0.0071 Epoch: 29 Global Step: 166680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:09:18,013-Speed 10621.03 samples/sec Loss 4.8740 LearningRate 0.0071 Epoch: 29 Global Step: 166690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 
2022-04-11 05:09:18,998-Speed 10408.86 samples/sec Loss 4.8818 LearningRate 0.0071 Epoch: 29 Global Step: 166700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:09:19,933-Speed 10963.83 samples/sec Loss 4.7790 LearningRate 0.0071 Epoch: 29 Global Step: 166710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:09:20,836-Speed 11339.10 samples/sec Loss 4.8412 LearningRate 0.0071 Epoch: 29 Global Step: 166720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:09:21,754-Speed 11166.14 samples/sec Loss 4.8125 LearningRate 0.0071 Epoch: 29 Global Step: 166730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:09:22,677-Speed 11104.25 samples/sec Loss 4.8593 LearningRate 0.0071 Epoch: 29 Global Step: 166740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:09:23,599-Speed 11110.00 samples/sec Loss 4.7215 LearningRate 0.0071 Epoch: 29 Global Step: 166750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:09:24,549-Speed 10791.38 samples/sec Loss 4.7631 LearningRate 0.0071 Epoch: 29 Global Step: 166760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:09:25,465-Speed 11182.83 samples/sec Loss 4.8853 LearningRate 0.0071 Epoch: 29 Global Step: 166770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:09:26,394-Speed 11037.17 samples/sec Loss 4.8187 LearningRate 0.0071 Epoch: 29 Global Step: 166780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:09:27,295-Speed 11374.12 samples/sec Loss 4.8859 LearningRate 0.0071 Epoch: 29 Global Step: 166790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:09:28,232-Speed 10929.10 samples/sec Loss 4.8716 LearningRate 0.0071 Epoch: 29 Global Step: 166800 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 05:09:29,188-Speed 10718.52 samples/sec Loss 4.8194 LearningRate 0.0071 Epoch: 29 Global Step: 166810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:09:30,115-Speed 
11060.50 samples/sec Loss 5.0244 LearningRate 0.0071 Epoch: 29 Global Step: 166820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:09:31,058-Speed 10862.80 samples/sec Loss 4.9069 LearningRate 0.0071 Epoch: 29 Global Step: 166830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:09:31,967-Speed 11276.81 samples/sec Loss 4.7474 LearningRate 0.0071 Epoch: 29 Global Step: 166840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:09:32,888-Speed 11122.60 samples/sec Loss 4.9929 LearningRate 0.0071 Epoch: 29 Global Step: 166850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:09:33,822-Speed 10968.66 samples/sec Loss 4.6819 LearningRate 0.0071 Epoch: 29 Global Step: 166860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:09:34,724-Speed 11364.85 samples/sec Loss 4.8282 LearningRate 0.0071 Epoch: 29 Global Step: 166870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:09:35,632-Speed 11287.67 samples/sec Loss 4.8812 LearningRate 0.0071 Epoch: 29 Global Step: 166880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:09:36,613-Speed 10448.34 samples/sec Loss 4.8384 LearningRate 0.0071 Epoch: 29 Global Step: 166890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:09:37,565-Speed 10764.12 samples/sec Loss 4.8258 LearningRate 0.0071 Epoch: 29 Global Step: 166900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:09:38,525-Speed 10670.77 samples/sec Loss 4.7824 LearningRate 0.0071 Epoch: 29 Global Step: 166910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:09:39,497-Speed 10548.66 samples/sec Loss 4.8559 LearningRate 0.0071 Epoch: 29 Global Step: 166920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:09:40,414-Speed 11187.95 samples/sec Loss 4.8327 LearningRate 0.0071 Epoch: 29 Global Step: 166930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:09:41,391-Speed 10483.17 samples/sec Loss 
4.9662 LearningRate 0.0071 Epoch: 29 Global Step: 166940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:09:42,311-Speed 11147.19 samples/sec Loss 4.8232 LearningRate 0.0071 Epoch: 29 Global Step: 166950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:09:43,211-Speed 11387.67 samples/sec Loss 4.8643 LearningRate 0.0071 Epoch: 29 Global Step: 166960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:09:44,177-Speed 10604.21 samples/sec Loss 4.8167 LearningRate 0.0071 Epoch: 29 Global Step: 166970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:09:45,085-Speed 11288.60 samples/sec Loss 4.7969 LearningRate 0.0071 Epoch: 29 Global Step: 166980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:09:45,990-Speed 11320.20 samples/sec Loss 5.0665 LearningRate 0.0071 Epoch: 29 Global Step: 166990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:09:46,909-Speed 11145.80 samples/sec Loss 5.0047 LearningRate 0.0071 Epoch: 29 Global Step: 167000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:09:47,838-Speed 11040.91 samples/sec Loss 4.8282 LearningRate 0.0071 Epoch: 29 Global Step: 167010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:09:48,766-Speed 11041.00 samples/sec Loss 4.7901 LearningRate 0.0071 Epoch: 29 Global Step: 167020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:09:49,720-Speed 10737.28 samples/sec Loss 4.9645 LearningRate 0.0071 Epoch: 29 Global Step: 167030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:09:50,626-Speed 11311.68 samples/sec Loss 4.8977 LearningRate 0.0071 Epoch: 29 Global Step: 167040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:09:51,563-Speed 10930.73 samples/sec Loss 4.8772 LearningRate 0.0071 Epoch: 29 Global Step: 167050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:09:52,496-Speed 10988.91 samples/sec Loss 4.9012 LearningRate 0.0070 
Epoch: 29 Global Step: 167060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:09:53,478-Speed 10434.41 samples/sec Loss 4.8692 LearningRate 0.0070 Epoch: 29 Global Step: 167070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:09:54,442-Speed 10625.10 samples/sec Loss 4.9872 LearningRate 0.0070 Epoch: 29 Global Step: 167080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:09:55,401-Speed 10688.75 samples/sec Loss 4.8738 LearningRate 0.0070 Epoch: 29 Global Step: 167090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:09:56,361-Speed 10671.21 samples/sec Loss 4.9640 LearningRate 0.0070 Epoch: 29 Global Step: 167100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:09:57,256-Speed 11446.91 samples/sec Loss 4.9840 LearningRate 0.0070 Epoch: 29 Global Step: 167110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:09:58,188-Speed 10994.20 samples/sec Loss 4.8491 LearningRate 0.0070 Epoch: 29 Global Step: 167120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:09:59,068-Speed 11648.89 samples/sec Loss 4.7991 LearningRate 0.0070 Epoch: 29 Global Step: 167130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:10:00,015-Speed 10824.50 samples/sec Loss 4.9482 LearningRate 0.0070 Epoch: 29 Global Step: 167140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:10:00,959-Speed 10850.57 samples/sec Loss 4.8782 LearningRate 0.0070 Epoch: 29 Global Step: 167150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:10:01,945-Speed 10399.09 samples/sec Loss 4.9332 LearningRate 0.0070 Epoch: 29 Global Step: 167160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:10:02,884-Speed 10906.62 samples/sec Loss 4.9529 LearningRate 0.0070 Epoch: 29 Global Step: 167170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:10:03,833-Speed 10796.25 samples/sec Loss 4.8855 LearningRate 0.0070 Epoch: 29 Global Step: 167180 
Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:10:04,736-Speed 11351.25 samples/sec Loss 5.0026 LearningRate 0.0070 Epoch: 29 Global Step: 167190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:10:05,668-Speed 10999.91 samples/sec Loss 4.8554 LearningRate 0.0070 Epoch: 29 Global Step: 167200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:10:06,586-Speed 11155.51 samples/sec Loss 4.8085 LearningRate 0.0070 Epoch: 29 Global Step: 167210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:10:07,535-Speed 10795.70 samples/sec Loss 4.9736 LearningRate 0.0070 Epoch: 29 Global Step: 167220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:10:08,505-Speed 10564.90 samples/sec Loss 5.0936 LearningRate 0.0070 Epoch: 29 Global Step: 167230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:10:09,403-Speed 11419.18 samples/sec Loss 4.8959 LearningRate 0.0070 Epoch: 29 Global Step: 167240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:10:10,293-Speed 11515.00 samples/sec Loss 4.7110 LearningRate 0.0070 Epoch: 29 Global Step: 167250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:10:11,205-Speed 11230.88 samples/sec Loss 4.9076 LearningRate 0.0070 Epoch: 29 Global Step: 167260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:10:12,108-Speed 11345.48 samples/sec Loss 4.9893 LearningRate 0.0070 Epoch: 29 Global Step: 167270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:10:13,030-Speed 11114.74 samples/sec Loss 4.9498 LearningRate 0.0070 Epoch: 29 Global Step: 167280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:10:13,972-Speed 10874.01 samples/sec Loss 4.9902 LearningRate 0.0070 Epoch: 29 Global Step: 167290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:10:14,926-Speed 10743.22 samples/sec Loss 4.9419 LearningRate 0.0070 Epoch: 29 Global Step: 167300 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-04-11 05:10:15,837-Speed 11246.66 samples/sec Loss 4.8577 LearningRate 0.0070 Epoch: 29 Global Step: 167310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:10:16,762-Speed 11082.56 samples/sec Loss 4.8528 LearningRate 0.0070 Epoch: 29 Global Step: 167320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:10:17,729-Speed 10595.73 samples/sec Loss 4.9223 LearningRate 0.0070 Epoch: 29 Global Step: 167330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:10:18,660-Speed 11004.38 samples/sec Loss 4.9562 LearningRate 0.0070 Epoch: 29 Global Step: 167340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:10:19,618-Speed 10696.68 samples/sec Loss 5.0258 LearningRate 0.0070 Epoch: 29 Global Step: 167350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:10:20,555-Speed 10938.79 samples/sec Loss 5.0250 LearningRate 0.0070 Epoch: 29 Global Step: 167360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:10:21,452-Speed 11418.17 samples/sec Loss 4.9096 LearningRate 0.0070 Epoch: 29 Global Step: 167370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:10:22,358-Speed 11314.93 samples/sec Loss 4.7993 LearningRate 0.0070 Epoch: 29 Global Step: 167380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:10:23,280-Speed 11123.39 samples/sec Loss 5.0770 LearningRate 0.0070 Epoch: 29 Global Step: 167390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:10:24,203-Speed 11100.12 samples/sec Loss 4.8839 LearningRate 0.0070 Epoch: 29 Global Step: 167400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:10:25,102-Speed 11397.19 samples/sec Loss 4.8959 LearningRate 0.0070 Epoch: 29 Global Step: 167410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:10:26,029-Speed 11051.03 samples/sec Loss 4.8775 LearningRate 0.0070 Epoch: 29 Global Step: 167420 Fp16 Grad Scale: 16384 Required: 2 hours Training: 
2022-04-11 05:10:26,973-Speed 10858.73 samples/sec Loss 4.8254 LearningRate 0.0070 Epoch: 29 Global Step: 167430 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:10:27,900-Speed 11050.16 samples/sec Loss 4.8440 LearningRate 0.0070 Epoch: 29 Global Step: 167440 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:10:28,831-Speed 11014.30 samples/sec Loss 4.9713 LearningRate 0.0070 Epoch: 29 Global Step: 167450 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:10:29,752-Speed 11121.57 samples/sec Loss 4.9313 LearningRate 0.0070 Epoch: 29 Global Step: 167460 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:10:30,722-Speed 10563.92 samples/sec Loss 4.9393 LearningRate 0.0070 Epoch: 29 Global Step: 167470 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:10:31,705-Speed 10423.45 samples/sec Loss 4.9948 LearningRate 0.0070 Epoch: 29 Global Step: 167480 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:10:32,641-Speed 10950.19 samples/sec Loss 4.9070 LearningRate 0.0069 Epoch: 29 Global Step: 167490 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:10:33,536-Speed 11461.13 samples/sec Loss 5.0151 LearningRate 0.0069 Epoch: 29 Global Step: 167500 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:10:34,441-Speed 11319.53 samples/sec Loss 4.9601 LearningRate 0.0069 Epoch: 29 Global Step: 167510 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:10:35,341-Speed 11384.82 samples/sec Loss 4.9144 LearningRate 0.0069 Epoch: 29 Global Step: 167520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:10:36,254-Speed 11225.45 samples/sec Loss 4.9946 LearningRate 0.0069 Epoch: 29 Global Step: 167530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:10:37,240-Speed 10395.30 samples/sec Loss 5.0259 LearningRate 0.0069 Epoch: 29 Global Step: 167540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:10:38,220-Speed 
10454.02 samples/sec Loss 4.9840 LearningRate 0.0069 Epoch: 29 Global Step: 167550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:10:39,178-Speed 10691.96 samples/sec Loss 4.9836 LearningRate 0.0069 Epoch: 29 Global Step: 167560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:10:40,114-Speed 10954.69 samples/sec Loss 4.9045 LearningRate 0.0069 Epoch: 29 Global Step: 167570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:10:41,036-Speed 11113.46 samples/sec Loss 4.8283 LearningRate 0.0069 Epoch: 29 Global Step: 167580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:10:41,999-Speed 10641.56 samples/sec Loss 4.9705 LearningRate 0.0069 Epoch: 29 Global Step: 167590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:10:42,914-Speed 11197.36 samples/sec Loss 4.9704 LearningRate 0.0069 Epoch: 29 Global Step: 167600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:10:43,874-Speed 10683.55 samples/sec Loss 5.1255 LearningRate 0.0069 Epoch: 29 Global Step: 167610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:10:44,786-Speed 11232.02 samples/sec Loss 5.0092 LearningRate 0.0069 Epoch: 29 Global Step: 167620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:10:45,688-Speed 11359.39 samples/sec Loss 4.9100 LearningRate 0.0069 Epoch: 29 Global Step: 167630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:10:46,584-Speed 11438.53 samples/sec Loss 4.9134 LearningRate 0.0069 Epoch: 29 Global Step: 167640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:10:47,488-Speed 11333.58 samples/sec Loss 4.9167 LearningRate 0.0069 Epoch: 29 Global Step: 167650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:10:48,422-Speed 10968.18 samples/sec Loss 4.9216 LearningRate 0.0069 Epoch: 29 Global Step: 167660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:10:49,355-Speed 10992.29 samples/sec Loss 
4.9499 LearningRate 0.0069 Epoch: 29 Global Step: 167670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:10:50,288-Speed 10984.69 samples/sec Loss 4.8931 LearningRate 0.0069 Epoch: 29 Global Step: 167680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:10:51,226-Speed 10915.07 samples/sec Loss 4.9136 LearningRate 0.0069 Epoch: 29 Global Step: 167690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:10:52,162-Speed 10955.27 samples/sec Loss 4.8383 LearningRate 0.0069 Epoch: 29 Global Step: 167700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:10:53,116-Speed 10740.99 samples/sec Loss 4.9445 LearningRate 0.0069 Epoch: 29 Global Step: 167710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:10:54,092-Speed 10502.34 samples/sec Loss 5.0152 LearningRate 0.0069 Epoch: 29 Global Step: 167720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:10:54,991-Speed 11390.86 samples/sec Loss 4.9200 LearningRate 0.0069 Epoch: 29 Global Step: 167730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:10:55,949-Speed 10694.59 samples/sec Loss 5.0016 LearningRate 0.0069 Epoch: 29 Global Step: 167740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:10:56,879-Speed 11024.52 samples/sec Loss 4.9570 LearningRate 0.0069 Epoch: 29 Global Step: 167750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:10:57,833-Speed 10737.18 samples/sec Loss 4.9153 LearningRate 0.0069 Epoch: 29 Global Step: 167760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:10:58,779-Speed 10833.14 samples/sec Loss 4.9699 LearningRate 0.0069 Epoch: 29 Global Step: 167770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:10:59,715-Speed 10946.62 samples/sec Loss 4.8804 LearningRate 0.0069 Epoch: 29 Global Step: 167780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:11:00,689-Speed 10515.95 samples/sec Loss 4.9666 LearningRate 0.0069 
Epoch: 29 Global Step: 167790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:11:01,631-Speed 10885.68 samples/sec Loss 4.9186 LearningRate 0.0069 Epoch: 29 Global Step: 167800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:11:02,573-Speed 10872.28 samples/sec Loss 4.8893 LearningRate 0.0069 Epoch: 29 Global Step: 167810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:11:03,537-Speed 10629.91 samples/sec Loss 5.0701 LearningRate 0.0069 Epoch: 29 Global Step: 167820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:11:04,515-Speed 10478.51 samples/sec Loss 5.0138 LearningRate 0.0069 Epoch: 29 Global Step: 167830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:11:05,448-Speed 10979.74 samples/sec Loss 4.9911 LearningRate 0.0069 Epoch: 29 Global Step: 167840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:11:06,381-Speed 10989.42 samples/sec Loss 4.9089 LearningRate 0.0069 Epoch: 29 Global Step: 167850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:11:07,331-Speed 10786.05 samples/sec Loss 5.0248 LearningRate 0.0069 Epoch: 29 Global Step: 167860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:11:08,254-Speed 11105.54 samples/sec Loss 4.9772 LearningRate 0.0069 Epoch: 29 Global Step: 167870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:11:09,182-Speed 11045.12 samples/sec Loss 4.9600 LearningRate 0.0069 Epoch: 29 Global Step: 167880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:11:10,093-Speed 11244.62 samples/sec Loss 5.0587 LearningRate 0.0069 Epoch: 29 Global Step: 167890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:11:11,001-Speed 11276.07 samples/sec Loss 4.8215 LearningRate 0.0069 Epoch: 29 Global Step: 167900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:11:11,907-Speed 11315.47 samples/sec Loss 4.8589 LearningRate 0.0069 Epoch: 29 Global Step: 167910 
Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:11:12,876-Speed 10573.70 samples/sec Loss 4.8841 LearningRate 0.0068 Epoch: 29 Global Step: 167920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:11:13,803-Speed 11058.58 samples/sec Loss 4.9093 LearningRate 0.0068 Epoch: 29 Global Step: 167930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:11:14,738-Speed 10956.99 samples/sec Loss 5.0258 LearningRate 0.0068 Epoch: 29 Global Step: 167940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:11:15,654-Speed 11181.69 samples/sec Loss 4.8879 LearningRate 0.0068 Epoch: 29 Global Step: 167950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:11:16,585-Speed 11003.05 samples/sec Loss 4.9743 LearningRate 0.0068 Epoch: 29 Global Step: 167960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:11:17,528-Speed 10867.69 samples/sec Loss 4.8386 LearningRate 0.0068 Epoch: 29 Global Step: 167970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:11:18,474-Speed 10837.16 samples/sec Loss 4.7568 LearningRate 0.0068 Epoch: 29 Global Step: 167980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:11:19,445-Speed 10552.08 samples/sec Loss 4.9261 LearningRate 0.0068 Epoch: 29 Global Step: 167990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:11:20,431-Speed 10394.24 samples/sec Loss 4.8294 LearningRate 0.0068 Epoch: 29 Global Step: 168000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:11:41,752-[lfw][168000]XNorm: 8.391283 Training: 2022-04-11 05:11:41,753-[lfw][168000]Accuracy-Flip: 0.99717+-0.00269 Training: 2022-04-11 05:11:41,753-[lfw][168000]Accuracy-Highest: 0.99733 Training: 2022-04-11 05:12:06,377-[cfp_fp][168000]XNorm: 7.102422 Training: 2022-04-11 05:12:06,378-[cfp_fp][168000]Accuracy-Flip: 0.95214+-0.01154 Training: 2022-04-11 05:12:06,378-[cfp_fp][168000]Accuracy-Highest: 0.95429 Training: 2022-04-11 
05:12:27,594-[agedb_30][168000]XNorm: 8.199998 Training: 2022-04-11 05:12:27,594-[agedb_30][168000]Accuracy-Flip: 0.96633+-0.00632 Training: 2022-04-11 05:12:27,595-[agedb_30][168000]Accuracy-Highest: 0.96900 Training: 2022-04-11 05:12:28,544-Speed 150.34 samples/sec Loss 4.9315 LearningRate 0.0068 Epoch: 29 Global Step: 168010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:12:29,486-Speed 10870.70 samples/sec Loss 4.9364 LearningRate 0.0068 Epoch: 29 Global Step: 168020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:12:30,428-Speed 10882.32 samples/sec Loss 4.9761 LearningRate 0.0068 Epoch: 29 Global Step: 168030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:12:31,325-Speed 11425.56 samples/sec Loss 4.8978 LearningRate 0.0068 Epoch: 29 Global Step: 168040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:12:32,292-Speed 10596.59 samples/sec Loss 4.9269 LearningRate 0.0068 Epoch: 29 Global Step: 168050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:12:33,202-Speed 11261.58 samples/sec Loss 4.9098 LearningRate 0.0068 Epoch: 29 Global Step: 168060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:12:34,099-Speed 11416.97 samples/sec Loss 4.9787 LearningRate 0.0068 Epoch: 29 Global Step: 168070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:12:34,996-Speed 11424.48 samples/sec Loss 4.9634 LearningRate 0.0068 Epoch: 29 Global Step: 168080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:12:35,923-Speed 11054.90 samples/sec Loss 4.9338 LearningRate 0.0068 Epoch: 29 Global Step: 168090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:12:36,892-Speed 10572.82 samples/sec Loss 4.8731 LearningRate 0.0068 Epoch: 29 Global Step: 168100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:12:37,817-Speed 11077.48 samples/sec Loss 4.8936 LearningRate 0.0068 Epoch: 29 Global Step: 168110 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-04-11 05:12:38,772-Speed 10733.47 samples/sec Loss 5.0254 LearningRate 0.0068 Epoch: 29 Global Step: 168120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:12:39,709-Speed 10936.50 samples/sec Loss 4.9720 LearningRate 0.0068 Epoch: 29 Global Step: 168130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:12:40,649-Speed 10897.86 samples/sec Loss 4.9668 LearningRate 0.0068 Epoch: 29 Global Step: 168140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:12:41,558-Speed 11288.47 samples/sec Loss 4.8872 LearningRate 0.0068 Epoch: 29 Global Step: 168150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:12:42,517-Speed 10684.49 samples/sec Loss 4.9756 LearningRate 0.0068 Epoch: 29 Global Step: 168160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:12:43,459-Speed 10880.12 samples/sec Loss 5.0356 LearningRate 0.0068 Epoch: 29 Global Step: 168170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:12:44,387-Speed 11042.68 samples/sec Loss 4.8831 LearningRate 0.0068 Epoch: 29 Global Step: 168180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:12:45,304-Speed 11172.83 samples/sec Loss 4.9826 LearningRate 0.0068 Epoch: 29 Global Step: 168190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:12:46,198-Speed 11464.35 samples/sec Loss 4.8858 LearningRate 0.0068 Epoch: 29 Global Step: 168200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:12:47,115-Speed 11169.91 samples/sec Loss 4.8343 LearningRate 0.0068 Epoch: 29 Global Step: 168210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:12:48,073-Speed 10693.37 samples/sec Loss 4.8373 LearningRate 0.0068 Epoch: 29 Global Step: 168220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:12:49,005-Speed 11003.12 samples/sec Loss 4.9129 LearningRate 0.0068 Epoch: 29 Global Step: 168230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 
2022-04-11 05:12:49,936-Speed 11009.92 samples/sec Loss 4.8229 LearningRate 0.0068 Epoch: 29 Global Step: 168240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:12:50,883-Speed 10819.88 samples/sec Loss 4.9362 LearningRate 0.0068 Epoch: 29 Global Step: 168250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:12:51,810-Speed 11057.54 samples/sec Loss 4.8940 LearningRate 0.0068 Epoch: 29 Global Step: 168260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:12:52,743-Speed 10981.64 samples/sec Loss 4.9733 LearningRate 0.0068 Epoch: 29 Global Step: 168270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:12:53,704-Speed 10665.72 samples/sec Loss 4.9986 LearningRate 0.0068 Epoch: 29 Global Step: 168280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:12:54,634-Speed 11009.49 samples/sec Loss 4.9249 LearningRate 0.0068 Epoch: 29 Global Step: 168290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:12:55,571-Speed 10936.64 samples/sec Loss 4.8549 LearningRate 0.0068 Epoch: 29 Global Step: 168300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:12:56,533-Speed 10655.43 samples/sec Loss 4.9629 LearningRate 0.0068 Epoch: 29 Global Step: 168310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:12:57,456-Speed 11101.11 samples/sec Loss 4.9695 LearningRate 0.0068 Epoch: 29 Global Step: 168320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:12:58,449-Speed 10322.03 samples/sec Loss 5.0192 LearningRate 0.0068 Epoch: 29 Global Step: 168330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:12:59,428-Speed 10460.19 samples/sec Loss 4.8890 LearningRate 0.0068 Epoch: 29 Global Step: 168340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:13:00,361-Speed 10982.35 samples/sec Loss 4.9056 LearningRate 0.0067 Epoch: 29 Global Step: 168350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:13:01,295-Speed 
10978.64 samples/sec Loss 5.0547 LearningRate 0.0067 Epoch: 29 Global Step: 168360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:13:02,206-Speed 11248.67 samples/sec Loss 4.8363 LearningRate 0.0067 Epoch: 29 Global Step: 168370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:13:03,120-Speed 11206.75 samples/sec Loss 5.0186 LearningRate 0.0067 Epoch: 29 Global Step: 168380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:13:04,068-Speed 10805.78 samples/sec Loss 4.8459 LearningRate 0.0067 Epoch: 29 Global Step: 168390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:13:05,008-Speed 10909.05 samples/sec Loss 4.9550 LearningRate 0.0067 Epoch: 29 Global Step: 168400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:13:05,920-Speed 11233.79 samples/sec Loss 4.9733 LearningRate 0.0067 Epoch: 29 Global Step: 168410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:13:06,837-Speed 11174.02 samples/sec Loss 5.0248 LearningRate 0.0067 Epoch: 29 Global Step: 168420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:13:07,766-Speed 11033.15 samples/sec Loss 5.0690 LearningRate 0.0067 Epoch: 29 Global Step: 168430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:13:08,671-Speed 11319.93 samples/sec Loss 4.9453 LearningRate 0.0067 Epoch: 29 Global Step: 168440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:13:09,607-Speed 10943.72 samples/sec Loss 5.1458 LearningRate 0.0067 Epoch: 29 Global Step: 168450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:13:10,520-Speed 11222.37 samples/sec Loss 4.9759 LearningRate 0.0067 Epoch: 29 Global Step: 168460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:13:11,445-Speed 11083.59 samples/sec Loss 4.9848 LearningRate 0.0067 Epoch: 29 Global Step: 168470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:13:12,389-Speed 10860.06 samples/sec Loss 
4.9114 LearningRate 0.0067 Epoch: 29 Global Step: 168480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:13:13,329-Speed 10896.89 samples/sec Loss 4.8919 LearningRate 0.0067 Epoch: 29 Global Step: 168490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:13:14,280-Speed 10771.22 samples/sec Loss 4.9541 LearningRate 0.0067 Epoch: 29 Global Step: 168500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:13:15,274-Speed 10313.92 samples/sec Loss 5.1403 LearningRate 0.0067 Epoch: 29 Global Step: 168510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:13:16,226-Speed 10765.78 samples/sec Loss 4.9649 LearningRate 0.0067 Epoch: 29 Global Step: 168520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:13:17,211-Speed 10402.91 samples/sec Loss 5.0049 LearningRate 0.0067 Epoch: 29 Global Step: 168530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:13:18,168-Speed 10707.13 samples/sec Loss 5.0345 LearningRate 0.0067 Epoch: 29 Global Step: 168540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:13:19,119-Speed 10776.31 samples/sec Loss 5.0022 LearningRate 0.0067 Epoch: 29 Global Step: 168550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:13:20,083-Speed 10625.92 samples/sec Loss 4.9495 LearningRate 0.0067 Epoch: 29 Global Step: 168560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:13:20,994-Speed 11247.69 samples/sec Loss 5.0117 LearningRate 0.0067 Epoch: 29 Global Step: 168570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:13:21,917-Speed 11101.22 samples/sec Loss 5.0003 LearningRate 0.0067 Epoch: 29 Global Step: 168580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:13:22,832-Speed 11200.84 samples/sec Loss 5.1078 LearningRate 0.0067 Epoch: 29 Global Step: 168590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:13:23,804-Speed 10541.12 samples/sec Loss 4.9777 LearningRate 0.0067 
Epoch: 29 Global Step: 168600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:13:24,770-Speed 10611.13 samples/sec Loss 4.9481 LearningRate 0.0067 Epoch: 29 Global Step: 168610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:13:25,701-Speed 10998.75 samples/sec Loss 5.0365 LearningRate 0.0067 Epoch: 29 Global Step: 168620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:13:26,619-Speed 11161.38 samples/sec Loss 4.9181 LearningRate 0.0067 Epoch: 29 Global Step: 168630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:13:27,563-Speed 10857.96 samples/sec Loss 5.0021 LearningRate 0.0067 Epoch: 29 Global Step: 168640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:13:28,518-Speed 10729.82 samples/sec Loss 4.9676 LearningRate 0.0067 Epoch: 29 Global Step: 168650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:13:29,504-Speed 10393.83 samples/sec Loss 4.8562 LearningRate 0.0067 Epoch: 29 Global Step: 168660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:13:30,467-Speed 10632.53 samples/sec Loss 4.9596 LearningRate 0.0067 Epoch: 29 Global Step: 168670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:13:31,454-Speed 10382.07 samples/sec Loss 5.0798 LearningRate 0.0067 Epoch: 29 Global Step: 168680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:13:32,449-Speed 10308.37 samples/sec Loss 4.9069 LearningRate 0.0067 Epoch: 29 Global Step: 168690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:13:33,354-Speed 11325.17 samples/sec Loss 4.9444 LearningRate 0.0067 Epoch: 29 Global Step: 168700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:13:34,239-Speed 11569.23 samples/sec Loss 4.9728 LearningRate 0.0067 Epoch: 29 Global Step: 168710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:13:35,135-Speed 11446.42 samples/sec Loss 4.9894 LearningRate 0.0067 Epoch: 29 Global Step: 168720 
Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:13:36,035-Speed 11380.30 samples/sec Loss 4.9767 LearningRate 0.0067 Epoch: 29 Global Step: 168730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:13:36,963-Speed 11046.94 samples/sec Loss 4.9284 LearningRate 0.0067 Epoch: 29 Global Step: 168740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:13:37,888-Speed 11074.37 samples/sec Loss 5.0853 LearningRate 0.0067 Epoch: 29 Global Step: 168750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:13:38,796-Speed 11289.50 samples/sec Loss 5.0192 LearningRate 0.0067 Epoch: 29 Global Step: 168760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:13:39,747-Speed 10772.74 samples/sec Loss 4.9047 LearningRate 0.0067 Epoch: 29 Global Step: 168770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:13:40,676-Speed 11026.77 samples/sec Loss 4.8549 LearningRate 0.0067 Epoch: 29 Global Step: 168780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:13:41,633-Speed 10712.39 samples/sec Loss 5.0457 LearningRate 0.0066 Epoch: 29 Global Step: 168790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:13:42,600-Speed 10594.72 samples/sec Loss 4.8752 LearningRate 0.0066 Epoch: 29 Global Step: 168800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:13:43,550-Speed 10787.73 samples/sec Loss 4.9720 LearningRate 0.0066 Epoch: 29 Global Step: 168810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:13:44,492-Speed 10871.26 samples/sec Loss 4.8927 LearningRate 0.0066 Epoch: 29 Global Step: 168820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:13:45,388-Speed 11438.85 samples/sec Loss 4.9894 LearningRate 0.0066 Epoch: 29 Global Step: 168830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:13:46,287-Speed 11392.73 samples/sec Loss 4.9570 LearningRate 0.0066 Epoch: 29 Global Step: 168840 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-04-11 05:13:47,224-Speed 10943.69 samples/sec Loss 4.9249 LearningRate 0.0066 Epoch: 29 Global Step: 168850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:13:48,160-Speed 10941.68 samples/sec Loss 4.9447 LearningRate 0.0066 Epoch: 29 Global Step: 168860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:13:49,107-Speed 10823.62 samples/sec Loss 4.9629 LearningRate 0.0066 Epoch: 29 Global Step: 168870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:13:50,038-Speed 11009.08 samples/sec Loss 4.9666 LearningRate 0.0066 Epoch: 29 Global Step: 168880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:13:50,970-Speed 10992.61 samples/sec Loss 4.9026 LearningRate 0.0066 Epoch: 29 Global Step: 168890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:13:51,921-Speed 10779.36 samples/sec Loss 4.8661 LearningRate 0.0066 Epoch: 29 Global Step: 168900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:13:52,857-Speed 10947.51 samples/sec Loss 4.8870 LearningRate 0.0066 Epoch: 29 Global Step: 168910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:13:53,774-Speed 11172.92 samples/sec Loss 4.8414 LearningRate 0.0066 Epoch: 29 Global Step: 168920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:13:54,722-Speed 10810.49 samples/sec Loss 4.8300 LearningRate 0.0066 Epoch: 29 Global Step: 168930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:13:55,606-Speed 11584.89 samples/sec Loss 4.9632 LearningRate 0.0066 Epoch: 29 Global Step: 168940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:13:56,487-Speed 11630.59 samples/sec Loss 5.0632 LearningRate 0.0066 Epoch: 29 Global Step: 168950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:13:57,392-Speed 11326.30 samples/sec Loss 5.0885 LearningRate 0.0066 Epoch: 29 Global Step: 168960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 
2022-04-11 05:13:58,319-Speed 11055.79 samples/sec Loss 4.9904 LearningRate 0.0066 Epoch: 29 Global Step: 168970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:13:59,261-Speed 10882.57 samples/sec Loss 4.9654 LearningRate 0.0066 Epoch: 29 Global Step: 168980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:14:00,198-Speed 10927.25 samples/sec Loss 4.8159 LearningRate 0.0066 Epoch: 29 Global Step: 168990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:14:01,131-Speed 10985.01 samples/sec Loss 4.9507 LearningRate 0.0066 Epoch: 29 Global Step: 169000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:14:02,076-Speed 10849.25 samples/sec Loss 4.9747 LearningRate 0.0066 Epoch: 29 Global Step: 169010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:14:03,009-Speed 10977.06 samples/sec Loss 5.1884 LearningRate 0.0066 Epoch: 29 Global Step: 169020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:14:03,954-Speed 10843.91 samples/sec Loss 5.0305 LearningRate 0.0066 Epoch: 29 Global Step: 169030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:14:04,905-Speed 10781.47 samples/sec Loss 4.9173 LearningRate 0.0066 Epoch: 29 Global Step: 169040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:14:05,798-Speed 11473.80 samples/sec Loss 4.9086 LearningRate 0.0066 Epoch: 29 Global Step: 169050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:14:06,717-Speed 11143.59 samples/sec Loss 4.9767 LearningRate 0.0066 Epoch: 29 Global Step: 169060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:14:07,659-Speed 10881.91 samples/sec Loss 4.9475 LearningRate 0.0066 Epoch: 29 Global Step: 169070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:14:08,611-Speed 10759.70 samples/sec Loss 5.0668 LearningRate 0.0066 Epoch: 29 Global Step: 169080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:14:09,523-Speed 
11234.78 samples/sec Loss 4.8981 LearningRate 0.0066 Epoch: 29 Global Step: 169090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:14:10,425-Speed 11356.73 samples/sec Loss 4.8667 LearningRate 0.0066 Epoch: 29 Global Step: 169100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:14:11,342-Speed 11177.76 samples/sec Loss 4.9707 LearningRate 0.0066 Epoch: 29 Global Step: 169110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:14:12,292-Speed 10793.74 samples/sec Loss 4.9274 LearningRate 0.0066 Epoch: 29 Global Step: 169120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:14:13,225-Speed 10982.04 samples/sec Loss 4.8829 LearningRate 0.0066 Epoch: 29 Global Step: 169130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:14:14,241-Speed 10084.62 samples/sec Loss 5.0685 LearningRate 0.0066 Epoch: 29 Global Step: 169140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:14:15,209-Speed 10584.63 samples/sec Loss 4.9896 LearningRate 0.0066 Epoch: 29 Global Step: 169150 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 05:14:16,168-Speed 10681.52 samples/sec Loss 4.8872 LearningRate 0.0066 Epoch: 29 Global Step: 169160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:14:17,110-Speed 10884.68 samples/sec Loss 4.9475 LearningRate 0.0066 Epoch: 29 Global Step: 169170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:14:18,081-Speed 10547.04 samples/sec Loss 4.8963 LearningRate 0.0066 Epoch: 29 Global Step: 169180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:14:19,036-Speed 10738.59 samples/sec Loss 5.0245 LearningRate 0.0066 Epoch: 29 Global Step: 169190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:14:19,990-Speed 10736.42 samples/sec Loss 4.9690 LearningRate 0.0066 Epoch: 29 Global Step: 169200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:14:20,916-Speed 11061.21 samples/sec Loss 
4.9400 LearningRate 0.0066 Epoch: 29 Global Step: 169210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:14:21,826-Speed 11272.35 samples/sec Loss 4.9983 LearningRate 0.0066 Epoch: 29 Global Step: 169220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:14:22,718-Speed 11481.95 samples/sec Loss 4.8928 LearningRate 0.0066 Epoch: 29 Global Step: 169230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:14:23,678-Speed 10674.53 samples/sec Loss 5.0973 LearningRate 0.0065 Epoch: 29 Global Step: 169240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:14:24,611-Speed 10988.43 samples/sec Loss 5.0824 LearningRate 0.0065 Epoch: 29 Global Step: 169250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:14:25,544-Speed 10974.35 samples/sec Loss 4.9940 LearningRate 0.0065 Epoch: 29 Global Step: 169260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:14:26,508-Speed 10636.14 samples/sec Loss 5.1156 LearningRate 0.0065 Epoch: 29 Global Step: 169270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:14:27,462-Speed 10737.58 samples/sec Loss 5.1204 LearningRate 0.0065 Epoch: 29 Global Step: 169280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:14:28,396-Speed 10968.24 samples/sec Loss 4.9913 LearningRate 0.0065 Epoch: 29 Global Step: 169290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:14:29,354-Speed 10696.61 samples/sec Loss 4.9691 LearningRate 0.0065 Epoch: 29 Global Step: 169300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:14:30,336-Speed 10432.90 samples/sec Loss 4.9667 LearningRate 0.0065 Epoch: 29 Global Step: 169310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:14:31,298-Speed 10653.07 samples/sec Loss 4.8840 LearningRate 0.0065 Epoch: 29 Global Step: 169320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:14:32,243-Speed 10845.16 samples/sec Loss 4.9127 LearningRate 0.0065 
Epoch: 29 Global Step: 169330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:14:33,157-Speed 11208.97 samples/sec Loss 4.9130 LearningRate 0.0065 Epoch: 29 Global Step: 169340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:14:34,055-Speed 11417.33 samples/sec Loss 4.9689 LearningRate 0.0065 Epoch: 29 Global Step: 169350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:14:34,984-Speed 11026.54 samples/sec Loss 4.9039 LearningRate 0.0065 Epoch: 29 Global Step: 169360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:14:35,879-Speed 11445.06 samples/sec Loss 4.9267 LearningRate 0.0065 Epoch: 29 Global Step: 169370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:14:36,832-Speed 10757.67 samples/sec Loss 4.9151 LearningRate 0.0065 Epoch: 29 Global Step: 169380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:14:37,784-Speed 10764.32 samples/sec Loss 5.0026 LearningRate 0.0065 Epoch: 29 Global Step: 169390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:14:38,693-Speed 11272.85 samples/sec Loss 4.9024 LearningRate 0.0065 Epoch: 29 Global Step: 169400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:14:39,642-Speed 10804.53 samples/sec Loss 4.9995 LearningRate 0.0065 Epoch: 29 Global Step: 169410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:14:40,618-Speed 10495.04 samples/sec Loss 4.8805 LearningRate 0.0065 Epoch: 29 Global Step: 169420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:14:41,570-Speed 10771.68 samples/sec Loss 5.0134 LearningRate 0.0065 Epoch: 29 Global Step: 169430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:14:42,507-Speed 10934.32 samples/sec Loss 4.9214 LearningRate 0.0065 Epoch: 29 Global Step: 169440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:14:43,492-Speed 10404.86 samples/sec Loss 4.9742 LearningRate 0.0065 Epoch: 29 Global Step: 169450 
Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:14:44,428-Speed 10949.60 samples/sec Loss 4.8668 LearningRate 0.0065 Epoch: 29 Global Step: 169460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:14:45,311-Speed 11625.35 samples/sec Loss 4.9302 LearningRate 0.0065 Epoch: 29 Global Step: 169470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:14:46,224-Speed 11220.35 samples/sec Loss 4.9618 LearningRate 0.0065 Epoch: 29 Global Step: 169480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:14:47,123-Speed 11402.38 samples/sec Loss 4.9435 LearningRate 0.0065 Epoch: 29 Global Step: 169490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:14:48,064-Speed 10889.69 samples/sec Loss 5.0125 LearningRate 0.0065 Epoch: 29 Global Step: 169500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:14:48,990-Speed 11066.81 samples/sec Loss 5.0189 LearningRate 0.0065 Epoch: 29 Global Step: 169510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:14:49,918-Speed 11033.14 samples/sec Loss 4.8489 LearningRate 0.0065 Epoch: 29 Global Step: 169520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:14:50,845-Speed 11060.09 samples/sec Loss 4.9313 LearningRate 0.0065 Epoch: 29 Global Step: 169530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:14:51,775-Speed 11014.10 samples/sec Loss 5.0357 LearningRate 0.0065 Epoch: 29 Global Step: 169540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:14:52,740-Speed 10630.57 samples/sec Loss 4.9909 LearningRate 0.0065 Epoch: 29 Global Step: 169550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:14:53,688-Speed 10802.75 samples/sec Loss 5.0448 LearningRate 0.0065 Epoch: 29 Global Step: 169560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:14:54,642-Speed 10741.73 samples/sec Loss 5.1142 LearningRate 0.0065 Epoch: 29 Global Step: 169570 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-04-11 05:14:55,595-Speed 10750.12 samples/sec Loss 4.9968 LearningRate 0.0065 Epoch: 29 Global Step: 169580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:14:56,529-Speed 10970.47 samples/sec Loss 4.8175 LearningRate 0.0065 Epoch: 29 Global Step: 169590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:14:57,468-Speed 10918.15 samples/sec Loss 5.0385 LearningRate 0.0065 Epoch: 29 Global Step: 169600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:14:58,413-Speed 10839.00 samples/sec Loss 5.0595 LearningRate 0.0065 Epoch: 29 Global Step: 169610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:14:59,362-Speed 10804.59 samples/sec Loss 4.9277 LearningRate 0.0065 Epoch: 29 Global Step: 169620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:15:00,334-Speed 10538.82 samples/sec Loss 5.0014 LearningRate 0.0065 Epoch: 29 Global Step: 169630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:15:01,324-Speed 10353.79 samples/sec Loss 4.8698 LearningRate 0.0065 Epoch: 29 Global Step: 169640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:15:02,263-Speed 10910.23 samples/sec Loss 4.9800 LearningRate 0.0065 Epoch: 29 Global Step: 169650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:15:03,155-Speed 11495.61 samples/sec Loss 5.0249 LearningRate 0.0065 Epoch: 29 Global Step: 169660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:15:04,088-Speed 10980.90 samples/sec Loss 4.9959 LearningRate 0.0065 Epoch: 29 Global Step: 169670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:15:05,028-Speed 10900.53 samples/sec Loss 4.9416 LearningRate 0.0064 Epoch: 29 Global Step: 169680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:15:05,978-Speed 10782.34 samples/sec Loss 4.9467 LearningRate 0.0064 Epoch: 29 Global Step: 169690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-11 05:15:06,911-Speed 10988.94 samples/sec Loss 5.0804 LearningRate 0.0064 Epoch: 29 Global Step: 169700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:15:07,864-Speed 10756.72 samples/sec Loss 5.1130 LearningRate 0.0064 Epoch: 29 Global Step: 169710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:15:08,793-Speed 11027.88 samples/sec Loss 5.0646 LearningRate 0.0064 Epoch: 29 Global Step: 169720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:15:09,710-Speed 11173.22 samples/sec Loss 5.0601 LearningRate 0.0064 Epoch: 29 Global Step: 169730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:15:10,626-Speed 11182.98 samples/sec Loss 5.0128 LearningRate 0.0064 Epoch: 29 Global Step: 169740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:15:11,547-Speed 11122.90 samples/sec Loss 4.9516 LearningRate 0.0064 Epoch: 29 Global Step: 169750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:15:12,508-Speed 10665.72 samples/sec Loss 4.9200 LearningRate 0.0064 Epoch: 29 Global Step: 169760 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 05:15:13,438-Speed 11016.66 samples/sec Loss 4.8947 LearningRate 0.0064 Epoch: 29 Global Step: 169770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:15:14,377-Speed 10914.23 samples/sec Loss 5.0498 LearningRate 0.0064 Epoch: 29 Global Step: 169780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:15:15,351-Speed 10522.72 samples/sec Loss 5.0378 LearningRate 0.0064 Epoch: 29 Global Step: 169790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:15:16,285-Speed 10963.73 samples/sec Loss 4.9648 LearningRate 0.0064 Epoch: 29 Global Step: 169800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:15:17,220-Speed 10965.75 samples/sec Loss 4.8793 LearningRate 0.0064 Epoch: 29 Global Step: 169810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:15:18,162-Speed 
10881.77 samples/sec Loss 5.1132 LearningRate 0.0064 Epoch: 29 Global Step: 169820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:15:19,129-Speed 10593.58 samples/sec Loss 4.9481 LearningRate 0.0064 Epoch: 29 Global Step: 169830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:15:20,097-Speed 10586.60 samples/sec Loss 4.9595 LearningRate 0.0064 Epoch: 29 Global Step: 169840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:15:20,999-Speed 11361.60 samples/sec Loss 4.9772 LearningRate 0.0064 Epoch: 29 Global Step: 169850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:15:21,933-Speed 10975.06 samples/sec Loss 5.1180 LearningRate 0.0064 Epoch: 29 Global Step: 169860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:15:22,857-Speed 11088.00 samples/sec Loss 4.9731 LearningRate 0.0064 Epoch: 29 Global Step: 169870 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 05:15:23,751-Speed 11464.43 samples/sec Loss 4.9348 LearningRate 0.0064 Epoch: 29 Global Step: 169880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:15:24,667-Speed 11183.54 samples/sec Loss 5.0245 LearningRate 0.0064 Epoch: 29 Global Step: 169890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:15:25,615-Speed 10806.24 samples/sec Loss 5.0287 LearningRate 0.0064 Epoch: 29 Global Step: 169900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:15:26,577-Speed 10653.15 samples/sec Loss 5.0262 LearningRate 0.0064 Epoch: 29 Global Step: 169910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:15:27,523-Speed 10834.84 samples/sec Loss 4.8734 LearningRate 0.0064 Epoch: 29 Global Step: 169920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:15:28,523-Speed 10244.87 samples/sec Loss 4.9768 LearningRate 0.0064 Epoch: 29 Global Step: 169930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:15:29,479-Speed 10722.78 samples/sec Loss 
5.0380 LearningRate 0.0064 Epoch: 29 Global Step: 169940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:15:30,449-Speed 10559.90 samples/sec Loss 4.8771 LearningRate 0.0064 Epoch: 29 Global Step: 169950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:15:31,386-Speed 10942.78 samples/sec Loss 5.0288 LearningRate 0.0064 Epoch: 29 Global Step: 169960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:15:32,339-Speed 10743.60 samples/sec Loss 5.0239 LearningRate 0.0064 Epoch: 29 Global Step: 169970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:15:33,239-Speed 11389.92 samples/sec Loss 4.9550 LearningRate 0.0064 Epoch: 29 Global Step: 169980 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 05:15:34,169-Speed 11014.43 samples/sec Loss 5.0573 LearningRate 0.0064 Epoch: 29 Global Step: 169990 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 05:15:35,060-Speed 11504.61 samples/sec Loss 4.9963 LearningRate 0.0064 Epoch: 29 Global Step: 170000 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 05:15:56,325-[lfw][170000]XNorm: 8.233515 Training: 2022-04-11 05:15:56,325-[lfw][170000]Accuracy-Flip: 0.99600+-0.00291 Training: 2022-04-11 05:15:56,326-[lfw][170000]Accuracy-Highest: 0.99733 Training: 2022-04-11 05:16:21,049-[cfp_fp][170000]XNorm: 6.995797 Training: 2022-04-11 05:16:21,049-[cfp_fp][170000]Accuracy-Flip: 0.95271+-0.00976 Training: 2022-04-11 05:16:21,050-[cfp_fp][170000]Accuracy-Highest: 0.95429 Training: 2022-04-11 05:16:42,290-[agedb_30][170000]XNorm: 8.062078 Training: 2022-04-11 05:16:42,291-[agedb_30][170000]Accuracy-Flip: 0.96950+-0.00646 Training: 2022-04-11 05:16:42,291-[agedb_30][170000]Accuracy-Highest: 0.96950 Training: 2022-04-11 05:16:43,240-Speed 150.19 samples/sec Loss 5.0575 LearningRate 0.0064 Epoch: 29 Global Step: 170010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:16:44,174-Speed 10977.41 samples/sec Loss 4.8852 
LearningRate 0.0064 Epoch: 29 Global Step: 170020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:16:45,089-Speed 11192.58 samples/sec Loss 5.1709 LearningRate 0.0064 Epoch: 29 Global Step: 170030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:16:45,994-Speed 11326.47 samples/sec Loss 5.1291 LearningRate 0.0064 Epoch: 29 Global Step: 170040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:16:46,911-Speed 11168.20 samples/sec Loss 4.9491 LearningRate 0.0064 Epoch: 29 Global Step: 170050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:16:47,837-Speed 11067.46 samples/sec Loss 4.9275 LearningRate 0.0064 Epoch: 29 Global Step: 170060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:16:48,746-Speed 11278.87 samples/sec Loss 5.0525 LearningRate 0.0064 Epoch: 29 Global Step: 170070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:16:49,726-Speed 10454.49 samples/sec Loss 4.9962 LearningRate 0.0064 Epoch: 29 Global Step: 170080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:16:50,626-Speed 11385.61 samples/sec Loss 4.9010 LearningRate 0.0064 Epoch: 29 Global Step: 170090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:16:51,536-Speed 11256.26 samples/sec Loss 4.9287 LearningRate 0.0064 Epoch: 29 Global Step: 170100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:16:52,475-Speed 10915.97 samples/sec Loss 5.0545 LearningRate 0.0064 Epoch: 29 Global Step: 170110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:16:53,429-Speed 10740.73 samples/sec Loss 4.8938 LearningRate 0.0064 Epoch: 29 Global Step: 170120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:16:54,398-Speed 10570.99 samples/sec Loss 4.9560 LearningRate 0.0063 Epoch: 29 Global Step: 170130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:16:55,350-Speed 10771.59 samples/sec Loss 4.9830 LearningRate 0.0063 Epoch: 29 
Global Step: 170140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:16:56,341-Speed 10334.92 samples/sec Loss 4.9317 LearningRate 0.0063 Epoch: 29 Global Step: 170150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:16:57,290-Speed 10803.19 samples/sec Loss 4.9164 LearningRate 0.0063 Epoch: 29 Global Step: 170160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:16:58,220-Speed 11012.31 samples/sec Loss 4.9219 LearningRate 0.0063 Epoch: 29 Global Step: 170170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:16:59,178-Speed 10693.51 samples/sec Loss 5.0146 LearningRate 0.0063 Epoch: 29 Global Step: 170180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:17:00,113-Speed 10965.68 samples/sec Loss 4.9888 LearningRate 0.0063 Epoch: 29 Global Step: 170190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:17:01,044-Speed 11004.12 samples/sec Loss 4.9958 LearningRate 0.0063 Epoch: 29 Global Step: 170200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:17:01,976-Speed 10998.20 samples/sec Loss 5.0140 LearningRate 0.0063 Epoch: 29 Global Step: 170210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:17:02,899-Speed 11108.09 samples/sec Loss 5.1627 LearningRate 0.0063 Epoch: 29 Global Step: 170220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:17:03,883-Speed 10406.80 samples/sec Loss 5.0788 LearningRate 0.0063 Epoch: 29 Global Step: 170230 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 05:17:04,810-Speed 11057.84 samples/sec Loss 5.0587 LearningRate 0.0063 Epoch: 29 Global Step: 170240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:17:05,791-Speed 10451.09 samples/sec Loss 4.9489 LearningRate 0.0063 Epoch: 29 Global Step: 170250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:17:06,741-Speed 10788.68 samples/sec Loss 4.9314 LearningRate 0.0063 Epoch: 29 Global Step: 170260 Fp16 Grad 
Scale: 65536 Required: 2 hours Training: 2022-04-11 05:17:07,660-Speed 11142.02 samples/sec Loss 5.0658 LearningRate 0.0063 Epoch: 29 Global Step: 170270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:17:08,558-Speed 11413.31 samples/sec Loss 4.9948 LearningRate 0.0063 Epoch: 29 Global Step: 170280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:17:09,484-Speed 11070.72 samples/sec Loss 5.0472 LearningRate 0.0063 Epoch: 29 Global Step: 170290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:17:10,402-Speed 11161.55 samples/sec Loss 5.0459 LearningRate 0.0063 Epoch: 29 Global Step: 170300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:17:11,315-Speed 11216.59 samples/sec Loss 4.9128 LearningRate 0.0063 Epoch: 29 Global Step: 170310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:17:12,233-Speed 11166.56 samples/sec Loss 5.0316 LearningRate 0.0063 Epoch: 29 Global Step: 170320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:17:13,154-Speed 11122.67 samples/sec Loss 5.0050 LearningRate 0.0063 Epoch: 29 Global Step: 170330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:17:14,100-Speed 10834.59 samples/sec Loss 4.8972 LearningRate 0.0063 Epoch: 29 Global Step: 170340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:17:15,040-Speed 10898.09 samples/sec Loss 5.0445 LearningRate 0.0063 Epoch: 29 Global Step: 170350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:17:15,994-Speed 10743.19 samples/sec Loss 5.1294 LearningRate 0.0063 Epoch: 29 Global Step: 170360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:17:16,935-Speed 10891.79 samples/sec Loss 5.0112 LearningRate 0.0063 Epoch: 29 Global Step: 170370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:17:17,846-Speed 11251.20 samples/sec Loss 5.0129 LearningRate 0.0063 Epoch: 29 Global Step: 170380 Fp16 Grad Scale: 65536 Required: 2 hours 
Training: 2022-04-11 05:17:18,775-Speed 11023.28 samples/sec Loss 5.0221 LearningRate 0.0063 Epoch: 29 Global Step: 170390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:17:19,735-Speed 10673.69 samples/sec Loss 5.0806 LearningRate 0.0063 Epoch: 29 Global Step: 170400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:17:20,676-Speed 10893.03 samples/sec Loss 4.9995 LearningRate 0.0063 Epoch: 29 Global Step: 170410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:17:21,573-Speed 11426.89 samples/sec Loss 4.9485 LearningRate 0.0063 Epoch: 29 Global Step: 170420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:17:22,498-Speed 11069.82 samples/sec Loss 5.0568 LearningRate 0.0063 Epoch: 29 Global Step: 170430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:17:23,408-Speed 11266.92 samples/sec Loss 4.9574 LearningRate 0.0063 Epoch: 29 Global Step: 170440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:17:24,324-Speed 11188.97 samples/sec Loss 5.0401 LearningRate 0.0063 Epoch: 29 Global Step: 170450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:17:25,281-Speed 10844.00 samples/sec Loss 5.1097 LearningRate 0.0063 Epoch: 29 Global Step: 170460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:17:26,201-Speed 11144.61 samples/sec Loss 4.9553 LearningRate 0.0063 Epoch: 29 Global Step: 170470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:17:27,133-Speed 10996.07 samples/sec Loss 5.0608 LearningRate 0.0063 Epoch: 29 Global Step: 170480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:17:28,060-Speed 11046.40 samples/sec Loss 5.0417 LearningRate 0.0063 Epoch: 29 Global Step: 170490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:17:28,984-Speed 11094.06 samples/sec Loss 5.0284 LearningRate 0.0063 Epoch: 29 Global Step: 170500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 
05:17:29,941-Speed 10705.68 samples/sec Loss 5.0024 LearningRate 0.0063 Epoch: 29 Global Step: 170510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:17:30,905-Speed 10634.64 samples/sec Loss 4.9814 LearningRate 0.0063 Epoch: 29 Global Step: 170520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:17:31,815-Speed 11256.36 samples/sec Loss 4.9054 LearningRate 0.0063 Epoch: 29 Global Step: 170530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:17:32,741-Speed 11065.62 samples/sec Loss 5.0260 LearningRate 0.0063 Epoch: 29 Global Step: 170540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:17:33,663-Speed 11121.21 samples/sec Loss 4.8871 LearningRate 0.0063 Epoch: 29 Global Step: 170550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:17:34,601-Speed 10916.49 samples/sec Loss 5.1496 LearningRate 0.0063 Epoch: 29 Global Step: 170560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:17:35,609-Speed 10171.88 samples/sec Loss 4.9848 LearningRate 0.0063 Epoch: 29 Global Step: 170570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:17:36,488-Speed 11660.14 samples/sec Loss 4.9404 LearningRate 0.0063 Epoch: 29 Global Step: 170580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:17:47,336-Speed 944.00 samples/sec Loss 4.5127 LearningRate 0.0062 Epoch: 30 Global Step: 170590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:17:48,260-Speed 11090.29 samples/sec Loss 4.4331 LearningRate 0.0062 Epoch: 30 Global Step: 170600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:17:49,357-Speed 9347.19 samples/sec Loss 4.4067 LearningRate 0.0062 Epoch: 30 Global Step: 170610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:17:50,267-Speed 11256.61 samples/sec Loss 4.4462 LearningRate 0.0062 Epoch: 30 Global Step: 170620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:17:51,247-Speed 10455.27 
samples/sec Loss 4.5435 LearningRate 0.0062 Epoch: 30 Global Step: 170630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:17:52,313-Speed 9606.76 samples/sec Loss 4.4512 LearningRate 0.0062 Epoch: 30 Global Step: 170640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:17:53,236-Speed 11109.44 samples/sec Loss 4.4703 LearningRate 0.0062 Epoch: 30 Global Step: 170650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:17:54,159-Speed 11104.54 samples/sec Loss 4.5615 LearningRate 0.0062 Epoch: 30 Global Step: 170660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:17:55,106-Speed 10822.97 samples/sec Loss 4.5527 LearningRate 0.0062 Epoch: 30 Global Step: 170670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:17:56,074-Speed 10586.81 samples/sec Loss 4.3572 LearningRate 0.0062 Epoch: 30 Global Step: 170680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:17:57,003-Speed 11028.70 samples/sec Loss 4.5117 LearningRate 0.0062 Epoch: 30 Global Step: 170690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:17:57,914-Speed 11243.36 samples/sec Loss 4.4185 LearningRate 0.0062 Epoch: 30 Global Step: 170700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:17:58,826-Speed 11238.35 samples/sec Loss 4.4670 LearningRate 0.0062 Epoch: 30 Global Step: 170710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:17:59,741-Speed 11198.10 samples/sec Loss 4.5432 LearningRate 0.0062 Epoch: 30 Global Step: 170720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:18:00,649-Speed 11302.58 samples/sec Loss 4.5248 LearningRate 0.0062 Epoch: 30 Global Step: 170730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:18:01,578-Speed 11030.60 samples/sec Loss 4.5139 LearningRate 0.0062 Epoch: 30 Global Step: 170740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:18:02,476-Speed 11410.94 samples/sec Loss 4.4485 
LearningRate 0.0062 Epoch: 30 Global Step: 170750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:18:03,450-Speed 10521.05 samples/sec Loss 4.5645 LearningRate 0.0062 Epoch: 30 Global Step: 170760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:18:04,451-Speed 10234.30 samples/sec Loss 4.4968 LearningRate 0.0062 Epoch: 30 Global Step: 170770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:18:05,479-Speed 9966.09 samples/sec Loss 4.6113 LearningRate 0.0062 Epoch: 30 Global Step: 170780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:18:06,400-Speed 11126.58 samples/sec Loss 4.4510 LearningRate 0.0062 Epoch: 30 Global Step: 170790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:18:07,311-Speed 11255.31 samples/sec Loss 4.4264 LearningRate 0.0062 Epoch: 30 Global Step: 170800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:18:08,214-Speed 11349.84 samples/sec Loss 4.5723 LearningRate 0.0062 Epoch: 30 Global Step: 170810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:18:09,106-Speed 11477.03 samples/sec Loss 4.5519 LearningRate 0.0062 Epoch: 30 Global Step: 170820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:18:10,053-Speed 10822.05 samples/sec Loss 4.4956 LearningRate 0.0062 Epoch: 30 Global Step: 170830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:18:11,001-Speed 10804.57 samples/sec Loss 4.5349 LearningRate 0.0062 Epoch: 30 Global Step: 170840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:18:11,933-Speed 11006.38 samples/sec Loss 4.5993 LearningRate 0.0062 Epoch: 30 Global Step: 170850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:18:12,873-Speed 10893.44 samples/sec Loss 4.5674 LearningRate 0.0062 Epoch: 30 Global Step: 170860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:18:13,808-Speed 10966.38 samples/sec Loss 4.6498 LearningRate 0.0062 Epoch: 30 
Global Step: 170870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:18:14,772-Speed 10625.51 samples/sec Loss 4.5007 LearningRate 0.0062 Epoch: 30 Global Step: 170880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:18:15,712-Speed 10899.70 samples/sec Loss 4.5779 LearningRate 0.0062 Epoch: 30 Global Step: 170890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:18:16,620-Speed 11294.55 samples/sec Loss 4.6104 LearningRate 0.0062 Epoch: 30 Global Step: 170900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:18:17,540-Speed 11137.44 samples/sec Loss 4.5103 LearningRate 0.0062 Epoch: 30 Global Step: 170910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:18:18,481-Speed 10889.47 samples/sec Loss 4.5499 LearningRate 0.0062 Epoch: 30 Global Step: 170920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:18:19,406-Speed 11079.10 samples/sec Loss 4.5313 LearningRate 0.0062 Epoch: 30 Global Step: 170930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:18:20,329-Speed 11101.44 samples/sec Loss 4.6177 LearningRate 0.0062 Epoch: 30 Global Step: 170940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:18:21,209-Speed 11646.77 samples/sec Loss 4.4205 LearningRate 0.0062 Epoch: 30 Global Step: 170950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:18:22,104-Speed 11443.11 samples/sec Loss 4.5146 LearningRate 0.0062 Epoch: 30 Global Step: 170960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:18:23,019-Speed 11202.84 samples/sec Loss 4.5945 LearningRate 0.0062 Epoch: 30 Global Step: 170970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:18:23,928-Speed 11271.29 samples/sec Loss 4.5748 LearningRate 0.0062 Epoch: 30 Global Step: 170980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:18:24,836-Speed 11290.99 samples/sec Loss 4.6508 LearningRate 0.0062 Epoch: 30 Global Step: 170990 Fp16 Grad 
Scale: 65536 Required: 2 hours Training: 2022-04-11 05:18:25,737-Speed 11365.24 samples/sec Loss 4.6461 LearningRate 0.0062 Epoch: 30 Global Step: 171000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:18:26,673-Speed 10949.66 samples/sec Loss 4.5880 LearningRate 0.0062 Epoch: 30 Global Step: 171010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:18:27,631-Speed 10704.67 samples/sec Loss 4.4536 LearningRate 0.0062 Epoch: 30 Global Step: 171020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:18:28,555-Speed 11085.27 samples/sec Loss 4.6142 LearningRate 0.0062 Epoch: 30 Global Step: 171030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:18:29,492-Speed 10934.36 samples/sec Loss 4.4264 LearningRate 0.0061 Epoch: 30 Global Step: 171040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:18:30,436-Speed 10849.40 samples/sec Loss 4.5368 LearningRate 0.0061 Epoch: 30 Global Step: 171050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:18:31,356-Speed 11138.80 samples/sec Loss 4.4960 LearningRate 0.0061 Epoch: 30 Global Step: 171060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:18:32,291-Speed 10964.90 samples/sec Loss 4.5514 LearningRate 0.0061 Epoch: 30 Global Step: 171070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:18:33,169-Speed 11664.79 samples/sec Loss 4.4554 LearningRate 0.0061 Epoch: 30 Global Step: 171080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:18:34,107-Speed 10926.30 samples/sec Loss 4.5490 LearningRate 0.0061 Epoch: 30 Global Step: 171090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:18:35,011-Speed 11339.49 samples/sec Loss 4.6720 LearningRate 0.0061 Epoch: 30 Global Step: 171100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:18:35,936-Speed 11074.08 samples/sec Loss 4.5480 LearningRate 0.0061 Epoch: 30 Global Step: 171110 Fp16 Grad Scale: 32768 Required: 2 hours 
Training: 2022-04-11 05:18:36,869-Speed 10994.05 samples/sec Loss 4.5557 LearningRate 0.0061 Epoch: 30 Global Step: 171120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:18:37,789-Speed 11137.79 samples/sec Loss 4.6097 LearningRate 0.0061 Epoch: 30 Global Step: 171130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:18:38,728-Speed 10913.72 samples/sec Loss 4.4623 LearningRate 0.0061 Epoch: 30 Global Step: 171140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:18:39,651-Speed 11107.92 samples/sec Loss 4.5478 LearningRate 0.0061 Epoch: 30 Global Step: 171150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:18:40,604-Speed 10750.79 samples/sec Loss 4.5183 LearningRate 0.0061 Epoch: 30 Global Step: 171160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:18:41,518-Speed 11207.75 samples/sec Loss 4.6234 LearningRate 0.0061 Epoch: 30 Global Step: 171170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:18:42,428-Speed 11265.98 samples/sec Loss 4.4266 LearningRate 0.0061 Epoch: 30 Global Step: 171180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:18:43,363-Speed 10957.59 samples/sec Loss 4.5907 LearningRate 0.0061 Epoch: 30 Global Step: 171190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:18:44,317-Speed 10735.49 samples/sec Loss 4.6366 LearningRate 0.0061 Epoch: 30 Global Step: 171200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:18:45,211-Speed 11466.52 samples/sec Loss 4.5893 LearningRate 0.0061 Epoch: 30 Global Step: 171210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:18:46,096-Speed 11581.08 samples/sec Loss 4.6071 LearningRate 0.0061 Epoch: 30 Global Step: 171220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:18:47,022-Speed 11058.79 samples/sec Loss 4.6833 LearningRate 0.0061 Epoch: 30 Global Step: 171230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 
05:18:47,943-Speed 11129.17 samples/sec Loss 4.4798 LearningRate 0.0061 Epoch: 30 Global Step: 171240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:18:48,844-Speed 11368.82 samples/sec Loss 4.5534 LearningRate 0.0061 Epoch: 30 Global Step: 171250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:18:49,752-Speed 11288.69 samples/sec Loss 4.5609 LearningRate 0.0061 Epoch: 30 Global Step: 171260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:18:50,683-Speed 11011.59 samples/sec Loss 4.5317 LearningRate 0.0061 Epoch: 30 Global Step: 171270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:18:51,587-Speed 11328.07 samples/sec Loss 4.6371 LearningRate 0.0061 Epoch: 30 Global Step: 171280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:18:52,532-Speed 10851.76 samples/sec Loss 4.5731 LearningRate 0.0061 Epoch: 30 Global Step: 171290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:18:53,488-Speed 10716.08 samples/sec Loss 4.6347 LearningRate 0.0061 Epoch: 30 Global Step: 171300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:18:54,395-Speed 11291.24 samples/sec Loss 4.7658 LearningRate 0.0061 Epoch: 30 Global Step: 171310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:18:55,299-Speed 11333.56 samples/sec Loss 4.6468 LearningRate 0.0061 Epoch: 30 Global Step: 171320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:18:56,216-Speed 11175.10 samples/sec Loss 4.6457 LearningRate 0.0061 Epoch: 30 Global Step: 171330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:18:57,107-Speed 11512.56 samples/sec Loss 4.4922 LearningRate 0.0061 Epoch: 30 Global Step: 171340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:18:58,050-Speed 10871.53 samples/sec Loss 4.7335 LearningRate 0.0061 Epoch: 30 Global Step: 171350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:18:58,979-Speed 11024.64 
samples/sec Loss 4.5330 LearningRate 0.0061 Epoch: 30 Global Step: 171360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:18:59,893-Speed 11216.21 samples/sec Loss 4.6637 LearningRate 0.0061 Epoch: 30 Global Step: 171370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:19:00,804-Speed 11247.13 samples/sec Loss 4.7067 LearningRate 0.0061 Epoch: 30 Global Step: 171380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:19:01,731-Speed 11048.27 samples/sec Loss 4.7310 LearningRate 0.0061 Epoch: 30 Global Step: 171390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:19:02,676-Speed 10847.36 samples/sec Loss 4.6769 LearningRate 0.0061 Epoch: 30 Global Step: 171400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:19:03,614-Speed 10920.09 samples/sec Loss 4.6013 LearningRate 0.0061 Epoch: 30 Global Step: 171410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:19:04,574-Speed 10673.60 samples/sec Loss 4.5827 LearningRate 0.0061 Epoch: 30 Global Step: 171420 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 05:19:05,475-Speed 11384.56 samples/sec Loss 4.5704 LearningRate 0.0061 Epoch: 30 Global Step: 171430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:19:06,443-Speed 10586.52 samples/sec Loss 4.5237 LearningRate 0.0061 Epoch: 30 Global Step: 171440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:19:07,363-Speed 11141.38 samples/sec Loss 4.5228 LearningRate 0.0061 Epoch: 30 Global Step: 171450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:19:08,293-Speed 11010.18 samples/sec Loss 4.6367 LearningRate 0.0061 Epoch: 30 Global Step: 171460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:19:09,216-Speed 11110.11 samples/sec Loss 4.5101 LearningRate 0.0061 Epoch: 30 Global Step: 171470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:19:10,099-Speed 11596.87 samples/sec Loss 4.5034 
LearningRate 0.0061 Epoch: 30 Global Step: 171480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:19:11,031-Speed 10992.35 samples/sec Loss 4.7808 LearningRate 0.0061 Epoch: 30 Global Step: 171490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:19:11,956-Speed 11087.44 samples/sec Loss 4.6118 LearningRate 0.0060 Epoch: 30 Global Step: 171500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:19:12,874-Speed 11154.28 samples/sec Loss 4.7039 LearningRate 0.0060 Epoch: 30 Global Step: 171510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:19:13,781-Speed 11303.58 samples/sec Loss 4.6879 LearningRate 0.0060 Epoch: 30 Global Step: 171520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:19:14,710-Speed 11027.22 samples/sec Loss 4.5226 LearningRate 0.0060 Epoch: 30 Global Step: 171530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:19:15,663-Speed 10758.60 samples/sec Loss 4.6836 LearningRate 0.0060 Epoch: 30 Global Step: 171540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:19:16,571-Speed 11281.59 samples/sec Loss 4.5966 LearningRate 0.0060 Epoch: 30 Global Step: 171550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:19:17,537-Speed 10612.71 samples/sec Loss 4.4626 LearningRate 0.0060 Epoch: 30 Global Step: 171560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:19:18,469-Speed 10994.94 samples/sec Loss 4.5501 LearningRate 0.0060 Epoch: 30 Global Step: 171570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:19:19,373-Speed 11328.81 samples/sec Loss 4.6858 LearningRate 0.0060 Epoch: 30 Global Step: 171580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:19:20,282-Speed 11275.19 samples/sec Loss 4.6293 LearningRate 0.0060 Epoch: 30 Global Step: 171590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:19:21,182-Speed 11382.49 samples/sec Loss 4.6567 LearningRate 0.0060 Epoch: 30 
Global Step: 171600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:19:22,090-Speed 11299.06 samples/sec Loss 4.6218 LearningRate 0.0060 Epoch: 30 Global Step: 171610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:19:23,009-Speed 11148.65 samples/sec Loss 4.5856 LearningRate 0.0060 Epoch: 30 Global Step: 171620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:19:23,908-Speed 11389.30 samples/sec Loss 4.7287 LearningRate 0.0060 Epoch: 30 Global Step: 171630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:19:24,827-Speed 11149.48 samples/sec Loss 4.6916 LearningRate 0.0060 Epoch: 30 Global Step: 171640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:19:25,755-Speed 11041.31 samples/sec Loss 4.6369 LearningRate 0.0060 Epoch: 30 Global Step: 171650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:19:26,697-Speed 10889.81 samples/sec Loss 4.5794 LearningRate 0.0060 Epoch: 30 Global Step: 171660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:19:27,637-Speed 10899.31 samples/sec Loss 4.6006 LearningRate 0.0060 Epoch: 30 Global Step: 171670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:19:28,567-Speed 11017.65 samples/sec Loss 4.4692 LearningRate 0.0060 Epoch: 30 Global Step: 171680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:19:29,480-Speed 11215.46 samples/sec Loss 4.7133 LearningRate 0.0060 Epoch: 30 Global Step: 171690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:19:30,418-Speed 10928.97 samples/sec Loss 4.5579 LearningRate 0.0060 Epoch: 30 Global Step: 171700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:19:31,324-Speed 11307.74 samples/sec Loss 4.5642 LearningRate 0.0060 Epoch: 30 Global Step: 171710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:19:32,276-Speed 10761.53 samples/sec Loss 4.5700 LearningRate 0.0060 Epoch: 30 Global Step: 171720 Fp16 Grad 
Scale: 32768 Required: 2 hours Training: 2022-04-11 05:19:33,192-Speed 11194.18 samples/sec Loss 4.7047 LearningRate 0.0060 Epoch: 30 Global Step: 171730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:19:34,104-Speed 11234.05 samples/sec Loss 4.7316 LearningRate 0.0060 Epoch: 30 Global Step: 171740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:19:35,030-Speed 11064.37 samples/sec Loss 4.6119 LearningRate 0.0060 Epoch: 30 Global Step: 171750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:19:35,918-Speed 11538.32 samples/sec Loss 4.7328 LearningRate 0.0060 Epoch: 30 Global Step: 171760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:19:36,828-Speed 11270.15 samples/sec Loss 4.6198 LearningRate 0.0060 Epoch: 30 Global Step: 171770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:19:37,780-Speed 10754.64 samples/sec Loss 4.6034 LearningRate 0.0060 Epoch: 30 Global Step: 171780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:19:38,725-Speed 10846.22 samples/sec Loss 4.5593 LearningRate 0.0060 Epoch: 30 Global Step: 171790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:19:39,692-Speed 10596.62 samples/sec Loss 4.6639 LearningRate 0.0060 Epoch: 30 Global Step: 171800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:19:40,636-Speed 10855.50 samples/sec Loss 4.7315 LearningRate 0.0060 Epoch: 30 Global Step: 171810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:19:41,559-Speed 11103.92 samples/sec Loss 4.6558 LearningRate 0.0060 Epoch: 30 Global Step: 171820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:19:42,494-Speed 10965.39 samples/sec Loss 4.7313 LearningRate 0.0060 Epoch: 30 Global Step: 171830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:19:43,419-Speed 11077.01 samples/sec Loss 4.6665 LearningRate 0.0060 Epoch: 30 Global Step: 171840 Fp16 Grad Scale: 65536 Required: 2 hours 
Training: 2022-04-11 05:19:44,316-Speed 11420.95 samples/sec Loss 4.7719 LearningRate 0.0060 Epoch: 30 Global Step: 171850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:19:45,207-Speed 11501.78 samples/sec Loss 4.5483 LearningRate 0.0060 Epoch: 30 Global Step: 171860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:19:46,121-Speed 11206.39 samples/sec Loss 4.6696 LearningRate 0.0060 Epoch: 30 Global Step: 171870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:19:47,013-Speed 11498.61 samples/sec Loss 4.6549 LearningRate 0.0060 Epoch: 30 Global Step: 171880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:19:47,939-Speed 11057.65 samples/sec Loss 4.7432 LearningRate 0.0060 Epoch: 30 Global Step: 171890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:19:48,832-Speed 11475.66 samples/sec Loss 4.6979 LearningRate 0.0060 Epoch: 30 Global Step: 171900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:19:49,790-Speed 10700.14 samples/sec Loss 4.6671 LearningRate 0.0060 Epoch: 30 Global Step: 171910 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 05:19:50,682-Speed 11483.17 samples/sec Loss 4.7868 LearningRate 0.0060 Epoch: 30 Global Step: 171920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:19:51,636-Speed 10740.98 samples/sec Loss 4.7217 LearningRate 0.0060 Epoch: 30 Global Step: 171930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:19:52,550-Speed 11212.53 samples/sec Loss 4.7105 LearningRate 0.0060 Epoch: 30 Global Step: 171940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:19:53,515-Speed 10615.94 samples/sec Loss 4.6745 LearningRate 0.0060 Epoch: 30 Global Step: 171950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:19:54,474-Speed 10685.65 samples/sec Loss 4.6391 LearningRate 0.0060 Epoch: 30 Global Step: 171960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 
05:19:55,436-Speed 10651.33 samples/sec Loss 4.7258 LearningRate 0.0059 Epoch: 30 Global Step: 171970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:19:56,403-Speed 10595.67 samples/sec Loss 4.6967 LearningRate 0.0059 Epoch: 30 Global Step: 171980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:19:57,311-Speed 11288.31 samples/sec Loss 4.5900 LearningRate 0.0059 Epoch: 30 Global Step: 171990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:19:58,211-Speed 11396.07 samples/sec Loss 4.6178 LearningRate 0.0059 Epoch: 30 Global Step: 172000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:20:19,555-[lfw][172000]XNorm: 8.332835 Training: 2022-04-11 05:20:19,555-[lfw][172000]Accuracy-Flip: 0.99700+-0.00277 Training: 2022-04-11 05:20:19,556-[lfw][172000]Accuracy-Highest: 0.99733 Training: 2022-04-11 05:20:44,398-[cfp_fp][172000]XNorm: 7.082748 Training: 2022-04-11 05:20:44,398-[cfp_fp][172000]Accuracy-Flip: 0.95229+-0.01092 Training: 2022-04-11 05:20:44,399-[cfp_fp][172000]Accuracy-Highest: 0.95429 Training: 2022-04-11 05:21:05,791-[agedb_30][172000]XNorm: 8.162120 Training: 2022-04-11 05:21:05,792-[agedb_30][172000]Accuracy-Flip: 0.96517+-0.00732 Training: 2022-04-11 05:21:05,792-[agedb_30][172000]Accuracy-Highest: 0.96950 Training: 2022-04-11 05:21:06,715-Speed 149.48 samples/sec Loss 4.6466 LearningRate 0.0059 Epoch: 30 Global Step: 172010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:21:07,638-Speed 11104.75 samples/sec Loss 4.7485 LearningRate 0.0059 Epoch: 30 Global Step: 172020 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 05:21:08,550-Speed 11235.74 samples/sec Loss 4.6727 LearningRate 0.0059 Epoch: 30 Global Step: 172030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:21:09,470-Speed 11137.92 samples/sec Loss 4.6489 LearningRate 0.0059 Epoch: 30 Global Step: 172040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:21:10,380-Speed 
11253.59 samples/sec Loss 4.6328 LearningRate 0.0059 Epoch: 30 Global Step: 172050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:21:11,324-Speed 10849.36 samples/sec Loss 4.7571 LearningRate 0.0059 Epoch: 30 Global Step: 172060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:21:12,286-Speed 10654.80 samples/sec Loss 4.7507 LearningRate 0.0059 Epoch: 30 Global Step: 172070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:21:13,208-Speed 11115.37 samples/sec Loss 4.7893 LearningRate 0.0059 Epoch: 30 Global Step: 172080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:21:14,155-Speed 10824.23 samples/sec Loss 4.8263 LearningRate 0.0059 Epoch: 30 Global Step: 172090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:21:15,125-Speed 10557.03 samples/sec Loss 4.7560 LearningRate 0.0059 Epoch: 30 Global Step: 172100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:21:16,078-Speed 10760.26 samples/sec Loss 4.6695 LearningRate 0.0059 Epoch: 30 Global Step: 172110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:21:17,021-Speed 10861.64 samples/sec Loss 4.7438 LearningRate 0.0059 Epoch: 30 Global Step: 172120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:21:17,940-Speed 11156.09 samples/sec Loss 4.6584 LearningRate 0.0059 Epoch: 30 Global Step: 172130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:21:18,898-Speed 10704.64 samples/sec Loss 4.7510 LearningRate 0.0059 Epoch: 30 Global Step: 172140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:21:19,843-Speed 10836.65 samples/sec Loss 4.6798 LearningRate 0.0059 Epoch: 30 Global Step: 172150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:21:20,762-Speed 11148.96 samples/sec Loss 4.7389 LearningRate 0.0059 Epoch: 30 Global Step: 172160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:21:21,649-Speed 11559.50 samples/sec Loss 
4.7142 LearningRate 0.0059 Epoch: 30 Global Step: 172170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:21:22,581-Speed 10992.56 samples/sec Loss 4.7875 LearningRate 0.0059 Epoch: 30 Global Step: 172180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:21:23,506-Speed 11076.05 samples/sec Loss 4.6899 LearningRate 0.0059 Epoch: 30 Global Step: 172190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:21:24,460-Speed 10746.75 samples/sec Loss 4.6473 LearningRate 0.0059 Epoch: 30 Global Step: 172200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:21:25,422-Speed 10645.25 samples/sec Loss 4.6578 LearningRate 0.0059 Epoch: 30 Global Step: 172210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:21:26,348-Speed 11071.51 samples/sec Loss 4.6366 LearningRate 0.0059 Epoch: 30 Global Step: 172220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:21:27,242-Speed 11455.27 samples/sec Loss 4.7279 LearningRate 0.0059 Epoch: 30 Global Step: 172230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:21:28,175-Speed 10990.00 samples/sec Loss 4.7517 LearningRate 0.0059 Epoch: 30 Global Step: 172240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:21:29,145-Speed 10555.95 samples/sec Loss 4.7130 LearningRate 0.0059 Epoch: 30 Global Step: 172250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:21:30,128-Speed 10423.63 samples/sec Loss 4.7684 LearningRate 0.0059 Epoch: 30 Global Step: 172260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:21:31,068-Speed 10901.10 samples/sec Loss 4.6994 LearningRate 0.0059 Epoch: 30 Global Step: 172270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:21:32,013-Speed 10844.29 samples/sec Loss 4.8037 LearningRate 0.0059 Epoch: 30 Global Step: 172280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:21:32,912-Speed 11405.31 samples/sec Loss 4.7443 LearningRate 0.0059 
Epoch: 30 Global Step: 172290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:21:33,847-Speed 10952.98 samples/sec Loss 4.6443 LearningRate 0.0059 Epoch: 30 Global Step: 172300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:21:34,750-Speed 11357.22 samples/sec Loss 4.7357 LearningRate 0.0059 Epoch: 30 Global Step: 172310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:21:35,653-Speed 11343.64 samples/sec Loss 4.7515 LearningRate 0.0059 Epoch: 30 Global Step: 172320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:21:36,585-Speed 10993.74 samples/sec Loss 4.7790 LearningRate 0.0059 Epoch: 30 Global Step: 172330 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 05:21:37,510-Speed 11078.77 samples/sec Loss 4.6222 LearningRate 0.0059 Epoch: 30 Global Step: 172340 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 05:21:38,483-Speed 10529.87 samples/sec Loss 4.7955 LearningRate 0.0059 Epoch: 30 Global Step: 172350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:21:39,448-Speed 10615.00 samples/sec Loss 4.7718 LearningRate 0.0059 Epoch: 30 Global Step: 172360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:21:40,382-Speed 10987.73 samples/sec Loss 4.7185 LearningRate 0.0059 Epoch: 30 Global Step: 172370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:21:41,382-Speed 10240.59 samples/sec Loss 4.8568 LearningRate 0.0059 Epoch: 30 Global Step: 172380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:21:42,357-Speed 10512.36 samples/sec Loss 4.7652 LearningRate 0.0059 Epoch: 30 Global Step: 172390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:21:43,274-Speed 11176.31 samples/sec Loss 4.8006 LearningRate 0.0059 Epoch: 30 Global Step: 172400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:21:44,240-Speed 10602.76 samples/sec Loss 4.6842 LearningRate 0.0059 Epoch: 30 Global Step: 172410 
Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:21:45,156-Speed 11194.06 samples/sec Loss 4.7563 LearningRate 0.0059 Epoch: 30 Global Step: 172420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:21:46,052-Speed 11437.94 samples/sec Loss 4.6889 LearningRate 0.0058 Epoch: 30 Global Step: 172430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:21:47,001-Speed 10793.48 samples/sec Loss 4.6908 LearningRate 0.0058 Epoch: 30 Global Step: 172440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:21:47,916-Speed 11199.61 samples/sec Loss 4.8178 LearningRate 0.0058 Epoch: 30 Global Step: 172450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:21:48,866-Speed 10792.44 samples/sec Loss 4.6461 LearningRate 0.0058 Epoch: 30 Global Step: 172460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:21:49,843-Speed 10488.76 samples/sec Loss 4.8064 LearningRate 0.0058 Epoch: 30 Global Step: 172470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:21:50,768-Speed 11077.10 samples/sec Loss 4.6762 LearningRate 0.0058 Epoch: 30 Global Step: 172480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:21:51,690-Speed 11117.02 samples/sec Loss 4.6954 LearningRate 0.0058 Epoch: 30 Global Step: 172490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:21:52,629-Speed 10912.28 samples/sec Loss 4.6652 LearningRate 0.0058 Epoch: 30 Global Step: 172500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:21:53,564-Speed 10959.45 samples/sec Loss 4.7522 LearningRate 0.0058 Epoch: 30 Global Step: 172510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:21:54,486-Speed 11108.85 samples/sec Loss 4.8610 LearningRate 0.0058 Epoch: 30 Global Step: 172520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:21:55,418-Speed 10995.08 samples/sec Loss 4.6783 LearningRate 0.0058 Epoch: 30 Global Step: 172530 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-04-11 05:21:56,353-Speed 10957.27 samples/sec Loss 4.6795 LearningRate 0.0058 Epoch: 30 Global Step: 172540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:21:57,261-Speed 11284.71 samples/sec Loss 4.7979 LearningRate 0.0058 Epoch: 30 Global Step: 172550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:21:58,179-Speed 11160.70 samples/sec Loss 4.7490 LearningRate 0.0058 Epoch: 30 Global Step: 172560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:21:59,098-Speed 11159.85 samples/sec Loss 4.7386 LearningRate 0.0058 Epoch: 30 Global Step: 172570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:22:00,001-Speed 11344.22 samples/sec Loss 4.6629 LearningRate 0.0058 Epoch: 30 Global Step: 172580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:22:00,941-Speed 10908.60 samples/sec Loss 4.6366 LearningRate 0.0058 Epoch: 30 Global Step: 172590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:22:01,846-Speed 11322.50 samples/sec Loss 4.5894 LearningRate 0.0058 Epoch: 30 Global Step: 172600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:22:02,809-Speed 10638.41 samples/sec Loss 4.6941 LearningRate 0.0058 Epoch: 30 Global Step: 172610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:22:03,789-Speed 10450.57 samples/sec Loss 4.7437 LearningRate 0.0058 Epoch: 30 Global Step: 172620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:22:04,746-Speed 10709.61 samples/sec Loss 4.7478 LearningRate 0.0058 Epoch: 30 Global Step: 172630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:22:05,662-Speed 11199.46 samples/sec Loss 4.8865 LearningRate 0.0058 Epoch: 30 Global Step: 172640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:22:06,594-Speed 10991.89 samples/sec Loss 4.7643 LearningRate 0.0058 Epoch: 30 Global Step: 172650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 
2022-04-11 05:22:07,506-Speed 11234.11 samples/sec Loss 4.7537 LearningRate 0.0058 Epoch: 30 Global Step: 172660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:22:08,409-Speed 11340.40 samples/sec Loss 4.6650 LearningRate 0.0058 Epoch: 30 Global Step: 172670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:22:09,331-Speed 11119.78 samples/sec Loss 4.7087 LearningRate 0.0058 Epoch: 30 Global Step: 172680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:22:10,247-Speed 11193.13 samples/sec Loss 4.7416 LearningRate 0.0058 Epoch: 30 Global Step: 172690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:22:11,156-Speed 11269.26 samples/sec Loss 4.8432 LearningRate 0.0058 Epoch: 30 Global Step: 172700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:22:12,080-Speed 11091.78 samples/sec Loss 4.7669 LearningRate 0.0058 Epoch: 30 Global Step: 172710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:22:12,995-Speed 11190.21 samples/sec Loss 4.7202 LearningRate 0.0058 Epoch: 30 Global Step: 172720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:22:13,910-Speed 11200.45 samples/sec Loss 4.7495 LearningRate 0.0058 Epoch: 30 Global Step: 172730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:22:14,881-Speed 10552.92 samples/sec Loss 4.7268 LearningRate 0.0058 Epoch: 30 Global Step: 172740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:22:15,827-Speed 10828.80 samples/sec Loss 4.9193 LearningRate 0.0058 Epoch: 30 Global Step: 172750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:22:16,744-Speed 11183.38 samples/sec Loss 4.7333 LearningRate 0.0058 Epoch: 30 Global Step: 172760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:22:17,670-Speed 11057.35 samples/sec Loss 4.7287 LearningRate 0.0058 Epoch: 30 Global Step: 172770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:22:18,576-Speed 
11313.07 samples/sec Loss 4.7966 LearningRate 0.0058 Epoch: 30 Global Step: 172780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:22:19,507-Speed 11007.01 samples/sec Loss 4.7205 LearningRate 0.0058 Epoch: 30 Global Step: 172790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:22:20,401-Speed 11462.44 samples/sec Loss 4.6188 LearningRate 0.0058 Epoch: 30 Global Step: 172800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:22:21,296-Speed 11448.58 samples/sec Loss 4.8622 LearningRate 0.0058 Epoch: 30 Global Step: 172810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:22:22,212-Speed 11196.18 samples/sec Loss 4.7689 LearningRate 0.0058 Epoch: 30 Global Step: 172820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:22:23,131-Speed 11142.19 samples/sec Loss 4.6611 LearningRate 0.0058 Epoch: 30 Global Step: 172830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:22:24,062-Speed 11008.51 samples/sec Loss 4.6384 LearningRate 0.0058 Epoch: 30 Global Step: 172840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:22:24,977-Speed 11209.94 samples/sec Loss 4.8619 LearningRate 0.0058 Epoch: 30 Global Step: 172850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:22:25,954-Speed 10479.66 samples/sec Loss 4.7191 LearningRate 0.0058 Epoch: 30 Global Step: 172860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:22:26,913-Speed 10688.45 samples/sec Loss 4.6646 LearningRate 0.0058 Epoch: 30 Global Step: 172870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:22:27,856-Speed 10862.69 samples/sec Loss 4.8853 LearningRate 0.0058 Epoch: 30 Global Step: 172880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:22:28,773-Speed 11172.49 samples/sec Loss 4.6576 LearningRate 0.0058 Epoch: 30 Global Step: 172890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:22:29,712-Speed 10917.04 samples/sec Loss 
4.7430 LearningRate 0.0058 Epoch: 30 Global Step: 172900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:22:30,602-Speed 11519.92 samples/sec Loss 4.7112 LearningRate 0.0057 Epoch: 30 Global Step: 172910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:22:31,504-Speed 11358.22 samples/sec Loss 4.7569 LearningRate 0.0057 Epoch: 30 Global Step: 172920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:22:32,435-Speed 11009.92 samples/sec Loss 4.8212 LearningRate 0.0057 Epoch: 30 Global Step: 172930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:22:33,346-Speed 11237.28 samples/sec Loss 4.7085 LearningRate 0.0057 Epoch: 30 Global Step: 172940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:22:34,255-Speed 11277.14 samples/sec Loss 4.8400 LearningRate 0.0057 Epoch: 30 Global Step: 172950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:22:35,174-Speed 11152.27 samples/sec Loss 4.7400 LearningRate 0.0057 Epoch: 30 Global Step: 172960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:22:36,073-Speed 11394.65 samples/sec Loss 4.6232 LearningRate 0.0057 Epoch: 30 Global Step: 172970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:22:37,010-Speed 10944.72 samples/sec Loss 4.7736 LearningRate 0.0057 Epoch: 30 Global Step: 172980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:22:37,960-Speed 10781.20 samples/sec Loss 4.7268 LearningRate 0.0057 Epoch: 30 Global Step: 172990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:22:38,932-Speed 10544.88 samples/sec Loss 4.8857 LearningRate 0.0057 Epoch: 30 Global Step: 173000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:22:39,905-Speed 10523.05 samples/sec Loss 4.7965 LearningRate 0.0057 Epoch: 30 Global Step: 173010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:22:40,840-Speed 10964.61 samples/sec Loss 4.8185 LearningRate 0.0057 
Epoch: 30 Global Step: 173020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:22:41,781-Speed 10894.04 samples/sec Loss 4.6997 LearningRate 0.0057 Epoch: 30 Global Step: 173030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:22:42,719-Speed 10918.82 samples/sec Loss 4.6189 LearningRate 0.0057 Epoch: 30 Global Step: 173040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:22:43,635-Speed 11187.20 samples/sec Loss 4.7275 LearningRate 0.0057 Epoch: 30 Global Step: 173050 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 05:22:44,529-Speed 11464.59 samples/sec Loss 4.5613 LearningRate 0.0057 Epoch: 30 Global Step: 173060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:22:45,433-Speed 11336.19 samples/sec Loss 4.7775 LearningRate 0.0057 Epoch: 30 Global Step: 173070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:22:46,338-Speed 11330.03 samples/sec Loss 4.8008 LearningRate 0.0057 Epoch: 30 Global Step: 173080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:22:47,252-Speed 11203.47 samples/sec Loss 4.6753 LearningRate 0.0057 Epoch: 30 Global Step: 173090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:22:48,194-Speed 10880.06 samples/sec Loss 4.9015 LearningRate 0.0057 Epoch: 30 Global Step: 173100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:22:49,180-Speed 10394.41 samples/sec Loss 4.7715 LearningRate 0.0057 Epoch: 30 Global Step: 173110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:22:50,135-Speed 10735.14 samples/sec Loss 4.7616 LearningRate 0.0057 Epoch: 30 Global Step: 173120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:22:51,080-Speed 10844.58 samples/sec Loss 4.6970 LearningRate 0.0057 Epoch: 30 Global Step: 173130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:22:51,995-Speed 11194.07 samples/sec Loss 4.7649 LearningRate 0.0057 Epoch: 30 Global Step: 173140 
Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:22:52,899-Speed 11340.27 samples/sec Loss 4.8506 LearningRate 0.0057 Epoch: 30 Global Step: 173150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:22:53,827-Speed 11037.48 samples/sec Loss 4.7398 LearningRate 0.0057 Epoch: 30 Global Step: 173160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:22:54,750-Speed 11100.35 samples/sec Loss 4.6303 LearningRate 0.0057 Epoch: 30 Global Step: 173170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:22:55,695-Speed 10844.16 samples/sec Loss 4.8627 LearningRate 0.0057 Epoch: 30 Global Step: 173180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:22:56,650-Speed 10728.71 samples/sec Loss 4.8233 LearningRate 0.0057 Epoch: 30 Global Step: 173190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:22:57,562-Speed 11235.83 samples/sec Loss 4.7859 LearningRate 0.0057 Epoch: 30 Global Step: 173200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:22:58,492-Speed 11023.95 samples/sec Loss 4.7409 LearningRate 0.0057 Epoch: 30 Global Step: 173210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:22:59,459-Speed 10733.87 samples/sec Loss 4.6907 LearningRate 0.0057 Epoch: 30 Global Step: 173220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:23:00,431-Speed 10547.33 samples/sec Loss 4.7536 LearningRate 0.0057 Epoch: 30 Global Step: 173230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:23:01,388-Speed 10705.81 samples/sec Loss 4.8149 LearningRate 0.0057 Epoch: 30 Global Step: 173240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:23:02,285-Speed 11416.42 samples/sec Loss 4.7342 LearningRate 0.0057 Epoch: 30 Global Step: 173250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:23:03,196-Speed 11246.14 samples/sec Loss 4.6549 LearningRate 0.0057 Epoch: 30 Global Step: 173260 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-04-11 05:23:04,170-Speed 10523.89 samples/sec Loss 4.7188 LearningRate 0.0057 Epoch: 30 Global Step: 173270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:23:05,146-Speed 10496.42 samples/sec Loss 4.8129 LearningRate 0.0057 Epoch: 30 Global Step: 173280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:23:06,091-Speed 10843.42 samples/sec Loss 4.6745 LearningRate 0.0057 Epoch: 30 Global Step: 173290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:23:07,079-Speed 10375.25 samples/sec Loss 4.6902 LearningRate 0.0057 Epoch: 30 Global Step: 173300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:23:07,988-Speed 11275.73 samples/sec Loss 4.6894 LearningRate 0.0057 Epoch: 30 Global Step: 173310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:23:08,924-Speed 10943.69 samples/sec Loss 4.7212 LearningRate 0.0057 Epoch: 30 Global Step: 173320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:23:09,829-Speed 11323.99 samples/sec Loss 4.6119 LearningRate 0.0057 Epoch: 30 Global Step: 173330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:23:10,737-Speed 11294.13 samples/sec Loss 4.8779 LearningRate 0.0057 Epoch: 30 Global Step: 173340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:23:11,648-Speed 11250.27 samples/sec Loss 4.8420 LearningRate 0.0057 Epoch: 30 Global Step: 173350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:23:12,557-Speed 11265.96 samples/sec Loss 4.6678 LearningRate 0.0057 Epoch: 30 Global Step: 173360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:23:13,525-Speed 10581.70 samples/sec Loss 4.8821 LearningRate 0.0057 Epoch: 30 Global Step: 173370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:23:14,474-Speed 10800.17 samples/sec Loss 4.7628 LearningRate 0.0056 Epoch: 30 Global Step: 173380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-11 05:23:15,362-Speed 11538.00 samples/sec Loss 4.7609 LearningRate 0.0056 Epoch: 30 Global Step: 173390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:23:16,257-Speed 11458.87 samples/sec Loss 4.7892 LearningRate 0.0056 Epoch: 30 Global Step: 173400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:23:17,198-Speed 10888.12 samples/sec Loss 4.8347 LearningRate 0.0056 Epoch: 30 Global Step: 173410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:23:18,146-Speed 10808.74 samples/sec Loss 4.8607 LearningRate 0.0056 Epoch: 30 Global Step: 173420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:23:19,086-Speed 10897.93 samples/sec Loss 4.7226 LearningRate 0.0056 Epoch: 30 Global Step: 173430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:23:19,986-Speed 11381.95 samples/sec Loss 4.7343 LearningRate 0.0056 Epoch: 30 Global Step: 173440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:23:20,907-Speed 11127.45 samples/sec Loss 4.8569 LearningRate 0.0056 Epoch: 30 Global Step: 173450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:23:21,814-Speed 11299.89 samples/sec Loss 4.6545 LearningRate 0.0056 Epoch: 30 Global Step: 173460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:23:22,735-Speed 11123.63 samples/sec Loss 4.8785 LearningRate 0.0056 Epoch: 30 Global Step: 173470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:23:23,645-Speed 11259.06 samples/sec Loss 4.7304 LearningRate 0.0056 Epoch: 30 Global Step: 173480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:23:24,566-Speed 11132.87 samples/sec Loss 4.7813 LearningRate 0.0056 Epoch: 30 Global Step: 173490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:23:25,473-Speed 11299.85 samples/sec Loss 4.8512 LearningRate 0.0056 Epoch: 30 Global Step: 173500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:23:26,404-Speed 
10999.90 samples/sec Loss 4.6241 LearningRate 0.0056 Epoch: 30 Global Step: 173510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:23:27,346-Speed 10882.45 samples/sec Loss 4.7523 LearningRate 0.0056 Epoch: 30 Global Step: 173520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:23:28,303-Speed 10709.05 samples/sec Loss 4.7791 LearningRate 0.0056 Epoch: 30 Global Step: 173530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:23:29,259-Speed 10722.79 samples/sec Loss 4.7225 LearningRate 0.0056 Epoch: 30 Global Step: 173540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:23:30,175-Speed 11183.93 samples/sec Loss 4.7619 LearningRate 0.0056 Epoch: 30 Global Step: 173550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:23:31,125-Speed 10787.53 samples/sec Loss 4.7409 LearningRate 0.0056 Epoch: 30 Global Step: 173560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:23:32,074-Speed 10798.88 samples/sec Loss 4.6782 LearningRate 0.0056 Epoch: 30 Global Step: 173570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:23:32,967-Speed 11465.11 samples/sec Loss 4.6877 LearningRate 0.0056 Epoch: 30 Global Step: 173580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:23:33,874-Speed 11305.29 samples/sec Loss 4.8345 LearningRate 0.0056 Epoch: 30 Global Step: 173590 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 05:23:34,800-Speed 11058.99 samples/sec Loss 4.8313 LearningRate 0.0056 Epoch: 30 Global Step: 173600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:23:35,714-Speed 11222.44 samples/sec Loss 4.9264 LearningRate 0.0056 Epoch: 30 Global Step: 173610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:23:36,616-Speed 11359.24 samples/sec Loss 4.8771 LearningRate 0.0056 Epoch: 30 Global Step: 173620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:23:37,537-Speed 11117.61 samples/sec Loss 
4.8992 LearningRate 0.0056 Epoch: 30 Global Step: 173630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:23:38,492-Speed 10731.20 samples/sec Loss 4.7798 LearningRate 0.0056 Epoch: 30 Global Step: 173640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:23:39,410-Speed 11168.48 samples/sec Loss 4.9693 LearningRate 0.0056 Epoch: 30 Global Step: 173650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:23:40,322-Speed 11226.89 samples/sec Loss 4.8224 LearningRate 0.0056 Epoch: 30 Global Step: 173660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:23:41,250-Speed 11043.70 samples/sec Loss 4.6469 LearningRate 0.0056 Epoch: 30 Global Step: 173670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:23:42,167-Speed 11175.94 samples/sec Loss 4.7470 LearningRate 0.0056 Epoch: 30 Global Step: 173680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:23:43,084-Speed 11172.01 samples/sec Loss 4.8396 LearningRate 0.0056 Epoch: 30 Global Step: 173690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:23:44,008-Speed 11088.92 samples/sec Loss 4.8422 LearningRate 0.0056 Epoch: 30 Global Step: 173700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:23:44,901-Speed 11475.53 samples/sec Loss 4.8101 LearningRate 0.0056 Epoch: 30 Global Step: 173710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:23:45,808-Speed 11297.12 samples/sec Loss 4.7853 LearningRate 0.0056 Epoch: 30 Global Step: 173720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:23:46,712-Speed 11338.05 samples/sec Loss 4.7407 LearningRate 0.0056 Epoch: 30 Global Step: 173730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:23:47,636-Speed 11093.98 samples/sec Loss 4.8780 LearningRate 0.0056 Epoch: 30 Global Step: 173740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:23:48,533-Speed 11422.58 samples/sec Loss 4.7269 LearningRate 0.0056 
Epoch: 30 Global Step: 173750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:23:49,469-Speed 10943.33 samples/sec Loss 4.8617 LearningRate 0.0056 Epoch: 30 Global Step: 173760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:23:50,444-Speed 10518.85 samples/sec Loss 4.7577 LearningRate 0.0056 Epoch: 30 Global Step: 173770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:23:51,389-Speed 10840.18 samples/sec Loss 4.7413 LearningRate 0.0056 Epoch: 30 Global Step: 173780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:23:52,377-Speed 10377.60 samples/sec Loss 4.7713 LearningRate 0.0056 Epoch: 30 Global Step: 173790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:23:53,341-Speed 10629.28 samples/sec Loss 4.8644 LearningRate 0.0056 Epoch: 30 Global Step: 173800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:23:54,315-Speed 10516.70 samples/sec Loss 4.7907 LearningRate 0.0056 Epoch: 30 Global Step: 173810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:23:55,276-Speed 10661.69 samples/sec Loss 4.8671 LearningRate 0.0056 Epoch: 30 Global Step: 173820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:23:56,200-Speed 11088.73 samples/sec Loss 4.7336 LearningRate 0.0056 Epoch: 30 Global Step: 173830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:23:57,123-Speed 11103.23 samples/sec Loss 4.7383 LearningRate 0.0056 Epoch: 30 Global Step: 173840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:23:58,044-Speed 11120.97 samples/sec Loss 4.8268 LearningRate 0.0056 Epoch: 30 Global Step: 173850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:23:58,978-Speed 10975.10 samples/sec Loss 4.9306 LearningRate 0.0055 Epoch: 30 Global Step: 173860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:23:59,919-Speed 10882.89 samples/sec Loss 4.7936 LearningRate 0.0055 Epoch: 30 Global Step: 173870 
Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:24:00,887-Speed 10593.07 samples/sec Loss 4.7529 LearningRate 0.0055 Epoch: 30 Global Step: 173880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:24:01,824-Speed 10934.10 samples/sec Loss 4.7258 LearningRate 0.0055 Epoch: 30 Global Step: 173890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:24:02,802-Speed 10475.65 samples/sec Loss 4.8187 LearningRate 0.0055 Epoch: 30 Global Step: 173900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:24:03,782-Speed 10457.93 samples/sec Loss 4.7905 LearningRate 0.0055 Epoch: 30 Global Step: 173910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:24:04,745-Speed 10643.81 samples/sec Loss 4.8339 LearningRate 0.0055 Epoch: 30 Global Step: 173920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:24:05,644-Speed 11398.36 samples/sec Loss 4.8643 LearningRate 0.0055 Epoch: 30 Global Step: 173930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:24:06,575-Speed 10999.15 samples/sec Loss 4.7660 LearningRate 0.0055 Epoch: 30 Global Step: 173940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:24:07,513-Speed 10925.90 samples/sec Loss 4.8082 LearningRate 0.0055 Epoch: 30 Global Step: 173950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:24:08,471-Speed 10696.26 samples/sec Loss 4.8776 LearningRate 0.0055 Epoch: 30 Global Step: 173960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:24:09,380-Speed 11272.47 samples/sec Loss 4.7793 LearningRate 0.0055 Epoch: 30 Global Step: 173970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:24:10,275-Speed 11454.13 samples/sec Loss 4.7257 LearningRate 0.0055 Epoch: 30 Global Step: 173980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:24:11,237-Speed 10645.31 samples/sec Loss 4.8396 LearningRate 0.0055 Epoch: 30 Global Step: 173990 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-04-11 05:24:12,158-Speed 11128.27 samples/sec Loss 4.7582 LearningRate 0.0055 Epoch: 30 Global Step: 174000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:24:33,442-[lfw][174000]XNorm: 8.076404 Training: 2022-04-11 05:24:33,443-[lfw][174000]Accuracy-Flip: 0.99733+-0.00260 Training: 2022-04-11 05:24:33,443-[lfw][174000]Accuracy-Highest: 0.99733 Training: 2022-04-11 05:24:58,052-[cfp_fp][174000]XNorm: 6.869202 Training: 2022-04-11 05:24:58,053-[cfp_fp][174000]Accuracy-Flip: 0.95314+-0.01119 Training: 2022-04-11 05:24:58,053-[cfp_fp][174000]Accuracy-Highest: 0.95429 Training: 2022-04-11 05:25:19,267-[agedb_30][174000]XNorm: 7.896394 Training: 2022-04-11 05:25:19,267-[agedb_30][174000]Accuracy-Flip: 0.96967+-0.00781 Training: 2022-04-11 05:25:19,268-[agedb_30][174000]Accuracy-Highest: 0.96967 Training: 2022-04-11 05:25:20,223-Speed 150.45 samples/sec Loss 4.8720 LearningRate 0.0055 Epoch: 30 Global Step: 174010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:25:21,132-Speed 11269.89 samples/sec Loss 4.8891 LearningRate 0.0055 Epoch: 30 Global Step: 174020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:25:22,038-Speed 11304.50 samples/sec Loss 4.8829 LearningRate 0.0055 Epoch: 30 Global Step: 174030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:25:22,950-Speed 11243.73 samples/sec Loss 4.7444 LearningRate 0.0055 Epoch: 30 Global Step: 174040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:25:23,861-Speed 11251.73 samples/sec Loss 4.7428 LearningRate 0.0055 Epoch: 30 Global Step: 174050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:25:24,779-Speed 11152.88 samples/sec Loss 4.8004 LearningRate 0.0055 Epoch: 30 Global Step: 174060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:25:25,711-Speed 11002.01 samples/sec Loss 4.5948 LearningRate 0.0055 Epoch: 30 Global Step: 174070 Fp16 Grad Scale: 65536 Required: 2 hours 
Training: 2022-04-11 05:25:26,700-Speed 10356.02 samples/sec Loss 4.8123 LearningRate 0.0055 Epoch: 30 Global Step: 174080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:25:27,669-Speed 10572.11 samples/sec Loss 4.7308 LearningRate 0.0055 Epoch: 30 Global Step: 174090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:25:28,675-Speed 10195.20 samples/sec Loss 4.8169 LearningRate 0.0055 Epoch: 30 Global Step: 174100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:25:29,651-Speed 10490.73 samples/sec Loss 4.8274 LearningRate 0.0055 Epoch: 30 Global Step: 174110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:25:30,636-Speed 10406.09 samples/sec Loss 4.7332 LearningRate 0.0055 Epoch: 30 Global Step: 174120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:25:31,578-Speed 10879.51 samples/sec Loss 4.9439 LearningRate 0.0055 Epoch: 30 Global Step: 174130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:25:32,542-Speed 10625.12 samples/sec Loss 4.7072 LearningRate 0.0055 Epoch: 30 Global Step: 174140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:25:33,447-Speed 11330.62 samples/sec Loss 4.7644 LearningRate 0.0055 Epoch: 30 Global Step: 174150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:25:34,366-Speed 11139.55 samples/sec Loss 4.8803 LearningRate 0.0055 Epoch: 30 Global Step: 174160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:25:35,266-Speed 11391.92 samples/sec Loss 4.7731 LearningRate 0.0055 Epoch: 30 Global Step: 174170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:25:36,208-Speed 10871.43 samples/sec Loss 4.8334 LearningRate 0.0055 Epoch: 30 Global Step: 174180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:25:37,154-Speed 10831.36 samples/sec Loss 4.8253 LearningRate 0.0055 Epoch: 30 Global Step: 174190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 
05:25:38,098-Speed 10863.48 samples/sec Loss 4.7655 LearningRate 0.0055 Epoch: 30 Global Step: 174200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:25:39,017-Speed 11143.77 samples/sec Loss 4.7500 LearningRate 0.0055 Epoch: 30 Global Step: 174210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:25:39,965-Speed 10810.27 samples/sec Loss 4.8311 LearningRate 0.0055 Epoch: 30 Global Step: 174220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:25:40,916-Speed 10777.15 samples/sec Loss 4.7047 LearningRate 0.0055 Epoch: 30 Global Step: 174230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:25:41,850-Speed 10972.31 samples/sec Loss 4.8445 LearningRate 0.0055 Epoch: 30 Global Step: 174240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:25:42,807-Speed 10706.23 samples/sec Loss 4.7870 LearningRate 0.0055 Epoch: 30 Global Step: 174250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:25:43,758-Speed 10771.57 samples/sec Loss 4.7915 LearningRate 0.0055 Epoch: 30 Global Step: 174260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:25:44,679-Speed 11131.63 samples/sec Loss 4.9053 LearningRate 0.0055 Epoch: 30 Global Step: 174270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:25:45,577-Speed 11409.02 samples/sec Loss 4.8938 LearningRate 0.0055 Epoch: 30 Global Step: 174280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:25:46,497-Speed 11130.51 samples/sec Loss 4.8457 LearningRate 0.0055 Epoch: 30 Global Step: 174290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:25:47,432-Speed 10967.25 samples/sec Loss 4.7410 LearningRate 0.0055 Epoch: 30 Global Step: 174300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:25:48,387-Speed 10724.08 samples/sec Loss 4.8524 LearningRate 0.0055 Epoch: 30 Global Step: 174310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:25:49,339-Speed 10773.17 
samples/sec Loss 4.6776 LearningRate 0.0055 Epoch: 30 Global Step: 174320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:25:50,284-Speed 10841.13 samples/sec Loss 4.8979 LearningRate 0.0055 Epoch: 30 Global Step: 174330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:25:51,204-Speed 11132.55 samples/sec Loss 4.8163 LearningRate 0.0055 Epoch: 30 Global Step: 174340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:25:52,134-Speed 11027.82 samples/sec Loss 4.7193 LearningRate 0.0054 Epoch: 30 Global Step: 174350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:25:53,073-Speed 10905.85 samples/sec Loss 4.7938 LearningRate 0.0054 Epoch: 30 Global Step: 174360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:25:53,994-Speed 11123.65 samples/sec Loss 4.7024 LearningRate 0.0054 Epoch: 30 Global Step: 174370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:25:54,933-Speed 10921.00 samples/sec Loss 4.9181 LearningRate 0.0054 Epoch: 30 Global Step: 174380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:25:55,887-Speed 10736.19 samples/sec Loss 4.9386 LearningRate 0.0054 Epoch: 30 Global Step: 174390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:25:56,848-Speed 10658.81 samples/sec Loss 4.8263 LearningRate 0.0054 Epoch: 30 Global Step: 174400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:25:57,762-Speed 11215.16 samples/sec Loss 4.7469 LearningRate 0.0054 Epoch: 30 Global Step: 174410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:25:58,665-Speed 11356.53 samples/sec Loss 4.7428 LearningRate 0.0054 Epoch: 30 Global Step: 174420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:25:59,617-Speed 10762.00 samples/sec Loss 4.9010 LearningRate 0.0054 Epoch: 30 Global Step: 174430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:26:00,523-Speed 11301.65 samples/sec Loss 4.7766 
LearningRate 0.0054 Epoch: 30 Global Step: 174440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:26:01,439-Speed 11185.56 samples/sec Loss 4.8086 LearningRate 0.0054 Epoch: 30 Global Step: 174450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:26:02,366-Speed 11060.79 samples/sec Loss 4.8074 LearningRate 0.0054 Epoch: 30 Global Step: 174460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:26:03,302-Speed 10949.83 samples/sec Loss 5.0149 LearningRate 0.0054 Epoch: 30 Global Step: 174470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:26:04,239-Speed 10933.15 samples/sec Loss 4.8972 LearningRate 0.0054 Epoch: 30 Global Step: 174480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:26:05,231-Speed 10325.88 samples/sec Loss 4.8221 LearningRate 0.0054 Epoch: 30 Global Step: 174490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:26:06,193-Speed 10649.23 samples/sec Loss 5.0412 LearningRate 0.0054 Epoch: 30 Global Step: 174500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:26:07,141-Speed 10811.10 samples/sec Loss 4.9047 LearningRate 0.0054 Epoch: 30 Global Step: 174510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:26:08,086-Speed 10845.78 samples/sec Loss 4.7964 LearningRate 0.0054 Epoch: 30 Global Step: 174520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:26:08,996-Speed 11268.80 samples/sec Loss 4.8796 LearningRate 0.0054 Epoch: 30 Global Step: 174530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:26:09,892-Speed 11435.41 samples/sec Loss 4.7715 LearningRate 0.0054 Epoch: 30 Global Step: 174540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:26:10,795-Speed 11348.71 samples/sec Loss 4.7229 LearningRate 0.0054 Epoch: 30 Global Step: 174550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:26:11,691-Speed 11437.23 samples/sec Loss 4.8114 LearningRate 0.0054 Epoch: 30 
Global Step: 174560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:26:12,624-Speed 10978.37 samples/sec Loss 4.5878 LearningRate 0.0054 Epoch: 30 Global Step: 174570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:26:13,585-Speed 10663.31 samples/sec Loss 4.7831 LearningRate 0.0054 Epoch: 30 Global Step: 174580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:26:14,517-Speed 10996.86 samples/sec Loss 4.8647 LearningRate 0.0054 Epoch: 30 Global Step: 174590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:26:15,432-Speed 11198.36 samples/sec Loss 4.9249 LearningRate 0.0054 Epoch: 30 Global Step: 174600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:26:16,396-Speed 10630.56 samples/sec Loss 4.7852 LearningRate 0.0054 Epoch: 30 Global Step: 174610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:26:17,352-Speed 10714.38 samples/sec Loss 4.6825 LearningRate 0.0054 Epoch: 30 Global Step: 174620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:26:18,297-Speed 10848.81 samples/sec Loss 4.9505 LearningRate 0.0054 Epoch: 30 Global Step: 174630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:26:19,240-Speed 10863.52 samples/sec Loss 4.8195 LearningRate 0.0054 Epoch: 30 Global Step: 174640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:26:20,169-Speed 11034.57 samples/sec Loss 4.8010 LearningRate 0.0054 Epoch: 30 Global Step: 174650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:26:21,066-Speed 11419.66 samples/sec Loss 4.7836 LearningRate 0.0054 Epoch: 30 Global Step: 174660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:26:21,971-Speed 11322.98 samples/sec Loss 4.8650 LearningRate 0.0054 Epoch: 30 Global Step: 174670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:26:22,890-Speed 11152.27 samples/sec Loss 4.7486 LearningRate 0.0054 Epoch: 30 Global Step: 174680 Fp16 Grad 
Scale: 65536 Required: 2 hours Training: 2022-04-11 05:26:23,798-Speed 11286.82 samples/sec Loss 4.8239 LearningRate 0.0054 Epoch: 30 Global Step: 174690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:26:24,722-Speed 11093.76 samples/sec Loss 4.7434 LearningRate 0.0054 Epoch: 30 Global Step: 174700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:26:25,640-Speed 11159.07 samples/sec Loss 4.9284 LearningRate 0.0054 Epoch: 30 Global Step: 174710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:26:26,560-Speed 11135.64 samples/sec Loss 4.8239 LearningRate 0.0054 Epoch: 30 Global Step: 174720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:26:27,475-Speed 11207.56 samples/sec Loss 4.8798 LearningRate 0.0054 Epoch: 30 Global Step: 174730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:26:28,405-Speed 11011.39 samples/sec Loss 4.7619 LearningRate 0.0054 Epoch: 30 Global Step: 174740 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 05:26:29,343-Speed 10929.39 samples/sec Loss 4.8287 LearningRate 0.0054 Epoch: 30 Global Step: 174750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:26:30,346-Speed 10218.53 samples/sec Loss 4.8133 LearningRate 0.0054 Epoch: 30 Global Step: 174760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:26:31,324-Speed 10474.68 samples/sec Loss 4.9461 LearningRate 0.0054 Epoch: 30 Global Step: 174770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:26:32,254-Speed 11020.62 samples/sec Loss 4.8964 LearningRate 0.0054 Epoch: 30 Global Step: 174780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:26:33,155-Speed 11373.04 samples/sec Loss 4.8622 LearningRate 0.0054 Epoch: 30 Global Step: 174790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:26:34,080-Speed 11071.87 samples/sec Loss 4.8356 LearningRate 0.0054 Epoch: 30 Global Step: 174800 Fp16 Grad Scale: 32768 Required: 2 hours 
Training: 2022-04-11 05:26:34,986-Speed 11313.84 samples/sec Loss 4.8268 LearningRate 0.0054 Epoch: 30 Global Step: 174810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:26:35,905-Speed 11154.57 samples/sec Loss 4.8381 LearningRate 0.0054 Epoch: 30 Global Step: 174820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:26:36,856-Speed 10768.13 samples/sec Loss 4.7329 LearningRate 0.0054 Epoch: 30 Global Step: 174830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:26:37,828-Speed 10543.87 samples/sec Loss 4.7219 LearningRate 0.0053 Epoch: 30 Global Step: 174840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:26:38,815-Speed 10388.01 samples/sec Loss 4.7708 LearningRate 0.0053 Epoch: 30 Global Step: 174850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:26:39,792-Speed 10484.61 samples/sec Loss 4.9035 LearningRate 0.0053 Epoch: 30 Global Step: 174860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:26:40,746-Speed 10740.43 samples/sec Loss 4.9207 LearningRate 0.0053 Epoch: 30 Global Step: 174870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:26:41,705-Speed 10682.26 samples/sec Loss 4.7782 LearningRate 0.0053 Epoch: 30 Global Step: 174880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:26:42,703-Speed 10269.33 samples/sec Loss 4.9017 LearningRate 0.0053 Epoch: 30 Global Step: 174890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:26:43,623-Speed 11135.02 samples/sec Loss 4.7215 LearningRate 0.0053 Epoch: 30 Global Step: 174900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:26:44,573-Speed 10792.58 samples/sec Loss 4.9772 LearningRate 0.0053 Epoch: 30 Global Step: 174910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:26:45,472-Speed 11389.67 samples/sec Loss 4.9257 LearningRate 0.0053 Epoch: 30 Global Step: 174920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 
05:26:46,391-Speed 11161.46 samples/sec Loss 4.7463 LearningRate 0.0053 Epoch: 30 Global Step: 174930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:26:47,318-Speed 11046.52 samples/sec Loss 4.7083 LearningRate 0.0053 Epoch: 30 Global Step: 174940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:26:48,258-Speed 10904.56 samples/sec Loss 4.9410 LearningRate 0.0053 Epoch: 30 Global Step: 174950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:26:49,216-Speed 10703.01 samples/sec Loss 4.8147 LearningRate 0.0053 Epoch: 30 Global Step: 174960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:26:50,158-Speed 10877.97 samples/sec Loss 4.7898 LearningRate 0.0053 Epoch: 30 Global Step: 174970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:26:51,075-Speed 11166.37 samples/sec Loss 4.8312 LearningRate 0.0053 Epoch: 30 Global Step: 174980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:26:52,030-Speed 10735.95 samples/sec Loss 4.9250 LearningRate 0.0053 Epoch: 30 Global Step: 174990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:26:52,985-Speed 10730.10 samples/sec Loss 4.9119 LearningRate 0.0053 Epoch: 30 Global Step: 175000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:26:53,955-Speed 10555.95 samples/sec Loss 4.7661 LearningRate 0.0053 Epoch: 30 Global Step: 175010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:26:54,894-Speed 10911.32 samples/sec Loss 4.8047 LearningRate 0.0053 Epoch: 30 Global Step: 175020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:26:55,870-Speed 10503.62 samples/sec Loss 4.8480 LearningRate 0.0053 Epoch: 30 Global Step: 175030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:26:56,791-Speed 11128.45 samples/sec Loss 4.8442 LearningRate 0.0053 Epoch: 30 Global Step: 175040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:26:57,700-Speed 11277.07 
samples/sec Loss 4.8158 LearningRate 0.0053 Epoch: 30 Global Step: 175050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:26:58,666-Speed 10602.98 samples/sec Loss 4.7403 LearningRate 0.0053 Epoch: 30 Global Step: 175060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:26:59,590-Speed 11090.27 samples/sec Loss 4.7720 LearningRate 0.0053 Epoch: 30 Global Step: 175070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:27:00,542-Speed 10766.68 samples/sec Loss 4.8284 LearningRate 0.0053 Epoch: 30 Global Step: 175080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:27:01,476-Speed 10969.98 samples/sec Loss 4.9603 LearningRate 0.0053 Epoch: 30 Global Step: 175090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:27:02,396-Speed 11142.37 samples/sec Loss 4.8421 LearningRate 0.0053 Epoch: 30 Global Step: 175100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:27:03,332-Speed 10945.32 samples/sec Loss 4.7282 LearningRate 0.0053 Epoch: 30 Global Step: 175110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:27:04,270-Speed 10926.68 samples/sec Loss 4.7691 LearningRate 0.0053 Epoch: 30 Global Step: 175120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:27:05,228-Speed 10691.15 samples/sec Loss 4.8308 LearningRate 0.0053 Epoch: 30 Global Step: 175130 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 05:27:06,216-Speed 10368.35 samples/sec Loss 4.9560 LearningRate 0.0053 Epoch: 30 Global Step: 175140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:27:07,132-Speed 11193.08 samples/sec Loss 4.8642 LearningRate 0.0053 Epoch: 30 Global Step: 175150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:27:08,112-Speed 10461.71 samples/sec Loss 4.8842 LearningRate 0.0053 Epoch: 30 Global Step: 175160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:27:09,027-Speed 11197.30 samples/sec Loss 4.9283 
LearningRate 0.0053 Epoch: 30 Global Step: 175170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:27:09,937-Speed 11267.93 samples/sec Loss 4.7157 LearningRate 0.0053 Epoch: 30 Global Step: 175180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:27:10,863-Speed 11061.14 samples/sec Loss 4.8439 LearningRate 0.0053 Epoch: 30 Global Step: 175190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:27:11,791-Speed 11048.30 samples/sec Loss 4.8394 LearningRate 0.0053 Epoch: 30 Global Step: 175200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:27:12,745-Speed 10738.56 samples/sec Loss 4.8666 LearningRate 0.0053 Epoch: 30 Global Step: 175210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:27:13,694-Speed 10799.06 samples/sec Loss 4.7988 LearningRate 0.0053 Epoch: 30 Global Step: 175220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:27:14,653-Speed 10682.67 samples/sec Loss 4.9809 LearningRate 0.0053 Epoch: 30 Global Step: 175230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:27:15,597-Speed 10855.45 samples/sec Loss 4.6954 LearningRate 0.0053 Epoch: 30 Global Step: 175240 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 05:27:16,517-Speed 11134.96 samples/sec Loss 4.8566 LearningRate 0.0053 Epoch: 30 Global Step: 175250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:27:17,452-Speed 10957.72 samples/sec Loss 4.8763 LearningRate 0.0053 Epoch: 30 Global Step: 175260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:27:18,405-Speed 10759.10 samples/sec Loss 4.8963 LearningRate 0.0053 Epoch: 30 Global Step: 175270 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:27:19,345-Speed 10896.16 samples/sec Loss 4.9050 LearningRate 0.0053 Epoch: 30 Global Step: 175280 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:27:20,314-Speed 10571.45 samples/sec Loss 4.7790 LearningRate 0.0053 Epoch: 30 
Global Step: 175290 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:27:21,219-Speed 11330.25 samples/sec Loss 4.8842 LearningRate 0.0053 Epoch: 30 Global Step: 175300 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:27:22,119-Speed 11382.80 samples/sec Loss 4.8642 LearningRate 0.0053 Epoch: 30 Global Step: 175310 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:27:23,032-Speed 11222.31 samples/sec Loss 4.7895 LearningRate 0.0053 Epoch: 30 Global Step: 175320 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:27:24,010-Speed 10482.83 samples/sec Loss 4.8825 LearningRate 0.0052 Epoch: 30 Global Step: 175330 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:27:24,952-Speed 10875.19 samples/sec Loss 4.7915 LearningRate 0.0052 Epoch: 30 Global Step: 175340 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:27:25,872-Speed 11135.11 samples/sec Loss 4.8048 LearningRate 0.0052 Epoch: 30 Global Step: 175350 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:27:26,818-Speed 10827.03 samples/sec Loss 4.8350 LearningRate 0.0052 Epoch: 30 Global Step: 175360 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:27:27,733-Speed 11203.00 samples/sec Loss 5.0237 LearningRate 0.0052 Epoch: 30 Global Step: 175370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:27:28,701-Speed 10591.77 samples/sec Loss 4.8380 LearningRate 0.0052 Epoch: 30 Global Step: 175380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:27:29,671-Speed 10564.48 samples/sec Loss 4.7532 LearningRate 0.0052 Epoch: 30 Global Step: 175390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:27:30,588-Speed 11168.52 samples/sec Loss 4.8491 LearningRate 0.0052 Epoch: 30 Global Step: 175400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:27:31,521-Speed 10979.96 samples/sec Loss 4.8606 LearningRate 0.0052 Epoch: 30 Global Step: 175410 Fp16 Grad 
Scale: 32768 Required: 2 hours Training: 2022-04-11 05:27:32,476-Speed 10733.80 samples/sec Loss 4.8030 LearningRate 0.0052 Epoch: 30 Global Step: 175420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:27:33,375-Speed 11404.20 samples/sec Loss 4.7962 LearningRate 0.0052 Epoch: 30 Global Step: 175430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:27:34,278-Speed 11342.99 samples/sec Loss 4.7625 LearningRate 0.0052 Epoch: 30 Global Step: 175440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:27:35,183-Speed 11325.89 samples/sec Loss 4.8238 LearningRate 0.0052 Epoch: 30 Global Step: 175450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:27:36,092-Speed 11275.99 samples/sec Loss 4.7797 LearningRate 0.0052 Epoch: 30 Global Step: 175460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:27:37,003-Speed 11245.39 samples/sec Loss 4.8525 LearningRate 0.0052 Epoch: 30 Global Step: 175470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:27:37,947-Speed 10849.25 samples/sec Loss 4.8390 LearningRate 0.0052 Epoch: 30 Global Step: 175480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:27:38,913-Speed 10617.59 samples/sec Loss 4.8792 LearningRate 0.0052 Epoch: 30 Global Step: 175490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:27:39,853-Speed 10891.98 samples/sec Loss 4.8370 LearningRate 0.0052 Epoch: 30 Global Step: 175500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:27:40,839-Speed 10391.92 samples/sec Loss 4.7798 LearningRate 0.0052 Epoch: 30 Global Step: 175510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:27:41,772-Speed 10983.22 samples/sec Loss 4.9268 LearningRate 0.0052 Epoch: 30 Global Step: 175520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:27:42,766-Speed 10310.87 samples/sec Loss 4.8910 LearningRate 0.0052 Epoch: 30 Global Step: 175530 Fp16 Grad Scale: 32768 Required: 2 hours 
Training: 2022-04-11 05:27:43,736-Speed 10568.97 samples/sec Loss 4.8702 LearningRate 0.0052 Epoch: 30 Global Step: 175540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:27:44,679-Speed 10866.75 samples/sec Loss 4.8769 LearningRate 0.0052 Epoch: 30 Global Step: 175550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:27:45,585-Speed 11301.58 samples/sec Loss 4.8730 LearningRate 0.0052 Epoch: 30 Global Step: 175560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:27:46,534-Speed 10802.58 samples/sec Loss 4.8423 LearningRate 0.0052 Epoch: 30 Global Step: 175570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:27:47,460-Speed 11060.85 samples/sec Loss 4.7719 LearningRate 0.0052 Epoch: 30 Global Step: 175580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:27:48,395-Speed 10956.20 samples/sec Loss 4.7980 LearningRate 0.0052 Epoch: 30 Global Step: 175590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:27:49,328-Speed 10990.60 samples/sec Loss 4.7273 LearningRate 0.0052 Epoch: 30 Global Step: 175600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:27:50,243-Speed 11207.15 samples/sec Loss 4.8041 LearningRate 0.0052 Epoch: 30 Global Step: 175610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:27:51,161-Speed 11167.05 samples/sec Loss 4.8550 LearningRate 0.0052 Epoch: 30 Global Step: 175620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:27:52,121-Speed 10668.05 samples/sec Loss 4.8673 LearningRate 0.0052 Epoch: 30 Global Step: 175630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:27:53,074-Speed 10756.12 samples/sec Loss 4.8768 LearningRate 0.0052 Epoch: 30 Global Step: 175640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:27:54,021-Speed 10823.60 samples/sec Loss 4.8454 LearningRate 0.0052 Epoch: 30 Global Step: 175650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 
05:27:55,028-Speed 10175.33 samples/sec Loss 4.8301 LearningRate 0.0052 Epoch: 30 Global Step: 175660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:27:55,960-Speed 10993.38 samples/sec Loss 4.7190 LearningRate 0.0052 Epoch: 30 Global Step: 175670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:27:56,869-Speed 11271.92 samples/sec Loss 4.9004 LearningRate 0.0052 Epoch: 30 Global Step: 175680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:27:57,790-Speed 11125.15 samples/sec Loss 4.7707 LearningRate 0.0052 Epoch: 30 Global Step: 175690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:27:58,710-Speed 11154.58 samples/sec Loss 4.8926 LearningRate 0.0052 Epoch: 30 Global Step: 175700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:27:59,653-Speed 10862.04 samples/sec Loss 4.9953 LearningRate 0.0052 Epoch: 30 Global Step: 175710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:28:00,624-Speed 10552.67 samples/sec Loss 4.8097 LearningRate 0.0052 Epoch: 30 Global Step: 175720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:28:01,608-Speed 10416.09 samples/sec Loss 4.7642 LearningRate 0.0052 Epoch: 30 Global Step: 175730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:28:02,557-Speed 10791.30 samples/sec Loss 4.8189 LearningRate 0.0052 Epoch: 30 Global Step: 175740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:28:03,541-Speed 10413.72 samples/sec Loss 4.8398 LearningRate 0.0052 Epoch: 30 Global Step: 175750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:28:04,453-Speed 11234.60 samples/sec Loss 4.7525 LearningRate 0.0052 Epoch: 30 Global Step: 175760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:28:05,371-Speed 11159.73 samples/sec Loss 4.9239 LearningRate 0.0052 Epoch: 30 Global Step: 175770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:28:06,317-Speed 10835.90 
samples/sec Loss 4.8388 LearningRate 0.0052 Epoch: 30 Global Step: 175780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:28:07,250-Speed 10985.92 samples/sec Loss 4.8711 LearningRate 0.0052 Epoch: 30 Global Step: 175790 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 05:28:08,200-Speed 10778.56 samples/sec Loss 4.8520 LearningRate 0.0052 Epoch: 30 Global Step: 175800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:28:09,144-Speed 10862.50 samples/sec Loss 4.8154 LearningRate 0.0052 Epoch: 30 Global Step: 175810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:28:10,042-Speed 11416.93 samples/sec Loss 5.1182 LearningRate 0.0052 Epoch: 30 Global Step: 175820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:28:10,939-Speed 11414.87 samples/sec Loss 4.9242 LearningRate 0.0051 Epoch: 30 Global Step: 175830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:28:11,863-Speed 11089.69 samples/sec Loss 4.8253 LearningRate 0.0051 Epoch: 30 Global Step: 175840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:28:12,800-Speed 10933.57 samples/sec Loss 4.9343 LearningRate 0.0051 Epoch: 30 Global Step: 175850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:28:13,737-Speed 10937.35 samples/sec Loss 4.7379 LearningRate 0.0051 Epoch: 30 Global Step: 175860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:28:14,671-Speed 10980.41 samples/sec Loss 4.8331 LearningRate 0.0051 Epoch: 30 Global Step: 175870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:28:15,602-Speed 11005.99 samples/sec Loss 4.9810 LearningRate 0.0051 Epoch: 30 Global Step: 175880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:28:16,522-Speed 11134.18 samples/sec Loss 4.8006 LearningRate 0.0051 Epoch: 30 Global Step: 175890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:28:17,459-Speed 10934.21 samples/sec Loss 4.8399 
LearningRate 0.0051 Epoch: 30 Global Step: 175900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:28:18,446-Speed 10388.71 samples/sec Loss 4.8909 LearningRate 0.0051 Epoch: 30 Global Step: 175910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:28:19,381-Speed 10956.31 samples/sec Loss 4.8649 LearningRate 0.0051 Epoch: 30 Global Step: 175920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:28:20,296-Speed 11206.72 samples/sec Loss 4.8533 LearningRate 0.0051 Epoch: 30 Global Step: 175930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:28:21,199-Speed 11348.75 samples/sec Loss 4.7974 LearningRate 0.0051 Epoch: 30 Global Step: 175940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:28:22,112-Speed 11222.31 samples/sec Loss 4.8048 LearningRate 0.0051 Epoch: 30 Global Step: 175950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:28:23,054-Speed 10870.75 samples/sec Loss 4.8343 LearningRate 0.0051 Epoch: 30 Global Step: 175960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:28:23,997-Speed 10874.81 samples/sec Loss 4.9057 LearningRate 0.0051 Epoch: 30 Global Step: 175970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:28:24,945-Speed 10804.80 samples/sec Loss 4.8010 LearningRate 0.0051 Epoch: 30 Global Step: 175980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:28:25,850-Speed 11320.34 samples/sec Loss 4.7056 LearningRate 0.0051 Epoch: 30 Global Step: 175990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:28:26,755-Speed 11331.38 samples/sec Loss 4.7777 LearningRate 0.0051 Epoch: 30 Global Step: 176000 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 05:28:47,971-[lfw][176000]XNorm: 8.149462 Training: 2022-04-11 05:28:47,971-[lfw][176000]Accuracy-Flip: 0.99700+-0.00287 Training: 2022-04-11 05:28:47,972-[lfw][176000]Accuracy-Highest: 0.99733 Training: 2022-04-11 
05:29:12,601-[cfp_fp][176000]XNorm: 6.931917 Training: 2022-04-11 05:29:12,602-[cfp_fp][176000]Accuracy-Flip: 0.95343+-0.01085 Training: 2022-04-11 05:29:12,602-[cfp_fp][176000]Accuracy-Highest: 0.95429 Training: 2022-04-11 05:29:33,826-[agedb_30][176000]XNorm: 7.985074 Training: 2022-04-11 05:29:33,827-[agedb_30][176000]Accuracy-Flip: 0.96683+-0.00790 Training: 2022-04-11 05:29:33,827-[agedb_30][176000]Accuracy-Highest: 0.96967 Training: 2022-04-11 05:29:34,761-Speed 150.58 samples/sec Loss 4.9051 LearningRate 0.0051 Epoch: 30 Global Step: 176010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:29:35,690-Speed 11029.60 samples/sec Loss 4.8594 LearningRate 0.0051 Epoch: 30 Global Step: 176020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:29:36,628-Speed 10928.16 samples/sec Loss 4.9390 LearningRate 0.0051 Epoch: 30 Global Step: 176030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:29:37,538-Speed 11254.55 samples/sec Loss 4.9596 LearningRate 0.0051 Epoch: 30 Global Step: 176040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:29:38,463-Speed 11081.13 samples/sec Loss 4.7805 LearningRate 0.0051 Epoch: 30 Global Step: 176050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:29:39,446-Speed 10418.24 samples/sec Loss 4.8574 LearningRate 0.0051 Epoch: 30 Global Step: 176060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:29:40,385-Speed 10917.01 samples/sec Loss 4.8620 LearningRate 0.0051 Epoch: 30 Global Step: 176070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:29:41,315-Speed 11018.68 samples/sec Loss 4.8025 LearningRate 0.0051 Epoch: 30 Global Step: 176080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:29:42,247-Speed 11013.62 samples/sec Loss 4.9160 LearningRate 0.0051 Epoch: 30 Global Step: 176090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:29:43,194-Speed 10811.71 samples/sec Loss 4.9179 LearningRate 0.0051 
Epoch: 30 Global Step: 176100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:29:44,184-Speed 10351.98 samples/sec Loss 4.8270 LearningRate 0.0051 Epoch: 30 Global Step: 176110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:29:45,112-Speed 11046.08 samples/sec Loss 4.7604 LearningRate 0.0051 Epoch: 30 Global Step: 176120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:29:46,020-Speed 11280.72 samples/sec Loss 4.7899 LearningRate 0.0051 Epoch: 30 Global Step: 176130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:29:46,928-Speed 11283.69 samples/sec Loss 4.9149 LearningRate 0.0051 Epoch: 30 Global Step: 176140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:29:47,861-Speed 10986.24 samples/sec Loss 4.7771 LearningRate 0.0051 Epoch: 30 Global Step: 176150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:29:48,809-Speed 10812.59 samples/sec Loss 4.9115 LearningRate 0.0051 Epoch: 30 Global Step: 176160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:29:49,746-Speed 10940.54 samples/sec Loss 4.8597 LearningRate 0.0051 Epoch: 30 Global Step: 176170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:29:50,702-Speed 10714.90 samples/sec Loss 4.8275 LearningRate 0.0051 Epoch: 30 Global Step: 176180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:29:51,680-Speed 10485.24 samples/sec Loss 4.7755 LearningRate 0.0051 Epoch: 30 Global Step: 176190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:29:52,601-Speed 11119.61 samples/sec Loss 4.7658 LearningRate 0.0051 Epoch: 30 Global Step: 176200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:29:53,567-Speed 10607.97 samples/sec Loss 4.8274 LearningRate 0.0051 Epoch: 30 Global Step: 176210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:29:54,551-Speed 10416.35 samples/sec Loss 4.9245 LearningRate 0.0051 Epoch: 30 Global Step: 176220 
Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:29:55,515-Speed 10638.08 samples/sec Loss 4.8787 LearningRate 0.0051 Epoch: 30 Global Step: 176230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:29:56,436-Speed 11124.62 samples/sec Loss 4.8013 LearningRate 0.0051 Epoch: 30 Global Step: 176240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:29:57,335-Speed 11398.39 samples/sec Loss 4.8417 LearningRate 0.0051 Epoch: 30 Global Step: 176250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:29:58,343-Speed 10168.72 samples/sec Loss 4.8626 LearningRate 0.0051 Epoch: 30 Global Step: 176260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:30:08,684-Speed 990.30 samples/sec Loss 4.7131 LearningRate 0.0051 Epoch: 31 Global Step: 176270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:30:09,635-Speed 10779.79 samples/sec Loss 4.3343 LearningRate 0.0051 Epoch: 31 Global Step: 176280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:30:10,576-Speed 10884.85 samples/sec Loss 4.3278 LearningRate 0.0051 Epoch: 31 Global Step: 176290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:30:11,500-Speed 11089.06 samples/sec Loss 4.3342 LearningRate 0.0051 Epoch: 31 Global Step: 176300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:30:12,657-Speed 8857.01 samples/sec Loss 4.4497 LearningRate 0.0051 Epoch: 31 Global Step: 176310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:30:13,694-Speed 9879.77 samples/sec Loss 4.3725 LearningRate 0.0051 Epoch: 31 Global Step: 176320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:30:14,697-Speed 10220.50 samples/sec Loss 4.3659 LearningRate 0.0050 Epoch: 31 Global Step: 176330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:30:15,635-Speed 10920.64 samples/sec Loss 4.4416 LearningRate 0.0050 Epoch: 31 Global Step: 176340 Fp16 Grad Scale: 65536 Required: 2 
hours Training: 2022-04-11 05:30:16,565-Speed 11028.46 samples/sec Loss 4.3682 LearningRate 0.0050 Epoch: 31 Global Step: 176350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:30:17,540-Speed 10509.90 samples/sec Loss 4.3814 LearningRate 0.0050 Epoch: 31 Global Step: 176360 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 05:30:18,456-Speed 11186.54 samples/sec Loss 4.3422 LearningRate 0.0050 Epoch: 31 Global Step: 176370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:30:19,426-Speed 10556.71 samples/sec Loss 4.3928 LearningRate 0.0050 Epoch: 31 Global Step: 176380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:30:20,424-Speed 10270.27 samples/sec Loss 4.5180 LearningRate 0.0050 Epoch: 31 Global Step: 176390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:30:21,334-Speed 11258.39 samples/sec Loss 4.4598 LearningRate 0.0050 Epoch: 31 Global Step: 176400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:30:22,247-Speed 11228.52 samples/sec Loss 4.3689 LearningRate 0.0050 Epoch: 31 Global Step: 176410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:30:23,166-Speed 11147.33 samples/sec Loss 4.3719 LearningRate 0.0050 Epoch: 31 Global Step: 176420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:30:24,046-Speed 11650.14 samples/sec Loss 4.3454 LearningRate 0.0050 Epoch: 31 Global Step: 176430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:30:24,984-Speed 10923.58 samples/sec Loss 4.4310 LearningRate 0.0050 Epoch: 31 Global Step: 176440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:30:25,899-Speed 11199.33 samples/sec Loss 4.3400 LearningRate 0.0050 Epoch: 31 Global Step: 176450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:30:26,843-Speed 10849.68 samples/sec Loss 4.3438 LearningRate 0.0050 Epoch: 31 Global Step: 176460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 
05:30:27,790-Speed 10826.41 samples/sec Loss 4.3736 LearningRate 0.0050 Epoch: 31 Global Step: 176470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:30:28,727-Speed 10933.77 samples/sec Loss 4.3832 LearningRate 0.0050 Epoch: 31 Global Step: 176480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:30:29,697-Speed 10559.01 samples/sec Loss 4.4582 LearningRate 0.0050 Epoch: 31 Global Step: 176490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:30:30,621-Speed 11100.75 samples/sec Loss 4.5021 LearningRate 0.0050 Epoch: 31 Global Step: 176500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:30:31,524-Speed 11345.51 samples/sec Loss 4.3632 LearningRate 0.0050 Epoch: 31 Global Step: 176510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:30:32,445-Speed 11125.52 samples/sec Loss 4.4109 LearningRate 0.0050 Epoch: 31 Global Step: 176520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:30:33,348-Speed 11353.35 samples/sec Loss 4.4072 LearningRate 0.0050 Epoch: 31 Global Step: 176530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:30:34,255-Speed 11299.54 samples/sec Loss 4.4482 LearningRate 0.0050 Epoch: 31 Global Step: 176540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:30:35,160-Speed 11321.15 samples/sec Loss 4.5443 LearningRate 0.0050 Epoch: 31 Global Step: 176550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:30:36,065-Speed 11322.37 samples/sec Loss 4.4425 LearningRate 0.0050 Epoch: 31 Global Step: 176560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:30:36,989-Speed 11086.38 samples/sec Loss 4.5500 LearningRate 0.0050 Epoch: 31 Global Step: 176570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:30:37,951-Speed 10658.50 samples/sec Loss 4.3640 LearningRate 0.0050 Epoch: 31 Global Step: 176580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:30:38,879-Speed 11036.80 
samples/sec Loss 4.4647 LearningRate 0.0050 Epoch: 31 Global Step: 176590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:30:39,816-Speed 10938.28 samples/sec Loss 4.4798 LearningRate 0.0050 Epoch: 31 Global Step: 176600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:30:40,752-Speed 10939.82 samples/sec Loss 4.3522 LearningRate 0.0050 Epoch: 31 Global Step: 176610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:30:41,689-Speed 10941.64 samples/sec Loss 4.3195 LearningRate 0.0050 Epoch: 31 Global Step: 176620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:30:42,634-Speed 10843.57 samples/sec Loss 4.3470 LearningRate 0.0050 Epoch: 31 Global Step: 176630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:30:43,580-Speed 10830.50 samples/sec Loss 4.5572 LearningRate 0.0050 Epoch: 31 Global Step: 176640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:30:44,518-Speed 10922.36 samples/sec Loss 4.3603 LearningRate 0.0050 Epoch: 31 Global Step: 176650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:30:45,437-Speed 11152.88 samples/sec Loss 4.3006 LearningRate 0.0050 Epoch: 31 Global Step: 176660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:30:46,370-Speed 10975.95 samples/sec Loss 4.5076 LearningRate 0.0050 Epoch: 31 Global Step: 176670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:30:47,263-Speed 11481.60 samples/sec Loss 4.4431 LearningRate 0.0050 Epoch: 31 Global Step: 176680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:30:48,235-Speed 10549.56 samples/sec Loss 4.4054 LearningRate 0.0050 Epoch: 31 Global Step: 176690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:30:49,195-Speed 10668.22 samples/sec Loss 4.4841 LearningRate 0.0050 Epoch: 31 Global Step: 176700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:30:50,143-Speed 10808.77 samples/sec Loss 4.5356 
LearningRate 0.0050 Epoch: 31 Global Step: 176710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:30:51,124-Speed 10450.63 samples/sec Loss 4.4958 LearningRate 0.0050 Epoch: 31 Global Step: 176720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:30:52,056-Speed 10987.22 samples/sec Loss 4.4036 LearningRate 0.0050 Epoch: 31 Global Step: 176730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:30:52,977-Speed 11137.19 samples/sec Loss 4.4628 LearningRate 0.0050 Epoch: 31 Global Step: 176740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:30:53,905-Speed 11034.37 samples/sec Loss 4.3135 LearningRate 0.0050 Epoch: 31 Global Step: 176750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:30:54,870-Speed 10623.21 samples/sec Loss 4.2837 LearningRate 0.0050 Epoch: 31 Global Step: 176760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:30:55,837-Speed 10594.10 samples/sec Loss 4.3411 LearningRate 0.0050 Epoch: 31 Global Step: 176770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:30:56,736-Speed 11389.23 samples/sec Loss 4.5426 LearningRate 0.0050 Epoch: 31 Global Step: 176780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:30:57,684-Speed 10816.78 samples/sec Loss 4.4713 LearningRate 0.0050 Epoch: 31 Global Step: 176790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:30:58,614-Speed 11018.59 samples/sec Loss 4.3612 LearningRate 0.0050 Epoch: 31 Global Step: 176800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:30:59,517-Speed 11352.92 samples/sec Loss 4.4317 LearningRate 0.0050 Epoch: 31 Global Step: 176810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:31:00,509-Speed 10321.80 samples/sec Loss 4.4609 LearningRate 0.0050 Epoch: 31 Global Step: 176820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:31:01,449-Speed 10905.15 samples/sec Loss 4.5248 LearningRate 0.0050 Epoch: 31 
Global Step: 176830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:31:02,362-Speed 11222.58 samples/sec Loss 4.3814 LearningRate 0.0049 Epoch: 31 Global Step: 176840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:31:03,285-Speed 11099.14 samples/sec Loss 4.3966 LearningRate 0.0049 Epoch: 31 Global Step: 176850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:31:04,168-Speed 11602.84 samples/sec Loss 4.3928 LearningRate 0.0049 Epoch: 31 Global Step: 176860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:31:05,091-Speed 11104.12 samples/sec Loss 4.5155 LearningRate 0.0049 Epoch: 31 Global Step: 176870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:31:06,064-Speed 10536.76 samples/sec Loss 4.3646 LearningRate 0.0049 Epoch: 31 Global Step: 176880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:31:06,996-Speed 10996.18 samples/sec Loss 4.4037 LearningRate 0.0049 Epoch: 31 Global Step: 176890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:31:07,973-Speed 10487.92 samples/sec Loss 4.5506 LearningRate 0.0049 Epoch: 31 Global Step: 176900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:31:08,932-Speed 10688.93 samples/sec Loss 4.4739 LearningRate 0.0049 Epoch: 31 Global Step: 176910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:31:09,894-Speed 10643.35 samples/sec Loss 4.4299 LearningRate 0.0049 Epoch: 31 Global Step: 176920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:31:10,801-Speed 11304.68 samples/sec Loss 4.3912 LearningRate 0.0049 Epoch: 31 Global Step: 176930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:31:11,706-Speed 11321.17 samples/sec Loss 4.5905 LearningRate 0.0049 Epoch: 31 Global Step: 176940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:31:12,628-Speed 11110.38 samples/sec Loss 4.5213 LearningRate 0.0049 Epoch: 31 Global Step: 176950 Fp16 Grad 
Scale: 32768 Required: 2 hours Training: 2022-04-11 05:31:13,609-Speed 10444.80 samples/sec Loss 4.5573 LearningRate 0.0049 Epoch: 31 Global Step: 176960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:31:14,586-Speed 10491.68 samples/sec Loss 4.3860 LearningRate 0.0049 Epoch: 31 Global Step: 176970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:31:15,536-Speed 10782.63 samples/sec Loss 4.4366 LearningRate 0.0049 Epoch: 31 Global Step: 176980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:31:16,478-Speed 10875.88 samples/sec Loss 4.4541 LearningRate 0.0049 Epoch: 31 Global Step: 176990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:31:17,433-Speed 10728.80 samples/sec Loss 4.4711 LearningRate 0.0049 Epoch: 31 Global Step: 177000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:31:18,375-Speed 10888.95 samples/sec Loss 4.5828 LearningRate 0.0049 Epoch: 31 Global Step: 177010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:31:19,300-Speed 11073.67 samples/sec Loss 4.4852 LearningRate 0.0049 Epoch: 31 Global Step: 177020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:31:20,304-Speed 10201.22 samples/sec Loss 4.5829 LearningRate 0.0049 Epoch: 31 Global Step: 177030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:31:21,212-Speed 11288.92 samples/sec Loss 4.4834 LearningRate 0.0049 Epoch: 31 Global Step: 177040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:31:22,118-Speed 11304.87 samples/sec Loss 4.6101 LearningRate 0.0049 Epoch: 31 Global Step: 177050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:31:23,106-Speed 10374.99 samples/sec Loss 4.5635 LearningRate 0.0049 Epoch: 31 Global Step: 177060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:31:24,040-Speed 10970.95 samples/sec Loss 4.4447 LearningRate 0.0049 Epoch: 31 Global Step: 177070 Fp16 Grad Scale: 32768 Required: 2 hours 
Training: 2022-04-11 05:31:24,955-Speed 11204.44 samples/sec Loss 4.3949 LearningRate 0.0049 Epoch: 31 Global Step: 177080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:31:25,857-Speed 11357.73 samples/sec Loss 4.5670 LearningRate 0.0049 Epoch: 31 Global Step: 177090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:31:26,802-Speed 10840.10 samples/sec Loss 4.3881 LearningRate 0.0049 Epoch: 31 Global Step: 177100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:31:27,791-Speed 10370.41 samples/sec Loss 4.4882 LearningRate 0.0049 Epoch: 31 Global Step: 177110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:31:28,698-Speed 11291.74 samples/sec Loss 4.4609 LearningRate 0.0049 Epoch: 31 Global Step: 177120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:31:29,602-Speed 11347.15 samples/sec Loss 4.5325 LearningRate 0.0049 Epoch: 31 Global Step: 177130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:31:30,519-Speed 11175.01 samples/sec Loss 4.5856 LearningRate 0.0049 Epoch: 31 Global Step: 177140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:31:31,488-Speed 10568.21 samples/sec Loss 4.4171 LearningRate 0.0049 Epoch: 31 Global Step: 177150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:31:32,410-Speed 11115.31 samples/sec Loss 4.5422 LearningRate 0.0049 Epoch: 31 Global Step: 177160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:31:33,320-Speed 11263.07 samples/sec Loss 4.5698 LearningRate 0.0049 Epoch: 31 Global Step: 177170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:31:34,230-Speed 11257.80 samples/sec Loss 4.4485 LearningRate 0.0049 Epoch: 31 Global Step: 177180 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 05:31:35,132-Speed 11360.19 samples/sec Loss 4.5467 LearningRate 0.0049 Epoch: 31 Global Step: 177190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 
05:31:36,061-Speed 11034.59 samples/sec Loss 4.5957 LearningRate 0.0049 Epoch: 31 Global Step: 177200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:31:36,970-Speed 11273.37 samples/sec Loss 4.6174 LearningRate 0.0049 Epoch: 31 Global Step: 177210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:31:37,909-Speed 10914.69 samples/sec Loss 4.3985 LearningRate 0.0049 Epoch: 31 Global Step: 177220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:31:38,899-Speed 10345.82 samples/sec Loss 4.4683 LearningRate 0.0049 Epoch: 31 Global Step: 177230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:31:39,880-Speed 10449.55 samples/sec Loss 4.5040 LearningRate 0.0049 Epoch: 31 Global Step: 177240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:31:40,858-Speed 10470.13 samples/sec Loss 4.6010 LearningRate 0.0049 Epoch: 31 Global Step: 177250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:31:41,806-Speed 10810.46 samples/sec Loss 4.3777 LearningRate 0.0049 Epoch: 31 Global Step: 177260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:31:42,710-Speed 11339.42 samples/sec Loss 4.6064 LearningRate 0.0049 Epoch: 31 Global Step: 177270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:31:43,633-Speed 11105.28 samples/sec Loss 4.5298 LearningRate 0.0049 Epoch: 31 Global Step: 177280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:31:44,562-Speed 11021.06 samples/sec Loss 4.5107 LearningRate 0.0049 Epoch: 31 Global Step: 177290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:31:45,482-Speed 11142.22 samples/sec Loss 4.5818 LearningRate 0.0049 Epoch: 31 Global Step: 177300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:31:46,400-Speed 11165.30 samples/sec Loss 4.4698 LearningRate 0.0049 Epoch: 31 Global Step: 177310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:31:47,299-Speed 11400.24 
samples/sec Loss 4.4386 LearningRate 0.0049 Epoch: 31 Global Step: 177320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:31:48,203-Speed 11331.14 samples/sec Loss 4.4619 LearningRate 0.0049 Epoch: 31 Global Step: 177330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:31:49,130-Speed 11058.17 samples/sec Loss 4.5709 LearningRate 0.0049 Epoch: 31 Global Step: 177340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:31:50,052-Speed 11114.00 samples/sec Loss 4.5739 LearningRate 0.0049 Epoch: 31 Global Step: 177350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:31:50,989-Speed 10940.82 samples/sec Loss 4.5328 LearningRate 0.0048 Epoch: 31 Global Step: 177360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:31:51,917-Speed 11038.30 samples/sec Loss 4.3939 LearningRate 0.0048 Epoch: 31 Global Step: 177370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:31:52,836-Speed 11156.67 samples/sec Loss 4.5501 LearningRate 0.0048 Epoch: 31 Global Step: 177380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:31:53,745-Speed 11274.52 samples/sec Loss 4.5657 LearningRate 0.0048 Epoch: 31 Global Step: 177390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:31:54,685-Speed 10900.01 samples/sec Loss 4.5929 LearningRate 0.0048 Epoch: 31 Global Step: 177400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:31:55,581-Speed 11430.13 samples/sec Loss 4.5379 LearningRate 0.0048 Epoch: 31 Global Step: 177410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:31:56,520-Speed 10918.96 samples/sec Loss 4.4940 LearningRate 0.0048 Epoch: 31 Global Step: 177420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:31:57,422-Speed 11359.29 samples/sec Loss 4.5176 LearningRate 0.0048 Epoch: 31 Global Step: 177430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:31:58,369-Speed 10821.28 samples/sec Loss 4.4230 
LearningRate 0.0048 Epoch: 31 Global Step: 177440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:31:59,254-Speed 11572.37 samples/sec Loss 4.5372 LearningRate 0.0048 Epoch: 31 Global Step: 177450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:32:00,189-Speed 10965.85 samples/sec Loss 4.5548 LearningRate 0.0048 Epoch: 31 Global Step: 177460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:32:01,131-Speed 10880.03 samples/sec Loss 4.6671 LearningRate 0.0048 Epoch: 31 Global Step: 177470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:32:02,034-Speed 11342.11 samples/sec Loss 4.4905 LearningRate 0.0048 Epoch: 31 Global Step: 177480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:32:02,985-Speed 10769.30 samples/sec Loss 4.6334 LearningRate 0.0048 Epoch: 31 Global Step: 177490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:32:03,973-Speed 10382.38 samples/sec Loss 4.4936 LearningRate 0.0048 Epoch: 31 Global Step: 177500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:32:04,897-Speed 11089.98 samples/sec Loss 4.6600 LearningRate 0.0048 Epoch: 31 Global Step: 177510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:32:05,800-Speed 11347.59 samples/sec Loss 4.4608 LearningRate 0.0048 Epoch: 31 Global Step: 177520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:32:06,711-Speed 11249.64 samples/sec Loss 4.5936 LearningRate 0.0048 Epoch: 31 Global Step: 177530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:32:07,629-Speed 11160.75 samples/sec Loss 4.4563 LearningRate 0.0048 Epoch: 31 Global Step: 177540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:32:08,540-Speed 11246.21 samples/sec Loss 4.5348 LearningRate 0.0048 Epoch: 31 Global Step: 177550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:32:09,444-Speed 11332.05 samples/sec Loss 4.5196 LearningRate 0.0048 Epoch: 31 
Global Step: 177560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:32:10,355-Speed 11258.81 samples/sec Loss 4.7345 LearningRate 0.0048 Epoch: 31 Global Step: 177570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:32:11,258-Speed 11338.78 samples/sec Loss 4.4774 LearningRate 0.0048 Epoch: 31 Global Step: 177580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:32:12,192-Speed 10975.38 samples/sec Loss 4.4478 LearningRate 0.0048 Epoch: 31 Global Step: 177590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:32:13,147-Speed 10733.34 samples/sec Loss 4.5594 LearningRate 0.0048 Epoch: 31 Global Step: 177600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:32:14,071-Speed 11088.03 samples/sec Loss 4.5569 LearningRate 0.0048 Epoch: 31 Global Step: 177610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:32:14,962-Speed 11499.72 samples/sec Loss 4.5279 LearningRate 0.0048 Epoch: 31 Global Step: 177620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:32:15,904-Speed 10871.99 samples/sec Loss 4.5955 LearningRate 0.0048 Epoch: 31 Global Step: 177630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:32:16,822-Speed 11171.21 samples/sec Loss 4.4650 LearningRate 0.0048 Epoch: 31 Global Step: 177640 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:32:17,754-Speed 10982.61 samples/sec Loss 4.4607 LearningRate 0.0048 Epoch: 31 Global Step: 177650 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:32:18,729-Speed 10515.14 samples/sec Loss 4.5251 LearningRate 0.0048 Epoch: 31 Global Step: 177660 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:32:19,675-Speed 10832.43 samples/sec Loss 4.4928 LearningRate 0.0048 Epoch: 31 Global Step: 177670 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:32:20,630-Speed 10728.17 samples/sec Loss 4.6349 LearningRate 0.0048 Epoch: 31 Global Step: 177680 Fp16 Grad 
Scale: 16384 Required: 2 hours Training: 2022-04-11 05:32:21,558-Speed 11045.00 samples/sec Loss 4.4949 LearningRate 0.0048 Epoch: 31 Global Step: 177690 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:32:22,504-Speed 10832.20 samples/sec Loss 4.5164 LearningRate 0.0048 Epoch: 31 Global Step: 177700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:32:23,434-Speed 11029.61 samples/sec Loss 4.6291 LearningRate 0.0048 Epoch: 31 Global Step: 177710 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:32:24,394-Speed 10672.42 samples/sec Loss 4.6634 LearningRate 0.0048 Epoch: 31 Global Step: 177720 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:32:25,324-Speed 11011.68 samples/sec Loss 4.5056 LearningRate 0.0048 Epoch: 31 Global Step: 177730 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:32:26,228-Speed 11336.38 samples/sec Loss 4.6102 LearningRate 0.0048 Epoch: 31 Global Step: 177740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:32:27,176-Speed 10809.62 samples/sec Loss 4.5617 LearningRate 0.0048 Epoch: 31 Global Step: 177750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:32:28,081-Speed 11323.93 samples/sec Loss 4.5689 LearningRate 0.0048 Epoch: 31 Global Step: 177760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:32:29,005-Speed 11090.75 samples/sec Loss 4.7020 LearningRate 0.0048 Epoch: 31 Global Step: 177770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:32:29,947-Speed 10877.37 samples/sec Loss 4.7121 LearningRate 0.0048 Epoch: 31 Global Step: 177780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:32:30,894-Speed 10818.85 samples/sec Loss 4.5719 LearningRate 0.0048 Epoch: 31 Global Step: 177790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:32:31,842-Speed 10815.68 samples/sec Loss 4.4807 LearningRate 0.0048 Epoch: 31 Global Step: 177800 Fp16 Grad Scale: 32768 Required: 2 hours 
Training: 2022-04-11 05:32:32,773-Speed 11000.44 samples/sec Loss 4.6291 LearningRate 0.0048 Epoch: 31 Global Step: 177810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:32:33,685-Speed 11238.68 samples/sec Loss 4.4893 LearningRate 0.0048 Epoch: 31 Global Step: 177820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:32:34,607-Speed 11119.07 samples/sec Loss 4.5938 LearningRate 0.0048 Epoch: 31 Global Step: 177830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:32:35,533-Speed 11060.93 samples/sec Loss 4.5809 LearningRate 0.0048 Epoch: 31 Global Step: 177840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:32:36,491-Speed 10698.60 samples/sec Loss 4.5002 LearningRate 0.0048 Epoch: 31 Global Step: 177850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:32:37,440-Speed 10794.72 samples/sec Loss 4.6327 LearningRate 0.0048 Epoch: 31 Global Step: 177860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:32:38,381-Speed 10890.88 samples/sec Loss 4.6507 LearningRate 0.0048 Epoch: 31 Global Step: 177870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:32:39,312-Speed 11007.15 samples/sec Loss 4.5662 LearningRate 0.0047 Epoch: 31 Global Step: 177880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:32:40,278-Speed 10600.86 samples/sec Loss 4.7616 LearningRate 0.0047 Epoch: 31 Global Step: 177890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:32:41,212-Speed 10976.61 samples/sec Loss 4.5096 LearningRate 0.0047 Epoch: 31 Global Step: 177900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:32:42,176-Speed 10625.05 samples/sec Loss 4.5647 LearningRate 0.0047 Epoch: 31 Global Step: 177910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:32:43,128-Speed 10767.52 samples/sec Loss 4.4685 LearningRate 0.0047 Epoch: 31 Global Step: 177920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 
05:32:44,084-Speed 10721.73 samples/sec Loss 4.6249 LearningRate 0.0047 Epoch: 31 Global Step: 177930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:32:45,033-Speed 10796.20 samples/sec Loss 4.6103 LearningRate 0.0047 Epoch: 31 Global Step: 177940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:32:45,962-Speed 11029.67 samples/sec Loss 4.5508 LearningRate 0.0047 Epoch: 31 Global Step: 177950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:32:46,871-Speed 11272.85 samples/sec Loss 4.6346 LearningRate 0.0047 Epoch: 31 Global Step: 177960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:32:47,803-Speed 10991.44 samples/sec Loss 4.4631 LearningRate 0.0047 Epoch: 31 Global Step: 177970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:32:48,712-Speed 11286.38 samples/sec Loss 4.5773 LearningRate 0.0047 Epoch: 31 Global Step: 177980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:32:49,634-Speed 11111.60 samples/sec Loss 4.6120 LearningRate 0.0047 Epoch: 31 Global Step: 177990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:32:50,580-Speed 10831.29 samples/sec Loss 4.5852 LearningRate 0.0047 Epoch: 31 Global Step: 178000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:33:11,981-[lfw][178000]XNorm: 8.050511 Training: 2022-04-11 05:33:11,982-[lfw][178000]Accuracy-Flip: 0.99700+-0.00296 Training: 2022-04-11 05:33:11,982-[lfw][178000]Accuracy-Highest: 0.99733 Training: 2022-04-11 05:33:36,673-[cfp_fp][178000]XNorm: 6.858001 Training: 2022-04-11 05:33:36,674-[cfp_fp][178000]Accuracy-Flip: 0.95386+-0.01018 Training: 2022-04-11 05:33:36,674-[cfp_fp][178000]Accuracy-Highest: 0.95429 Training: 2022-04-11 05:33:57,999-[agedb_30][178000]XNorm: 7.899719 Training: 2022-04-11 05:33:57,999-[agedb_30][178000]Accuracy-Flip: 0.96833+-0.00806 Training: 2022-04-11 05:33:58,000-[agedb_30][178000]Accuracy-Highest: 0.96967 Training: 2022-04-11 05:33:58,947-Speed 
149.78 samples/sec Loss 4.6382 LearningRate 0.0047 Epoch: 31 Global Step: 178010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:33:59,903-Speed 10724.19 samples/sec Loss 4.5604 LearningRate 0.0047 Epoch: 31 Global Step: 178020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:34:00,835-Speed 10985.35 samples/sec Loss 4.5345 LearningRate 0.0047 Epoch: 31 Global Step: 178030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:34:01,775-Speed 10901.44 samples/sec Loss 4.5787 LearningRate 0.0047 Epoch: 31 Global Step: 178040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:34:02,743-Speed 10587.50 samples/sec Loss 4.6392 LearningRate 0.0047 Epoch: 31 Global Step: 178050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:34:03,679-Speed 10945.71 samples/sec Loss 4.5036 LearningRate 0.0047 Epoch: 31 Global Step: 178060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:34:04,615-Speed 10945.71 samples/sec Loss 4.5209 LearningRate 0.0047 Epoch: 31 Global Step: 178070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:34:05,530-Speed 11201.50 samples/sec Loss 4.5996 LearningRate 0.0047 Epoch: 31 Global Step: 178080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:34:06,437-Speed 11294.73 samples/sec Loss 4.5398 LearningRate 0.0047 Epoch: 31 Global Step: 178090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:34:07,354-Speed 11183.29 samples/sec Loss 4.5454 LearningRate 0.0047 Epoch: 31 Global Step: 178100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:34:08,272-Speed 11164.72 samples/sec Loss 4.6918 LearningRate 0.0047 Epoch: 31 Global Step: 178110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:34:09,228-Speed 10710.18 samples/sec Loss 4.6406 LearningRate 0.0047 Epoch: 31 Global Step: 178120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:34:10,140-Speed 11241.65 samples/sec Loss 4.5102 
LearningRate 0.0047 Epoch: 31 Global Step: 178130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:34:11,039-Speed 11397.95 samples/sec Loss 4.6071 LearningRate 0.0047 Epoch: 31 Global Step: 178140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:34:11,984-Speed 10838.50 samples/sec Loss 4.5359 LearningRate 0.0047 Epoch: 31 Global Step: 178150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:34:12,904-Speed 11143.75 samples/sec Loss 4.5953 LearningRate 0.0047 Epoch: 31 Global Step: 178160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:34:13,816-Speed 11234.73 samples/sec Loss 4.6038 LearningRate 0.0047 Epoch: 31 Global Step: 178170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:34:14,762-Speed 10829.15 samples/sec Loss 4.5143 LearningRate 0.0047 Epoch: 31 Global Step: 178180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:34:15,659-Speed 11431.53 samples/sec Loss 4.6345 LearningRate 0.0047 Epoch: 31 Global Step: 178190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:34:16,590-Speed 11008.40 samples/sec Loss 4.6116 LearningRate 0.0047 Epoch: 31 Global Step: 178200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:34:17,489-Speed 11398.38 samples/sec Loss 4.5222 LearningRate 0.0047 Epoch: 31 Global Step: 178210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:34:18,435-Speed 10830.21 samples/sec Loss 4.6026 LearningRate 0.0047 Epoch: 31 Global Step: 178220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:34:19,389-Speed 10743.14 samples/sec Loss 4.5035 LearningRate 0.0047 Epoch: 31 Global Step: 178230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:34:20,317-Speed 11043.55 samples/sec Loss 4.5929 LearningRate 0.0047 Epoch: 31 Global Step: 178240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:34:21,224-Speed 11291.80 samples/sec Loss 4.5278 LearningRate 0.0047 Epoch: 31 
Global Step: 178250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:34:22,143-Speed 11155.02 samples/sec Loss 4.5969 LearningRate 0.0047 Epoch: 31 Global Step: 178260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:34:23,113-Speed 10559.94 samples/sec Loss 4.6810 LearningRate 0.0047 Epoch: 31 Global Step: 178270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:34:24,053-Speed 10901.88 samples/sec Loss 4.6299 LearningRate 0.0047 Epoch: 31 Global Step: 178280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:34:24,979-Speed 11070.01 samples/sec Loss 4.5453 LearningRate 0.0047 Epoch: 31 Global Step: 178290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:34:25,899-Speed 11131.02 samples/sec Loss 4.4303 LearningRate 0.0047 Epoch: 31 Global Step: 178300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:34:26,806-Speed 11306.26 samples/sec Loss 4.5357 LearningRate 0.0047 Epoch: 31 Global Step: 178310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:34:27,756-Speed 10778.74 samples/sec Loss 4.6753 LearningRate 0.0047 Epoch: 31 Global Step: 178320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:34:28,707-Speed 10782.12 samples/sec Loss 4.8000 LearningRate 0.0047 Epoch: 31 Global Step: 178330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:34:29,618-Speed 11248.80 samples/sec Loss 4.4812 LearningRate 0.0047 Epoch: 31 Global Step: 178340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:34:30,550-Speed 10995.32 samples/sec Loss 4.6310 LearningRate 0.0047 Epoch: 31 Global Step: 178350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:34:31,468-Speed 11160.01 samples/sec Loss 4.5602 LearningRate 0.0047 Epoch: 31 Global Step: 178360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:34:32,427-Speed 10685.10 samples/sec Loss 4.6993 LearningRate 0.0047 Epoch: 31 Global Step: 178370 Fp16 Grad 
Scale: 32768 Required: 2 hours Training: 2022-04-11 05:34:33,331-Speed 11335.71 samples/sec Loss 4.5719 LearningRate 0.0047 Epoch: 31 Global Step: 178380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:34:34,240-Speed 11269.47 samples/sec Loss 4.6514 LearningRate 0.0047 Epoch: 31 Global Step: 178390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:34:35,170-Speed 11024.95 samples/sec Loss 4.7146 LearningRate 0.0046 Epoch: 31 Global Step: 178400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:34:36,102-Speed 10987.59 samples/sec Loss 4.7160 LearningRate 0.0046 Epoch: 31 Global Step: 178410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:34:37,012-Speed 11267.96 samples/sec Loss 4.5797 LearningRate 0.0046 Epoch: 31 Global Step: 178420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:34:37,947-Speed 10964.66 samples/sec Loss 4.6734 LearningRate 0.0046 Epoch: 31 Global Step: 178430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:34:38,867-Speed 11129.33 samples/sec Loss 4.4410 LearningRate 0.0046 Epoch: 31 Global Step: 178440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:34:39,850-Speed 10431.63 samples/sec Loss 4.6881 LearningRate 0.0046 Epoch: 31 Global Step: 178450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:34:40,782-Speed 10987.78 samples/sec Loss 4.6540 LearningRate 0.0046 Epoch: 31 Global Step: 178460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:34:41,761-Speed 10470.58 samples/sec Loss 4.6436 LearningRate 0.0046 Epoch: 31 Global Step: 178470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:34:42,703-Speed 10875.04 samples/sec Loss 4.5887 LearningRate 0.0046 Epoch: 31 Global Step: 178480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:34:43,625-Speed 11113.90 samples/sec Loss 4.7741 LearningRate 0.0046 Epoch: 31 Global Step: 178490 Fp16 Grad Scale: 65536 Required: 2 hours 
Training: 2022-04-11 05:34:44,566-Speed 10895.96 samples/sec Loss 4.7075 LearningRate 0.0046 Epoch: 31 Global Step: 178500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:34:45,488-Speed 11104.70 samples/sec Loss 4.6139 LearningRate 0.0046 Epoch: 31 Global Step: 178510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:34:46,411-Speed 11105.68 samples/sec Loss 4.5890 LearningRate 0.0046 Epoch: 31 Global Step: 178520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:34:47,373-Speed 10650.35 samples/sec Loss 4.6068 LearningRate 0.0046 Epoch: 31 Global Step: 178530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:34:48,287-Speed 11207.73 samples/sec Loss 4.6198 LearningRate 0.0046 Epoch: 31 Global Step: 178540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:34:49,191-Speed 11336.95 samples/sec Loss 4.7571 LearningRate 0.0046 Epoch: 31 Global Step: 178550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:34:50,117-Speed 11063.27 samples/sec Loss 4.6048 LearningRate 0.0046 Epoch: 31 Global Step: 178560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:34:51,042-Speed 11075.96 samples/sec Loss 4.7316 LearningRate 0.0046 Epoch: 31 Global Step: 178570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:34:51,988-Speed 10841.54 samples/sec Loss 4.5366 LearningRate 0.0046 Epoch: 31 Global Step: 178580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:34:52,923-Speed 10961.41 samples/sec Loss 4.7132 LearningRate 0.0046 Epoch: 31 Global Step: 178590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:34:53,815-Speed 11475.67 samples/sec Loss 4.6939 LearningRate 0.0046 Epoch: 31 Global Step: 178600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:34:54,723-Speed 11296.83 samples/sec Loss 4.6422 LearningRate 0.0046 Epoch: 31 Global Step: 178610 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 
05:34:55,695-Speed 10541.63 samples/sec Loss 4.5969 LearningRate 0.0046 Epoch: 31 Global Step: 178620 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:34:56,615-Speed 11129.68 samples/sec Loss 4.5865 LearningRate 0.0046 Epoch: 31 Global Step: 178630 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:34:57,561-Speed 10835.19 samples/sec Loss 4.6173 LearningRate 0.0046 Epoch: 31 Global Step: 178640 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:34:58,465-Speed 11336.07 samples/sec Loss 4.8328 LearningRate 0.0046 Epoch: 31 Global Step: 178650 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:34:59,405-Speed 10897.12 samples/sec Loss 4.5559 LearningRate 0.0046 Epoch: 31 Global Step: 178660 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:35:00,343-Speed 10929.99 samples/sec Loss 4.7229 LearningRate 0.0046 Epoch: 31 Global Step: 178670 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:35:01,249-Speed 11309.31 samples/sec Loss 4.6528 LearningRate 0.0046 Epoch: 31 Global Step: 178680 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:35:02,198-Speed 10801.89 samples/sec Loss 4.6737 LearningRate 0.0046 Epoch: 31 Global Step: 178690 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:35:03,171-Speed 10535.51 samples/sec Loss 4.6162 LearningRate 0.0046 Epoch: 31 Global Step: 178700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:35:04,114-Speed 10863.74 samples/sec Loss 4.6694 LearningRate 0.0046 Epoch: 31 Global Step: 178710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:35:05,082-Speed 10592.03 samples/sec Loss 4.5988 LearningRate 0.0046 Epoch: 31 Global Step: 178720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:35:05,992-Speed 11253.37 samples/sec Loss 4.6122 LearningRate 0.0046 Epoch: 31 Global Step: 178730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:35:06,925-Speed 10986.64 
samples/sec Loss 4.6393 LearningRate 0.0046 Epoch: 31 Global Step: 178740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:35:07,825-Speed 11391.14 samples/sec Loss 4.5499 LearningRate 0.0046 Epoch: 31 Global Step: 178750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:35:08,764-Speed 10913.45 samples/sec Loss 4.5143 LearningRate 0.0046 Epoch: 31 Global Step: 178760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:35:09,676-Speed 11226.35 samples/sec Loss 4.6094 LearningRate 0.0046 Epoch: 31 Global Step: 178770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:35:10,599-Speed 11109.39 samples/sec Loss 4.5788 LearningRate 0.0046 Epoch: 31 Global Step: 178780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:35:11,512-Speed 11223.08 samples/sec Loss 4.4844 LearningRate 0.0046 Epoch: 31 Global Step: 178790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:35:12,463-Speed 10775.91 samples/sec Loss 4.5800 LearningRate 0.0046 Epoch: 31 Global Step: 178800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:35:13,420-Speed 10699.98 samples/sec Loss 4.6438 LearningRate 0.0046 Epoch: 31 Global Step: 178810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:35:14,402-Speed 10440.71 samples/sec Loss 4.5991 LearningRate 0.0046 Epoch: 31 Global Step: 178820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:35:15,330-Speed 11034.60 samples/sec Loss 4.6486 LearningRate 0.0046 Epoch: 31 Global Step: 178830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:35:16,263-Speed 10982.69 samples/sec Loss 4.5509 LearningRate 0.0046 Epoch: 31 Global Step: 178840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:35:17,197-Speed 10973.16 samples/sec Loss 4.6294 LearningRate 0.0046 Epoch: 31 Global Step: 178850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:35:18,100-Speed 11347.62 samples/sec Loss 4.6338 
LearningRate 0.0046 Epoch: 31 Global Step: 178860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:35:19,024-Speed 11090.37 samples/sec Loss 4.6484 LearningRate 0.0046 Epoch: 31 Global Step: 178870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:35:20,013-Speed 10360.33 samples/sec Loss 4.6464 LearningRate 0.0046 Epoch: 31 Global Step: 178880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:35:20,916-Speed 11354.73 samples/sec Loss 4.6967 LearningRate 0.0046 Epoch: 31 Global Step: 178890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:35:21,867-Speed 10767.38 samples/sec Loss 4.5616 LearningRate 0.0046 Epoch: 31 Global Step: 178900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:35:22,805-Speed 10928.92 samples/sec Loss 4.6166 LearningRate 0.0046 Epoch: 31 Global Step: 178910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:35:23,707-Speed 11363.20 samples/sec Loss 4.6168 LearningRate 0.0046 Epoch: 31 Global Step: 178920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:35:24,687-Speed 10452.19 samples/sec Loss 4.6971 LearningRate 0.0045 Epoch: 31 Global Step: 178930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:35:25,604-Speed 11180.35 samples/sec Loss 4.5643 LearningRate 0.0045 Epoch: 31 Global Step: 178940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:35:26,542-Speed 10915.99 samples/sec Loss 4.6882 LearningRate 0.0045 Epoch: 31 Global Step: 178950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:35:27,528-Speed 10393.21 samples/sec Loss 4.8012 LearningRate 0.0045 Epoch: 31 Global Step: 178960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:35:28,491-Speed 10647.32 samples/sec Loss 4.5544 LearningRate 0.0045 Epoch: 31 Global Step: 178970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:35:29,444-Speed 10747.10 samples/sec Loss 4.6231 LearningRate 0.0045 Epoch: 31 
Global Step: 178980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:35:30,373-Speed 11029.16 samples/sec Loss 4.5765 LearningRate 0.0045 Epoch: 31 Global Step: 178990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:35:31,299-Speed 11064.66 samples/sec Loss 4.7292 LearningRate 0.0045 Epoch: 31 Global Step: 179000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:35:32,212-Speed 11228.35 samples/sec Loss 4.6552 LearningRate 0.0045 Epoch: 31 Global Step: 179010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:35:33,143-Speed 11003.96 samples/sec Loss 4.7143 LearningRate 0.0045 Epoch: 31 Global Step: 179020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:35:34,053-Speed 11268.52 samples/sec Loss 4.6834 LearningRate 0.0045 Epoch: 31 Global Step: 179030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:35:34,958-Speed 11323.73 samples/sec Loss 4.7264 LearningRate 0.0045 Epoch: 31 Global Step: 179040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:35:35,884-Speed 11061.94 samples/sec Loss 4.4749 LearningRate 0.0045 Epoch: 31 Global Step: 179050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:35:36,832-Speed 10811.22 samples/sec Loss 4.6542 LearningRate 0.0045 Epoch: 31 Global Step: 179060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:35:37,741-Speed 11276.03 samples/sec Loss 4.6441 LearningRate 0.0045 Epoch: 31 Global Step: 179070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:35:38,651-Speed 11255.25 samples/sec Loss 4.6593 LearningRate 0.0045 Epoch: 31 Global Step: 179080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:35:39,576-Speed 11076.18 samples/sec Loss 4.6683 LearningRate 0.0045 Epoch: 31 Global Step: 179090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:35:40,505-Speed 11037.05 samples/sec Loss 4.6435 LearningRate 0.0045 Epoch: 31 Global Step: 179100 Fp16 Grad 
Scale: 32768 Required: 2 hours Training: 2022-04-11 05:35:41,497-Speed 10329.36 samples/sec Loss 4.5745 LearningRate 0.0045 Epoch: 31 Global Step: 179110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:35:42,453-Speed 10727.70 samples/sec Loss 4.6450 LearningRate 0.0045 Epoch: 31 Global Step: 179120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:35:43,383-Speed 11014.24 samples/sec Loss 4.5732 LearningRate 0.0045 Epoch: 31 Global Step: 179130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:35:44,367-Speed 10412.14 samples/sec Loss 4.5832 LearningRate 0.0045 Epoch: 31 Global Step: 179140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:35:45,293-Speed 11062.29 samples/sec Loss 4.6088 LearningRate 0.0045 Epoch: 31 Global Step: 179150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:35:46,206-Speed 11231.08 samples/sec Loss 4.6156 LearningRate 0.0045 Epoch: 31 Global Step: 179160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:35:47,115-Speed 11267.88 samples/sec Loss 4.5346 LearningRate 0.0045 Epoch: 31 Global Step: 179170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:35:48,036-Speed 11130.50 samples/sec Loss 4.6592 LearningRate 0.0045 Epoch: 31 Global Step: 179180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:35:48,974-Speed 10925.85 samples/sec Loss 4.5133 LearningRate 0.0045 Epoch: 31 Global Step: 179190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:35:49,899-Speed 11076.59 samples/sec Loss 4.6558 LearningRate 0.0045 Epoch: 31 Global Step: 179200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:35:50,855-Speed 10713.41 samples/sec Loss 4.6621 LearningRate 0.0045 Epoch: 31 Global Step: 179210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:35:51,807-Speed 10759.27 samples/sec Loss 4.5634 LearningRate 0.0045 Epoch: 31 Global Step: 179220 Fp16 Grad Scale: 32768 Required: 2 hours 
Training: 2022-04-11 05:35:52,774-Speed 10602.13 samples/sec Loss 4.6696 LearningRate 0.0045 Epoch: 31 Global Step: 179230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:35:53,688-Speed 11206.04 samples/sec Loss 4.7091 LearningRate 0.0045 Epoch: 31 Global Step: 179240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:35:54,636-Speed 10816.24 samples/sec Loss 4.7434 LearningRate 0.0045 Epoch: 31 Global Step: 179250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:35:55,644-Speed 10161.69 samples/sec Loss 4.6306 LearningRate 0.0045 Epoch: 31 Global Step: 179260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:35:56,596-Speed 10762.80 samples/sec Loss 4.7031 LearningRate 0.0045 Epoch: 31 Global Step: 179270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:35:57,569-Speed 10536.06 samples/sec Loss 4.6080 LearningRate 0.0045 Epoch: 31 Global Step: 179280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:35:58,480-Speed 11245.66 samples/sec Loss 4.7521 LearningRate 0.0045 Epoch: 31 Global Step: 179290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:35:59,392-Speed 11246.40 samples/sec Loss 4.6472 LearningRate 0.0045 Epoch: 31 Global Step: 179300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:36:00,347-Speed 10731.25 samples/sec Loss 4.5311 LearningRate 0.0045 Epoch: 31 Global Step: 179310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:36:01,300-Speed 10747.67 samples/sec Loss 4.6346 LearningRate 0.0045 Epoch: 31 Global Step: 179320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:36:02,243-Speed 10873.35 samples/sec Loss 4.6371 LearningRate 0.0045 Epoch: 31 Global Step: 179330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:36:03,235-Speed 10327.85 samples/sec Loss 4.7743 LearningRate 0.0045 Epoch: 31 Global Step: 179340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 
05:36:04,182-Speed 10810.45 samples/sec Loss 4.6964 LearningRate 0.0045 Epoch: 31 Global Step: 179350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:36:05,109-Speed 11053.97 samples/sec Loss 4.7025 LearningRate 0.0045 Epoch: 31 Global Step: 179360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:36:06,020-Speed 11251.72 samples/sec Loss 4.7206 LearningRate 0.0045 Epoch: 31 Global Step: 179370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:36:06,975-Speed 10736.50 samples/sec Loss 4.5246 LearningRate 0.0045 Epoch: 31 Global Step: 179380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:36:07,885-Speed 11253.78 samples/sec Loss 4.5811 LearningRate 0.0045 Epoch: 31 Global Step: 179390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:36:08,822-Speed 10943.31 samples/sec Loss 4.6732 LearningRate 0.0045 Epoch: 31 Global Step: 179400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:36:09,747-Speed 11072.01 samples/sec Loss 4.6265 LearningRate 0.0045 Epoch: 31 Global Step: 179410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:36:10,667-Speed 11136.28 samples/sec Loss 4.6955 LearningRate 0.0045 Epoch: 31 Global Step: 179420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:36:11,606-Speed 10915.06 samples/sec Loss 4.7170 LearningRate 0.0045 Epoch: 31 Global Step: 179430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:36:12,536-Speed 11016.53 samples/sec Loss 4.6113 LearningRate 0.0045 Epoch: 31 Global Step: 179440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:36:13,442-Speed 11315.24 samples/sec Loss 4.6260 LearningRate 0.0045 Epoch: 31 Global Step: 179450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:36:14,358-Speed 11202.71 samples/sec Loss 4.7384 LearningRate 0.0045 Epoch: 31 Global Step: 179460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:36:15,301-Speed 10866.31 
samples/sec Loss 4.6532 LearningRate 0.0044 Epoch: 31 Global Step: 179470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:36:16,239-Speed 10928.90 samples/sec Loss 4.5858 LearningRate 0.0044 Epoch: 31 Global Step: 179480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:36:17,180-Speed 10894.23 samples/sec Loss 4.6433 LearningRate 0.0044 Epoch: 31 Global Step: 179490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:36:18,176-Speed 10286.52 samples/sec Loss 4.8425 LearningRate 0.0044 Epoch: 31 Global Step: 179500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:36:19,077-Speed 11370.60 samples/sec Loss 4.7180 LearningRate 0.0044 Epoch: 31 Global Step: 179510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:36:19,982-Speed 11330.68 samples/sec Loss 4.6100 LearningRate 0.0044 Epoch: 31 Global Step: 179520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:36:20,936-Speed 10738.07 samples/sec Loss 4.5942 LearningRate 0.0044 Epoch: 31 Global Step: 179530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:36:21,854-Speed 11155.87 samples/sec Loss 4.6681 LearningRate 0.0044 Epoch: 31 Global Step: 179540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:36:22,766-Speed 11244.18 samples/sec Loss 4.7104 LearningRate 0.0044 Epoch: 31 Global Step: 179550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:36:23,683-Speed 11174.15 samples/sec Loss 4.6999 LearningRate 0.0044 Epoch: 31 Global Step: 179560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:36:24,612-Speed 11031.77 samples/sec Loss 4.5446 LearningRate 0.0044 Epoch: 31 Global Step: 179570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:36:25,516-Speed 11337.11 samples/sec Loss 4.5171 LearningRate 0.0044 Epoch: 31 Global Step: 179580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:36:26,436-Speed 11135.46 samples/sec Loss 4.7523 
LearningRate 0.0044 Epoch: 31 Global Step: 179590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:36:27,413-Speed 10490.66 samples/sec Loss 4.7033 LearningRate 0.0044 Epoch: 31 Global Step: 179600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:36:28,368-Speed 10726.56 samples/sec Loss 4.7532 LearningRate 0.0044 Epoch: 31 Global Step: 179610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:36:29,317-Speed 10801.73 samples/sec Loss 4.5344 LearningRate 0.0044 Epoch: 31 Global Step: 179620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:36:30,250-Speed 10977.28 samples/sec Loss 4.6953 LearningRate 0.0044 Epoch: 31 Global Step: 179630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:36:31,188-Speed 10925.13 samples/sec Loss 4.7718 LearningRate 0.0044 Epoch: 31 Global Step: 179640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:36:32,103-Speed 11195.78 samples/sec Loss 4.6367 LearningRate 0.0044 Epoch: 31 Global Step: 179650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:36:33,020-Speed 11177.24 samples/sec Loss 4.6190 LearningRate 0.0044 Epoch: 31 Global Step: 179660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:36:33,957-Speed 10932.76 samples/sec Loss 4.7156 LearningRate 0.0044 Epoch: 31 Global Step: 179670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:36:34,864-Speed 11300.94 samples/sec Loss 4.5596 LearningRate 0.0044 Epoch: 31 Global Step: 179680 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:36:35,810-Speed 10831.13 samples/sec Loss 4.7098 LearningRate 0.0044 Epoch: 31 Global Step: 179690 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:36:36,780-Speed 10561.64 samples/sec Loss 4.6555 LearningRate 0.0044 Epoch: 31 Global Step: 179700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:36:37,748-Speed 10590.32 samples/sec Loss 4.5225 LearningRate 0.0044 Epoch: 31 
Global Step: 179710 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:36:38,744-Speed 10285.59 samples/sec Loss 4.6542 LearningRate 0.0044 Epoch: 31 Global Step: 179720 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:36:39,724-Speed 10463.14 samples/sec Loss 4.8657 LearningRate 0.0044 Epoch: 31 Global Step: 179730 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:36:40,650-Speed 11065.18 samples/sec Loss 4.5903 LearningRate 0.0044 Epoch: 31 Global Step: 179740 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:36:41,601-Speed 10774.81 samples/sec Loss 4.6293 LearningRate 0.0044 Epoch: 31 Global Step: 179750 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:36:42,559-Speed 10694.43 samples/sec Loss 4.6624 LearningRate 0.0044 Epoch: 31 Global Step: 179760 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:36:43,541-Speed 10430.14 samples/sec Loss 4.7212 LearningRate 0.0044 Epoch: 31 Global Step: 179770 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:36:44,483-Speed 10886.33 samples/sec Loss 4.6696 LearningRate 0.0044 Epoch: 31 Global Step: 179780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:36:45,414-Speed 11001.40 samples/sec Loss 4.6831 LearningRate 0.0044 Epoch: 31 Global Step: 179790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:36:46,381-Speed 10600.92 samples/sec Loss 4.7470 LearningRate 0.0044 Epoch: 31 Global Step: 179800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:36:47,317-Speed 10948.59 samples/sec Loss 4.7538 LearningRate 0.0044 Epoch: 31 Global Step: 179810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:36:48,229-Speed 11226.14 samples/sec Loss 4.5233 LearningRate 0.0044 Epoch: 31 Global Step: 179820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:36:49,183-Speed 10750.80 samples/sec Loss 4.6513 LearningRate 0.0044 Epoch: 31 Global Step: 179830 Fp16 Grad 
Scale: 32768 Required: 2 hours Training: 2022-04-11 05:36:50,100-Speed 11171.01 samples/sec Loss 4.6700 LearningRate 0.0044 Epoch: 31 Global Step: 179840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:36:51,033-Speed 10976.21 samples/sec Loss 4.6087 LearningRate 0.0044 Epoch: 31 Global Step: 179850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:36:51,987-Speed 10746.82 samples/sec Loss 4.7488 LearningRate 0.0044 Epoch: 31 Global Step: 179860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:36:52,931-Speed 10852.56 samples/sec Loss 4.7013 LearningRate 0.0044 Epoch: 31 Global Step: 179870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:36:53,874-Speed 10866.34 samples/sec Loss 4.6242 LearningRate 0.0044 Epoch: 31 Global Step: 179880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:36:54,825-Speed 10771.13 samples/sec Loss 4.7815 LearningRate 0.0044 Epoch: 31 Global Step: 179890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:36:55,760-Speed 10968.47 samples/sec Loss 4.6941 LearningRate 0.0044 Epoch: 31 Global Step: 179900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:36:56,697-Speed 10931.96 samples/sec Loss 4.6926 LearningRate 0.0044 Epoch: 31 Global Step: 179910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:36:57,623-Speed 11070.97 samples/sec Loss 4.6525 LearningRate 0.0044 Epoch: 31 Global Step: 179920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:36:58,553-Speed 11012.60 samples/sec Loss 4.8500 LearningRate 0.0044 Epoch: 31 Global Step: 179930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:36:59,490-Speed 10939.52 samples/sec Loss 4.6776 LearningRate 0.0044 Epoch: 31 Global Step: 179940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:37:00,421-Speed 10997.71 samples/sec Loss 4.6113 LearningRate 0.0044 Epoch: 31 Global Step: 179950 Fp16 Grad Scale: 65536 Required: 2 hours 
Training: 2022-04-11 05:37:01,360-Speed 10920.94 samples/sec Loss 4.6445 LearningRate 0.0044 Epoch: 31 Global Step: 179960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:37:02,286-Speed 11076.99 samples/sec Loss 4.7268 LearningRate 0.0044 Epoch: 31 Global Step: 179970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:37:03,202-Speed 11189.33 samples/sec Loss 4.6540 LearningRate 0.0044 Epoch: 31 Global Step: 179980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:37:04,136-Speed 10969.89 samples/sec Loss 4.5490 LearningRate 0.0044 Epoch: 31 Global Step: 179990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:37:05,058-Speed 11115.72 samples/sec Loss 4.5278 LearningRate 0.0044 Epoch: 31 Global Step: 180000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:37:26,373-[lfw][180000]XNorm: 7.996346 Training: 2022-04-11 05:37:26,374-[lfw][180000]Accuracy-Flip: 0.99683+-0.00283 Training: 2022-04-11 05:37:26,374-[lfw][180000]Accuracy-Highest: 0.99733 Training: 2022-04-11 05:37:50,975-[cfp_fp][180000]XNorm: 6.830238 Training: 2022-04-11 05:37:50,976-[cfp_fp][180000]Accuracy-Flip: 0.95314+-0.00987 Training: 2022-04-11 05:37:50,976-[cfp_fp][180000]Accuracy-Highest: 0.95429 Training: 2022-04-11 05:38:12,266-[agedb_30][180000]XNorm: 7.836487 Training: 2022-04-11 05:38:12,266-[agedb_30][180000]Accuracy-Flip: 0.97067+-0.00684 Training: 2022-04-11 05:38:12,267-[agedb_30][180000]Accuracy-Highest: 0.97067 Training: 2022-04-11 05:38:13,182-Speed 150.31 samples/sec Loss 4.6167 LearningRate 0.0043 Epoch: 31 Global Step: 180010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:38:14,105-Speed 11098.39 samples/sec Loss 4.6195 LearningRate 0.0043 Epoch: 31 Global Step: 180020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:38:15,034-Speed 11039.19 samples/sec Loss 4.5346 LearningRate 0.0043 Epoch: 31 Global Step: 180030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 
05:38:15,945-Speed 11240.94 samples/sec Loss 4.6743 LearningRate 0.0043 Epoch: 31 Global Step: 180040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:38:16,841-Speed 11430.99 samples/sec Loss 4.6289 LearningRate 0.0043 Epoch: 31 Global Step: 180050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:38:17,754-Speed 11228.22 samples/sec Loss 4.6238 LearningRate 0.0043 Epoch: 31 Global Step: 180060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:38:18,678-Speed 11084.63 samples/sec Loss 4.7294 LearningRate 0.0043 Epoch: 31 Global Step: 180070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:38:19,587-Speed 11272.61 samples/sec Loss 4.7304 LearningRate 0.0043 Epoch: 31 Global Step: 180080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:38:20,498-Speed 11255.45 samples/sec Loss 4.7678 LearningRate 0.0043 Epoch: 31 Global Step: 180090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:38:21,389-Speed 11499.13 samples/sec Loss 4.6881 LearningRate 0.0043 Epoch: 31 Global Step: 180100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:38:22,299-Speed 11256.93 samples/sec Loss 4.5979 LearningRate 0.0043 Epoch: 31 Global Step: 180110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:38:23,208-Speed 11276.91 samples/sec Loss 4.7227 LearningRate 0.0043 Epoch: 31 Global Step: 180120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:38:24,162-Speed 10742.84 samples/sec Loss 4.6506 LearningRate 0.0043 Epoch: 31 Global Step: 180130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:38:25,069-Speed 11294.74 samples/sec Loss 4.6179 LearningRate 0.0043 Epoch: 31 Global Step: 180140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:38:25,969-Speed 11385.83 samples/sec Loss 4.6926 LearningRate 0.0043 Epoch: 31 Global Step: 180150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:38:26,900-Speed 11000.81 
samples/sec Loss 4.6795 LearningRate 0.0043 Epoch: 31 Global Step: 180160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:38:27,871-Speed 10555.38 samples/sec Loss 4.7048 LearningRate 0.0043 Epoch: 31 Global Step: 180170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:38:28,838-Speed 10600.37 samples/sec Loss 4.5266 LearningRate 0.0043 Epoch: 31 Global Step: 180180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:38:29,733-Speed 11449.90 samples/sec Loss 4.7424 LearningRate 0.0043 Epoch: 31 Global Step: 180190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:38:30,658-Speed 11071.56 samples/sec Loss 4.6616 LearningRate 0.0043 Epoch: 31 Global Step: 180200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:38:31,580-Speed 11110.38 samples/sec Loss 4.6743 LearningRate 0.0043 Epoch: 31 Global Step: 180210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:38:32,517-Speed 10935.55 samples/sec Loss 4.5967 LearningRate 0.0043 Epoch: 31 Global Step: 180220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:38:33,465-Speed 10816.97 samples/sec Loss 4.6079 LearningRate 0.0043 Epoch: 31 Global Step: 180230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:38:34,411-Speed 10830.69 samples/sec Loss 4.7059 LearningRate 0.0043 Epoch: 31 Global Step: 180240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:38:35,352-Speed 10889.27 samples/sec Loss 4.8080 LearningRate 0.0043 Epoch: 31 Global Step: 180250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:38:36,308-Speed 10724.06 samples/sec Loss 4.6031 LearningRate 0.0043 Epoch: 31 Global Step: 180260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:38:37,224-Speed 11185.43 samples/sec Loss 4.7111 LearningRate 0.0043 Epoch: 31 Global Step: 180270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:38:38,160-Speed 10937.82 samples/sec Loss 4.6295 
LearningRate 0.0043 Epoch: 31 Global Step: 180280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:38:39,051-Speed 11504.07 samples/sec Loss 4.7165 LearningRate 0.0043 Epoch: 31 Global Step: 180290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:38:39,954-Speed 11352.42 samples/sec Loss 4.5741 LearningRate 0.0043 Epoch: 31 Global Step: 180300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:38:40,875-Speed 11118.21 samples/sec Loss 4.6579 LearningRate 0.0043 Epoch: 31 Global Step: 180310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:38:41,805-Speed 11037.58 samples/sec Loss 4.7317 LearningRate 0.0043 Epoch: 31 Global Step: 180320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:38:42,735-Speed 11019.28 samples/sec Loss 4.7456 LearningRate 0.0043 Epoch: 31 Global Step: 180330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:38:43,684-Speed 10796.17 samples/sec Loss 4.6815 LearningRate 0.0043 Epoch: 31 Global Step: 180340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:38:44,632-Speed 10811.47 samples/sec Loss 4.6967 LearningRate 0.0043 Epoch: 31 Global Step: 180350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:38:45,565-Speed 10983.87 samples/sec Loss 4.7329 LearningRate 0.0043 Epoch: 31 Global Step: 180360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:38:46,515-Speed 10781.79 samples/sec Loss 4.6062 LearningRate 0.0043 Epoch: 31 Global Step: 180370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:38:47,421-Speed 11315.06 samples/sec Loss 4.7442 LearningRate 0.0043 Epoch: 31 Global Step: 180380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:38:48,330-Speed 11273.15 samples/sec Loss 4.5831 LearningRate 0.0043 Epoch: 31 Global Step: 180390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:38:49,278-Speed 10800.26 samples/sec Loss 4.6325 LearningRate 0.0043 Epoch: 31 
Global Step: 180400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:38:50,227-Speed 10800.47 samples/sec Loss 4.6750 LearningRate 0.0043 Epoch: 31 Global Step: 180410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:38:51,131-Speed 11339.48 samples/sec Loss 4.6968 LearningRate 0.0043 Epoch: 31 Global Step: 180420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:38:52,040-Speed 11270.06 samples/sec Loss 4.6991 LearningRate 0.0043 Epoch: 31 Global Step: 180430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:38:52,944-Speed 11334.56 samples/sec Loss 4.8357 LearningRate 0.0043 Epoch: 31 Global Step: 180440 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 05:38:53,884-Speed 10904.06 samples/sec Loss 4.6176 LearningRate 0.0043 Epoch: 31 Global Step: 180450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:38:54,799-Speed 11197.63 samples/sec Loss 4.6384 LearningRate 0.0043 Epoch: 31 Global Step: 180460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:38:55,751-Speed 10761.85 samples/sec Loss 4.6337 LearningRate 0.0043 Epoch: 31 Global Step: 180470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:38:56,660-Speed 11274.49 samples/sec Loss 4.5557 LearningRate 0.0043 Epoch: 31 Global Step: 180480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:38:57,562-Speed 11356.73 samples/sec Loss 4.6510 LearningRate 0.0043 Epoch: 31 Global Step: 180490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:38:58,529-Speed 10592.20 samples/sec Loss 4.7385 LearningRate 0.0043 Epoch: 31 Global Step: 180500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:38:59,519-Speed 10348.90 samples/sec Loss 4.8050 LearningRate 0.0043 Epoch: 31 Global Step: 180510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:00,479-Speed 10683.33 samples/sec Loss 4.6712 LearningRate 0.0043 Epoch: 31 Global Step: 180520 Fp16 Grad 
Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:01,407-Speed 11036.02 samples/sec Loss 4.6649 LearningRate 0.0043 Epoch: 31 Global Step: 180530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:02,354-Speed 10819.41 samples/sec Loss 4.6857 LearningRate 0.0043 Epoch: 31 Global Step: 180540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:03,265-Speed 11250.06 samples/sec Loss 4.6429 LearningRate 0.0043 Epoch: 31 Global Step: 180550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:04,209-Speed 10862.77 samples/sec Loss 4.6464 LearningRate 0.0042 Epoch: 31 Global Step: 180560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:05,203-Speed 10304.86 samples/sec Loss 4.7533 LearningRate 0.0042 Epoch: 31 Global Step: 180570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:06,193-Speed 10356.20 samples/sec Loss 4.5470 LearningRate 0.0042 Epoch: 31 Global Step: 180580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:07,115-Speed 11115.43 samples/sec Loss 4.5727 LearningRate 0.0042 Epoch: 31 Global Step: 180590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:39:08,062-Speed 10819.39 samples/sec Loss 4.7885 LearningRate 0.0042 Epoch: 31 Global Step: 180600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:08,990-Speed 11044.05 samples/sec Loss 4.7759 LearningRate 0.0042 Epoch: 31 Global Step: 180610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:09,916-Speed 11065.25 samples/sec Loss 4.5143 LearningRate 0.0042 Epoch: 31 Global Step: 180620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:10,817-Speed 11374.01 samples/sec Loss 4.6469 LearningRate 0.0042 Epoch: 31 Global Step: 180630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:11,728-Speed 11247.29 samples/sec Loss 4.7458 LearningRate 0.0042 Epoch: 31 Global Step: 180640 Fp16 Grad Scale: 32768 Required: 2 hours 
Training: 2022-04-11 05:39:12,639-Speed 11245.81 samples/sec Loss 4.8336 LearningRate 0.0042 Epoch: 31 Global Step: 180650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:13,558-Speed 11152.18 samples/sec Loss 4.8495 LearningRate 0.0042 Epoch: 31 Global Step: 180660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:14,473-Speed 11188.67 samples/sec Loss 4.6205 LearningRate 0.0042 Epoch: 31 Global Step: 180670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:15,408-Speed 10969.91 samples/sec Loss 4.5936 LearningRate 0.0042 Epoch: 31 Global Step: 180680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:16,303-Speed 11448.60 samples/sec Loss 4.8642 LearningRate 0.0042 Epoch: 31 Global Step: 180690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:17,292-Speed 10354.38 samples/sec Loss 4.7088 LearningRate 0.0042 Epoch: 31 Global Step: 180700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:39:18,270-Speed 10481.73 samples/sec Loss 4.6959 LearningRate 0.0042 Epoch: 31 Global Step: 180710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:19,230-Speed 10670.88 samples/sec Loss 4.7216 LearningRate 0.0042 Epoch: 31 Global Step: 180720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:20,149-Speed 11150.04 samples/sec Loss 4.6624 LearningRate 0.0042 Epoch: 31 Global Step: 180730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:21,044-Speed 11454.12 samples/sec Loss 4.7413 LearningRate 0.0042 Epoch: 31 Global Step: 180740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:21,945-Speed 11380.14 samples/sec Loss 4.7849 LearningRate 0.0042 Epoch: 31 Global Step: 180750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:22,855-Speed 11254.31 samples/sec Loss 4.7325 LearningRate 0.0042 Epoch: 31 Global Step: 180760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 
05:39:23,783-Speed 11045.24 samples/sec Loss 4.6075 LearningRate 0.0042 Epoch: 31 Global Step: 180770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:24,683-Speed 11382.82 samples/sec Loss 4.7580 LearningRate 0.0042 Epoch: 31 Global Step: 180780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:25,613-Speed 11026.62 samples/sec Loss 4.7513 LearningRate 0.0042 Epoch: 31 Global Step: 180790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:26,580-Speed 10598.76 samples/sec Loss 4.7271 LearningRate 0.0042 Epoch: 31 Global Step: 180800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:27,533-Speed 10749.46 samples/sec Loss 4.7448 LearningRate 0.0042 Epoch: 31 Global Step: 180810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:39:28,481-Speed 10807.05 samples/sec Loss 4.7662 LearningRate 0.0042 Epoch: 31 Global Step: 180820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:39:29,430-Speed 10793.17 samples/sec Loss 4.6552 LearningRate 0.0042 Epoch: 31 Global Step: 180830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:39:30,340-Speed 11258.64 samples/sec Loss 4.7172 LearningRate 0.0042 Epoch: 31 Global Step: 180840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:39:31,268-Speed 11042.10 samples/sec Loss 4.5054 LearningRate 0.0042 Epoch: 31 Global Step: 180850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:39:32,207-Speed 10913.65 samples/sec Loss 4.7274 LearningRate 0.0042 Epoch: 31 Global Step: 180860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:39:33,113-Speed 11310.42 samples/sec Loss 4.5825 LearningRate 0.0042 Epoch: 31 Global Step: 180870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:39:34,009-Speed 11433.09 samples/sec Loss 4.5620 LearningRate 0.0042 Epoch: 31 Global Step: 180880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:39:34,938-Speed 11032.43 
samples/sec Loss 4.6593 LearningRate 0.0042 Epoch: 31 Global Step: 180890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:39:35,837-Speed 11404.57 samples/sec Loss 4.6870 LearningRate 0.0042 Epoch: 31 Global Step: 180900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:36,780-Speed 10858.07 samples/sec Loss 4.5101 LearningRate 0.0042 Epoch: 31 Global Step: 180910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:37,751-Speed 10551.17 samples/sec Loss 4.7157 LearningRate 0.0042 Epoch: 31 Global Step: 180920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:38,761-Speed 10146.35 samples/sec Loss 4.7548 LearningRate 0.0042 Epoch: 31 Global Step: 180930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:39,719-Speed 10693.71 samples/sec Loss 4.7378 LearningRate 0.0042 Epoch: 31 Global Step: 180940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:40,659-Speed 10902.15 samples/sec Loss 4.6837 LearningRate 0.0042 Epoch: 31 Global Step: 180950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:41,567-Speed 11293.57 samples/sec Loss 4.6513 LearningRate 0.0042 Epoch: 31 Global Step: 180960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:42,499-Speed 10990.61 samples/sec Loss 4.7362 LearningRate 0.0042 Epoch: 31 Global Step: 180970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:43,444-Speed 10841.60 samples/sec Loss 4.6601 LearningRate 0.0042 Epoch: 31 Global Step: 180980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:44,426-Speed 10435.68 samples/sec Loss 4.6957 LearningRate 0.0042 Epoch: 31 Global Step: 180990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:45,332-Speed 11315.70 samples/sec Loss 4.7771 LearningRate 0.0042 Epoch: 31 Global Step: 181000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:39:46,240-Speed 11285.12 samples/sec Loss 4.7787 
LearningRate 0.0042 Epoch: 31 Global Step: 181010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:39:47,158-Speed 11165.28 samples/sec Loss 4.6392 LearningRate 0.0042 Epoch: 31 Global Step: 181020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:39:48,075-Speed 11179.38 samples/sec Loss 4.7399 LearningRate 0.0042 Epoch: 31 Global Step: 181030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:39:48,975-Speed 11380.39 samples/sec Loss 4.6870 LearningRate 0.0042 Epoch: 31 Global Step: 181040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:49,887-Speed 11240.69 samples/sec Loss 4.6703 LearningRate 0.0042 Epoch: 31 Global Step: 181050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:50,800-Speed 11215.85 samples/sec Loss 4.6823 LearningRate 0.0042 Epoch: 31 Global Step: 181060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:51,766-Speed 10612.68 samples/sec Loss 4.6948 LearningRate 0.0042 Epoch: 31 Global Step: 181070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:52,686-Speed 11131.60 samples/sec Loss 4.6949 LearningRate 0.0042 Epoch: 31 Global Step: 181080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:53,604-Speed 11174.39 samples/sec Loss 4.7902 LearningRate 0.0042 Epoch: 31 Global Step: 181090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:54,574-Speed 10566.12 samples/sec Loss 4.7493 LearningRate 0.0042 Epoch: 31 Global Step: 181100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:55,523-Speed 10794.14 samples/sec Loss 4.5933 LearningRate 0.0041 Epoch: 31 Global Step: 181110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:56,469-Speed 10834.35 samples/sec Loss 4.4971 LearningRate 0.0041 Epoch: 31 Global Step: 181120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:57,367-Speed 11402.02 samples/sec Loss 4.6118 LearningRate 0.0041 Epoch: 31 
Global Step: 181130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:39:58,282-Speed 11205.85 samples/sec Loss 4.6459 LearningRate 0.0041 Epoch: 31 Global Step: 181140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:39:59,182-Speed 11379.49 samples/sec Loss 4.6305 LearningRate 0.0041 Epoch: 31 Global Step: 181150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:40:00,081-Speed 11395.63 samples/sec Loss 4.7388 LearningRate 0.0041 Epoch: 31 Global Step: 181160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:40:00,992-Speed 11248.89 samples/sec Loss 4.8348 LearningRate 0.0041 Epoch: 31 Global Step: 181170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:40:01,943-Speed 10778.99 samples/sec Loss 4.7000 LearningRate 0.0041 Epoch: 31 Global Step: 181180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:40:02,892-Speed 10800.50 samples/sec Loss 4.6427 LearningRate 0.0041 Epoch: 31 Global Step: 181190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:40:03,855-Speed 10632.03 samples/sec Loss 4.6730 LearningRate 0.0041 Epoch: 31 Global Step: 181200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:40:04,837-Speed 10442.50 samples/sec Loss 4.7599 LearningRate 0.0041 Epoch: 31 Global Step: 181210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:40:05,810-Speed 10537.69 samples/sec Loss 4.7152 LearningRate 0.0041 Epoch: 31 Global Step: 181220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:40:06,769-Speed 10686.08 samples/sec Loss 4.7946 LearningRate 0.0041 Epoch: 31 Global Step: 181230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:40:07,681-Speed 11238.36 samples/sec Loss 4.8450 LearningRate 0.0041 Epoch: 31 Global Step: 181240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:40:08,589-Speed 11287.18 samples/sec Loss 4.6795 LearningRate 0.0041 Epoch: 31 Global Step: 181250 Fp16 Grad 
Scale: 65536 Required: 2 hours Training: 2022-04-11 05:40:09,487-Speed 11401.83 samples/sec Loss 4.5648 LearningRate 0.0041 Epoch: 31 Global Step: 181260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:40:10,386-Speed 11400.61 samples/sec Loss 4.6684 LearningRate 0.0041 Epoch: 31 Global Step: 181270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:40:11,297-Speed 11253.28 samples/sec Loss 4.8853 LearningRate 0.0041 Epoch: 31 Global Step: 181280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:40:12,214-Speed 11175.52 samples/sec Loss 4.6342 LearningRate 0.0041 Epoch: 31 Global Step: 181290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:40:13,160-Speed 10826.39 samples/sec Loss 4.6920 LearningRate 0.0041 Epoch: 31 Global Step: 181300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:40:14,089-Speed 11036.08 samples/sec Loss 4.7326 LearningRate 0.0041 Epoch: 31 Global Step: 181310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:40:15,017-Speed 11031.57 samples/sec Loss 4.7415 LearningRate 0.0041 Epoch: 31 Global Step: 181320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:40:15,976-Speed 10689.31 samples/sec Loss 4.6952 LearningRate 0.0041 Epoch: 31 Global Step: 181330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:40:16,923-Speed 10823.31 samples/sec Loss 4.7894 LearningRate 0.0041 Epoch: 31 Global Step: 181340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:40:17,922-Speed 10250.45 samples/sec Loss 4.7211 LearningRate 0.0041 Epoch: 31 Global Step: 181350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:40:18,902-Speed 10466.47 samples/sec Loss 4.8027 LearningRate 0.0041 Epoch: 31 Global Step: 181360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:40:19,861-Speed 10676.75 samples/sec Loss 4.6959 LearningRate 0.0041 Epoch: 31 Global Step: 181370 Fp16 Grad Scale: 65536 Required: 2 hours 
Training: 2022-04-11 05:40:20,767-Speed 11320.09 samples/sec Loss 4.6702 LearningRate 0.0041 Epoch: 31 Global Step: 181380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:40:21,677-Speed 11260.76 samples/sec Loss 4.6914 LearningRate 0.0041 Epoch: 31 Global Step: 181390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:40:22,598-Speed 11119.51 samples/sec Loss 4.6920 LearningRate 0.0041 Epoch: 31 Global Step: 181400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:40:23,511-Speed 11223.53 samples/sec Loss 4.5583 LearningRate 0.0041 Epoch: 31 Global Step: 181410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:40:24,462-Speed 10773.35 samples/sec Loss 4.7051 LearningRate 0.0041 Epoch: 31 Global Step: 181420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:40:25,386-Speed 11088.57 samples/sec Loss 4.7270 LearningRate 0.0041 Epoch: 31 Global Step: 181430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:40:26,302-Speed 11194.46 samples/sec Loss 4.5780 LearningRate 0.0041 Epoch: 31 Global Step: 181440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:40:27,221-Speed 11150.95 samples/sec Loss 4.6558 LearningRate 0.0041 Epoch: 31 Global Step: 181450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:40:28,131-Speed 11261.56 samples/sec Loss 4.6181 LearningRate 0.0041 Epoch: 31 Global Step: 181460 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 05:40:29,078-Speed 10822.70 samples/sec Loss 4.5779 LearningRate 0.0041 Epoch: 31 Global Step: 181470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:40:30,023-Speed 10840.05 samples/sec Loss 4.7060 LearningRate 0.0041 Epoch: 31 Global Step: 181480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:40:30,985-Speed 10653.15 samples/sec Loss 4.6873 LearningRate 0.0041 Epoch: 31 Global Step: 181490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 
05:40:31,942-Speed 10710.27 samples/sec Loss 4.7447 LearningRate 0.0041 Epoch: 31 Global Step: 181500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:40:32,873-Speed 11002.76 samples/sec Loss 4.6923 LearningRate 0.0041 Epoch: 31 Global Step: 181510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:40:33,783-Speed 11261.89 samples/sec Loss 4.8822 LearningRate 0.0041 Epoch: 31 Global Step: 181520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:40:34,696-Speed 11243.31 samples/sec Loss 4.7123 LearningRate 0.0041 Epoch: 31 Global Step: 181530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:40:35,601-Speed 11313.71 samples/sec Loss 4.6261 LearningRate 0.0041 Epoch: 31 Global Step: 181540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:40:36,537-Speed 10949.88 samples/sec Loss 4.8300 LearningRate 0.0041 Epoch: 31 Global Step: 181550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:40:37,489-Speed 10765.43 samples/sec Loss 4.6715 LearningRate 0.0041 Epoch: 31 Global Step: 181560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:40:38,416-Speed 11058.82 samples/sec Loss 4.7356 LearningRate 0.0041 Epoch: 31 Global Step: 181570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:40:39,362-Speed 10833.39 samples/sec Loss 4.6682 LearningRate 0.0041 Epoch: 31 Global Step: 181580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:40:40,282-Speed 11130.56 samples/sec Loss 4.7600 LearningRate 0.0041 Epoch: 31 Global Step: 181590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:40:41,226-Speed 10861.10 samples/sec Loss 4.5857 LearningRate 0.0041 Epoch: 31 Global Step: 181600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:40:42,158-Speed 10992.73 samples/sec Loss 4.6502 LearningRate 0.0041 Epoch: 31 Global Step: 181610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:40:43,101-Speed 10871.73 
samples/sec Loss 4.7313 LearningRate 0.0041 Epoch: 31 Global Step: 181620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:40:44,055-Speed 10738.17 samples/sec Loss 4.6526 LearningRate 0.0041 Epoch: 31 Global Step: 181630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:40:44,968-Speed 11215.56 samples/sec Loss 4.7191 LearningRate 0.0041 Epoch: 31 Global Step: 181640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:40:45,887-Speed 11153.38 samples/sec Loss 4.6625 LearningRate 0.0041 Epoch: 31 Global Step: 181650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:40:46,818-Speed 11014.67 samples/sec Loss 4.7515 LearningRate 0.0041 Epoch: 31 Global Step: 181660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:40:47,738-Speed 11134.65 samples/sec Loss 4.8131 LearningRate 0.0040 Epoch: 31 Global Step: 181670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:40:48,678-Speed 10907.98 samples/sec Loss 4.7280 LearningRate 0.0040 Epoch: 31 Global Step: 181680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:40:49,619-Speed 10886.54 samples/sec Loss 4.6300 LearningRate 0.0040 Epoch: 31 Global Step: 181690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:40:50,541-Speed 11109.35 samples/sec Loss 4.7186 LearningRate 0.0040 Epoch: 31 Global Step: 181700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:40:51,482-Speed 10898.43 samples/sec Loss 4.6346 LearningRate 0.0040 Epoch: 31 Global Step: 181710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:40:52,393-Speed 11241.95 samples/sec Loss 4.7599 LearningRate 0.0040 Epoch: 31 Global Step: 181720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:40:53,328-Speed 10960.32 samples/sec Loss 4.7962 LearningRate 0.0040 Epoch: 31 Global Step: 181730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:40:54,302-Speed 10519.50 samples/sec Loss 4.6181 
LearningRate 0.0040 Epoch: 31 Global Step: 181740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:40:55,291-Speed 10359.43 samples/sec Loss 4.6271 LearningRate 0.0040 Epoch: 31 Global Step: 181750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:40:56,256-Speed 10619.92 samples/sec Loss 4.6510 LearningRate 0.0040 Epoch: 31 Global Step: 181760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:40:57,197-Speed 10893.25 samples/sec Loss 4.7274 LearningRate 0.0040 Epoch: 31 Global Step: 181770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:40:58,132-Speed 10957.25 samples/sec Loss 4.8186 LearningRate 0.0040 Epoch: 31 Global Step: 181780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:40:59,093-Speed 10662.11 samples/sec Loss 4.7613 LearningRate 0.0040 Epoch: 31 Global Step: 181790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:41:00,018-Speed 11071.79 samples/sec Loss 4.7380 LearningRate 0.0040 Epoch: 31 Global Step: 181800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:41:00,983-Speed 10625.94 samples/sec Loss 4.6496 LearningRate 0.0040 Epoch: 31 Global Step: 181810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:41:01,927-Speed 10856.98 samples/sec Loss 4.8459 LearningRate 0.0040 Epoch: 31 Global Step: 181820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:41:02,881-Speed 10734.21 samples/sec Loss 4.7047 LearningRate 0.0040 Epoch: 31 Global Step: 181830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:41:03,794-Speed 11231.53 samples/sec Loss 4.7442 LearningRate 0.0040 Epoch: 31 Global Step: 181840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:41:04,746-Speed 10756.23 samples/sec Loss 4.7029 LearningRate 0.0040 Epoch: 31 Global Step: 181850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:41:05,670-Speed 11086.89 samples/sec Loss 4.7370 LearningRate 0.0040 Epoch: 31 
Global Step: 181860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:41:06,589-Speed 11159.53 samples/sec Loss 4.6745 LearningRate 0.0040 Epoch: 31 Global Step: 181870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:41:07,548-Speed 10679.99 samples/sec Loss 4.8323 LearningRate 0.0040 Epoch: 31 Global Step: 181880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:41:08,506-Speed 10701.38 samples/sec Loss 4.7054 LearningRate 0.0040 Epoch: 31 Global Step: 181890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:41:09,430-Speed 11091.96 samples/sec Loss 4.6970 LearningRate 0.0040 Epoch: 31 Global Step: 181900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:41:10,339-Speed 11272.95 samples/sec Loss 4.5909 LearningRate 0.0040 Epoch: 31 Global Step: 181910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:41:11,244-Speed 11327.80 samples/sec Loss 4.7033 LearningRate 0.0040 Epoch: 31 Global Step: 181920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:41:12,167-Speed 11110.12 samples/sec Loss 4.7424 LearningRate 0.0040 Epoch: 31 Global Step: 181930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:41:13,089-Speed 11113.57 samples/sec Loss 4.8474 LearningRate 0.0040 Epoch: 31 Global Step: 181940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:41:14,115-Speed 9985.94 samples/sec Loss 4.7850 LearningRate 0.0040 Epoch: 31 Global Step: 181950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:41:24,580-Speed 978.62 samples/sec Loss 4.3333 LearningRate 0.0040 Epoch: 32 Global Step: 181960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:41:25,533-Speed 10756.00 samples/sec Loss 4.3856 LearningRate 0.0040 Epoch: 32 Global Step: 181970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:41:26,675-Speed 8965.65 samples/sec Loss 4.1936 LearningRate 0.0040 Epoch: 32 Global Step: 181980 Fp16 Grad 
Scale: 131072 Required: 2 hours Training: 2022-04-11 05:41:27,773-Speed 9331.32 samples/sec Loss 4.2235 LearningRate 0.0040 Epoch: 32 Global Step: 181990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:41:28,862-Speed 9412.53 samples/sec Loss 4.1864 LearningRate 0.0040 Epoch: 32 Global Step: 182000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:41:50,210-[lfw][182000]XNorm: 7.955034 Training: 2022-04-11 05:41:50,211-[lfw][182000]Accuracy-Flip: 0.99650+-0.00311 Training: 2022-04-11 05:41:50,211-[lfw][182000]Accuracy-Highest: 0.99733 Training: 2022-04-11 05:42:14,794-[cfp_fp][182000]XNorm: 6.784204 Training: 2022-04-11 05:42:14,794-[cfp_fp][182000]Accuracy-Flip: 0.95100+-0.01232 Training: 2022-04-11 05:42:14,794-[cfp_fp][182000]Accuracy-Highest: 0.95429 Training: 2022-04-11 05:42:35,968-[agedb_30][182000]XNorm: 7.777056 Training: 2022-04-11 05:42:35,968-[agedb_30][182000]Accuracy-Flip: 0.96900+-0.00817 Training: 2022-04-11 05:42:35,969-[agedb_30][182000]Accuracy-Highest: 0.97067 Training: 2022-04-11 05:42:36,958-Speed 150.38 samples/sec Loss 4.2970 LearningRate 0.0040 Epoch: 32 Global Step: 182010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:42:37,880-Speed 11115.63 samples/sec Loss 4.3990 LearningRate 0.0040 Epoch: 32 Global Step: 182020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:42:38,820-Speed 10898.21 samples/sec Loss 4.1629 LearningRate 0.0040 Epoch: 32 Global Step: 182030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:42:39,771-Speed 10783.72 samples/sec Loss 4.3356 LearningRate 0.0040 Epoch: 32 Global Step: 182040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:42:40,699-Speed 11038.70 samples/sec Loss 4.2893 LearningRate 0.0040 Epoch: 32 Global Step: 182050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:42:41,632-Speed 10980.26 samples/sec Loss 4.3564 LearningRate 0.0040 Epoch: 32 Global Step: 182060 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-04-11 05:42:42,563-Speed 11008.90 samples/sec Loss 4.2147 LearningRate 0.0040 Epoch: 32 Global Step: 182070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:42:43,508-Speed 10840.83 samples/sec Loss 4.2881 LearningRate 0.0040 Epoch: 32 Global Step: 182080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:42:44,454-Speed 10838.78 samples/sec Loss 4.3167 LearningRate 0.0040 Epoch: 32 Global Step: 182090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:42:45,346-Speed 11481.29 samples/sec Loss 4.3317 LearningRate 0.0040 Epoch: 32 Global Step: 182100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:42:46,255-Speed 11274.71 samples/sec Loss 4.3294 LearningRate 0.0040 Epoch: 32 Global Step: 182110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:42:47,171-Speed 11188.24 samples/sec Loss 4.3762 LearningRate 0.0040 Epoch: 32 Global Step: 182120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:42:48,095-Speed 11087.10 samples/sec Loss 4.3106 LearningRate 0.0040 Epoch: 32 Global Step: 182130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:42:49,017-Speed 11109.59 samples/sec Loss 4.2579 LearningRate 0.0040 Epoch: 32 Global Step: 182140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:42:49,977-Speed 10676.26 samples/sec Loss 4.3789 LearningRate 0.0040 Epoch: 32 Global Step: 182150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:42:50,941-Speed 10630.84 samples/sec Loss 4.3252 LearningRate 0.0040 Epoch: 32 Global Step: 182160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:42:51,877-Speed 10948.01 samples/sec Loss 4.3190 LearningRate 0.0040 Epoch: 32 Global Step: 182170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:42:52,821-Speed 10853.28 samples/sec Loss 4.4013 LearningRate 0.0040 Epoch: 32 Global Step: 182180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 
2022-04-11 05:42:53,791-Speed 10563.31 samples/sec Loss 4.2120 LearningRate 0.0040 Epoch: 32 Global Step: 182190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:42:54,754-Speed 10639.89 samples/sec Loss 4.3006 LearningRate 0.0040 Epoch: 32 Global Step: 182200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:42:55,697-Speed 10867.44 samples/sec Loss 4.3224 LearningRate 0.0040 Epoch: 32 Global Step: 182210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:42:56,622-Speed 11078.47 samples/sec Loss 4.1945 LearningRate 0.0040 Epoch: 32 Global Step: 182220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:42:57,560-Speed 10924.95 samples/sec Loss 4.1965 LearningRate 0.0040 Epoch: 32 Global Step: 182230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:42:58,597-Speed 9881.09 samples/sec Loss 4.2678 LearningRate 0.0039 Epoch: 32 Global Step: 182240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:42:59,535-Speed 10929.80 samples/sec Loss 4.2307 LearningRate 0.0039 Epoch: 32 Global Step: 182250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:00,530-Speed 10295.48 samples/sec Loss 4.4138 LearningRate 0.0039 Epoch: 32 Global Step: 182260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:01,439-Speed 11275.84 samples/sec Loss 4.3286 LearningRate 0.0039 Epoch: 32 Global Step: 182270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:02,455-Speed 10082.75 samples/sec Loss 4.3306 LearningRate 0.0039 Epoch: 32 Global Step: 182280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:43:03,381-Speed 11069.22 samples/sec Loss 4.3286 LearningRate 0.0039 Epoch: 32 Global Step: 182290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:43:04,315-Speed 10970.69 samples/sec Loss 4.3794 LearningRate 0.0039 Epoch: 32 Global Step: 182300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:05,299-Speed 
10418.91 samples/sec Loss 4.3216 LearningRate 0.0039 Epoch: 32 Global Step: 182310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:06,285-Speed 10391.37 samples/sec Loss 4.3966 LearningRate 0.0039 Epoch: 32 Global Step: 182320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:07,249-Speed 10626.54 samples/sec Loss 4.3447 LearningRate 0.0039 Epoch: 32 Global Step: 182330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:08,214-Speed 10611.57 samples/sec Loss 4.3942 LearningRate 0.0039 Epoch: 32 Global Step: 182340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:09,107-Speed 11475.81 samples/sec Loss 4.3050 LearningRate 0.0039 Epoch: 32 Global Step: 182350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:10,010-Speed 11348.75 samples/sec Loss 4.4719 LearningRate 0.0039 Epoch: 32 Global Step: 182360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:10,923-Speed 11230.41 samples/sec Loss 4.2894 LearningRate 0.0039 Epoch: 32 Global Step: 182370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:11,830-Speed 11291.95 samples/sec Loss 4.3004 LearningRate 0.0039 Epoch: 32 Global Step: 182380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:12,761-Speed 11003.98 samples/sec Loss 4.3365 LearningRate 0.0039 Epoch: 32 Global Step: 182390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:13,678-Speed 11177.60 samples/sec Loss 4.4208 LearningRate 0.0039 Epoch: 32 Global Step: 182400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:43:14,595-Speed 11170.44 samples/sec Loss 4.3169 LearningRate 0.0039 Epoch: 32 Global Step: 182410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:43:15,532-Speed 10936.18 samples/sec Loss 4.3693 LearningRate 0.0039 Epoch: 32 Global Step: 182420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:43:16,453-Speed 11132.90 samples/sec Loss 
4.3826 LearningRate 0.0039 Epoch: 32 Global Step: 182430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:43:17,387-Speed 10973.27 samples/sec Loss 4.1989 LearningRate 0.0039 Epoch: 32 Global Step: 182440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:18,306-Speed 11146.87 samples/sec Loss 4.2793 LearningRate 0.0039 Epoch: 32 Global Step: 182450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:19,286-Speed 10454.85 samples/sec Loss 4.3964 LearningRate 0.0039 Epoch: 32 Global Step: 182460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:20,210-Speed 11101.80 samples/sec Loss 4.2330 LearningRate 0.0039 Epoch: 32 Global Step: 182470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:21,115-Speed 11318.12 samples/sec Loss 4.3047 LearningRate 0.0039 Epoch: 32 Global Step: 182480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:22,018-Speed 11345.89 samples/sec Loss 4.2301 LearningRate 0.0039 Epoch: 32 Global Step: 182490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:22,928-Speed 11256.05 samples/sec Loss 4.3376 LearningRate 0.0039 Epoch: 32 Global Step: 182500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:23,857-Speed 11037.04 samples/sec Loss 4.3933 LearningRate 0.0039 Epoch: 32 Global Step: 182510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:24,789-Speed 10989.65 samples/sec Loss 4.3972 LearningRate 0.0039 Epoch: 32 Global Step: 182520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:25,711-Speed 11117.17 samples/sec Loss 4.3812 LearningRate 0.0039 Epoch: 32 Global Step: 182530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:26,606-Speed 11450.17 samples/sec Loss 4.4146 LearningRate 0.0039 Epoch: 32 Global Step: 182540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:43:27,533-Speed 11059.37 samples/sec Loss 4.3614 LearningRate 0.0039 
Epoch: 32 Global Step: 182550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:28,496-Speed 10638.43 samples/sec Loss 4.3090 LearningRate 0.0039 Epoch: 32 Global Step: 182560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:29,440-Speed 10847.62 samples/sec Loss 4.3873 LearningRate 0.0039 Epoch: 32 Global Step: 182570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:30,388-Speed 10819.61 samples/sec Loss 4.4203 LearningRate 0.0039 Epoch: 32 Global Step: 182580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:31,310-Speed 11103.71 samples/sec Loss 4.2495 LearningRate 0.0039 Epoch: 32 Global Step: 182590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:32,234-Speed 11091.79 samples/sec Loss 4.3255 LearningRate 0.0039 Epoch: 32 Global Step: 182600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:33,168-Speed 10974.52 samples/sec Loss 4.4057 LearningRate 0.0039 Epoch: 32 Global Step: 182610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:34,076-Speed 11286.28 samples/sec Loss 4.2646 LearningRate 0.0039 Epoch: 32 Global Step: 182620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:35,015-Speed 10918.06 samples/sec Loss 4.4551 LearningRate 0.0039 Epoch: 32 Global Step: 182630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:35,924-Speed 11261.93 samples/sec Loss 4.4002 LearningRate 0.0039 Epoch: 32 Global Step: 182640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:36,871-Speed 10830.87 samples/sec Loss 4.3163 LearningRate 0.0039 Epoch: 32 Global Step: 182650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:37,777-Speed 11302.47 samples/sec Loss 4.3897 LearningRate 0.0039 Epoch: 32 Global Step: 182660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:38,709-Speed 10997.27 samples/sec Loss 4.3347 LearningRate 0.0039 Epoch: 32 Global Step: 182670 
Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:39,648-Speed 10917.91 samples/sec Loss 4.3694 LearningRate 0.0039 Epoch: 32 Global Step: 182680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:40,558-Speed 11266.55 samples/sec Loss 4.3403 LearningRate 0.0039 Epoch: 32 Global Step: 182690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:41,461-Speed 11339.80 samples/sec Loss 4.3787 LearningRate 0.0039 Epoch: 32 Global Step: 182700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:42,408-Speed 10824.82 samples/sec Loss 4.3161 LearningRate 0.0039 Epoch: 32 Global Step: 182710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:43,301-Speed 11465.88 samples/sec Loss 4.3725 LearningRate 0.0039 Epoch: 32 Global Step: 182720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:44,232-Speed 11009.37 samples/sec Loss 4.3044 LearningRate 0.0039 Epoch: 32 Global Step: 182730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:45,135-Speed 11357.79 samples/sec Loss 4.3740 LearningRate 0.0039 Epoch: 32 Global Step: 182740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:46,039-Speed 11325.96 samples/sec Loss 4.3863 LearningRate 0.0039 Epoch: 32 Global Step: 182750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:43:46,935-Speed 11443.39 samples/sec Loss 4.3288 LearningRate 0.0039 Epoch: 32 Global Step: 182760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:43:47,839-Speed 11328.08 samples/sec Loss 4.4621 LearningRate 0.0039 Epoch: 32 Global Step: 182770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:43:48,766-Speed 11058.80 samples/sec Loss 4.3460 LearningRate 0.0039 Epoch: 32 Global Step: 182780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:43:49,689-Speed 11096.81 samples/sec Loss 4.3516 LearningRate 0.0039 Epoch: 32 Global Step: 182790 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-04-11 05:43:50,684-Speed 10301.10 samples/sec Loss 4.3226 LearningRate 0.0039 Epoch: 32 Global Step: 182800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:51,644-Speed 10673.52 samples/sec Loss 4.3958 LearningRate 0.0039 Epoch: 32 Global Step: 182810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:52,592-Speed 10808.10 samples/sec Loss 4.4244 LearningRate 0.0038 Epoch: 32 Global Step: 182820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:53,504-Speed 11231.22 samples/sec Loss 4.3867 LearningRate 0.0038 Epoch: 32 Global Step: 182830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:54,455-Speed 10777.51 samples/sec Loss 4.5430 LearningRate 0.0038 Epoch: 32 Global Step: 182840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:55,379-Speed 11095.81 samples/sec Loss 4.4327 LearningRate 0.0038 Epoch: 32 Global Step: 182850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:56,290-Speed 11244.39 samples/sec Loss 4.3980 LearningRate 0.0038 Epoch: 32 Global Step: 182860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:57,210-Speed 11137.94 samples/sec Loss 4.4538 LearningRate 0.0038 Epoch: 32 Global Step: 182870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:58,119-Speed 11273.44 samples/sec Loss 4.5592 LearningRate 0.0038 Epoch: 32 Global Step: 182880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:43:59,036-Speed 11172.36 samples/sec Loss 4.4487 LearningRate 0.0038 Epoch: 32 Global Step: 182890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:43:59,993-Speed 10711.87 samples/sec Loss 4.3327 LearningRate 0.0038 Epoch: 32 Global Step: 182900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:44:00,904-Speed 11246.49 samples/sec Loss 4.4140 LearningRate 0.0038 Epoch: 32 Global Step: 182910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-11 05:44:01,826-Speed 11113.16 samples/sec Loss 4.4864 LearningRate 0.0038 Epoch: 32 Global Step: 182920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:44:02,792-Speed 10609.18 samples/sec Loss 4.3922 LearningRate 0.0038 Epoch: 32 Global Step: 182930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:44:03,751-Speed 10683.78 samples/sec Loss 4.5181 LearningRate 0.0038 Epoch: 32 Global Step: 182940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:44:04,704-Speed 10748.04 samples/sec Loss 4.3216 LearningRate 0.0038 Epoch: 32 Global Step: 182950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:44:05,608-Speed 11385.64 samples/sec Loss 4.3386 LearningRate 0.0038 Epoch: 32 Global Step: 182960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:44:06,523-Speed 11196.11 samples/sec Loss 4.5125 LearningRate 0.0038 Epoch: 32 Global Step: 182970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:44:07,462-Speed 10909.78 samples/sec Loss 4.4473 LearningRate 0.0038 Epoch: 32 Global Step: 182980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:44:08,422-Speed 10675.34 samples/sec Loss 4.4820 LearningRate 0.0038 Epoch: 32 Global Step: 182990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:44:09,328-Speed 11307.01 samples/sec Loss 4.4126 LearningRate 0.0038 Epoch: 32 Global Step: 183000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:44:10,231-Speed 11350.82 samples/sec Loss 4.5118 LearningRate 0.0038 Epoch: 32 Global Step: 183010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:44:11,158-Speed 11063.22 samples/sec Loss 4.3799 LearningRate 0.0038 Epoch: 32 Global Step: 183020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:44:12,067-Speed 11266.47 samples/sec Loss 4.4276 LearningRate 0.0038 Epoch: 32 Global Step: 183030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:44:12,980-Speed 
11217.36 samples/sec Loss 4.3682 LearningRate 0.0038 Epoch: 32 Global Step: 183040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:44:13,902-Speed 11117.58 samples/sec Loss 4.4490 LearningRate 0.0038 Epoch: 32 Global Step: 183050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:44:14,835-Speed 10984.82 samples/sec Loss 4.4033 LearningRate 0.0038 Epoch: 32 Global Step: 183060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:44:15,788-Speed 10752.99 samples/sec Loss 4.3766 LearningRate 0.0038 Epoch: 32 Global Step: 183070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:44:16,717-Speed 11031.33 samples/sec Loss 4.4105 LearningRate 0.0038 Epoch: 32 Global Step: 183080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:44:17,642-Speed 11082.78 samples/sec Loss 4.5430 LearningRate 0.0038 Epoch: 32 Global Step: 183090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:44:18,609-Speed 10594.44 samples/sec Loss 4.3389 LearningRate 0.0038 Epoch: 32 Global Step: 183100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:44:19,506-Speed 11432.15 samples/sec Loss 4.3365 LearningRate 0.0038 Epoch: 32 Global Step: 183110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:44:20,468-Speed 10650.26 samples/sec Loss 4.5247 LearningRate 0.0038 Epoch: 32 Global Step: 183120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:44:21,383-Speed 11196.65 samples/sec Loss 4.3991 LearningRate 0.0038 Epoch: 32 Global Step: 183130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:44:22,278-Speed 11445.62 samples/sec Loss 4.4636 LearningRate 0.0038 Epoch: 32 Global Step: 183140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:44:23,188-Speed 11258.75 samples/sec Loss 4.2908 LearningRate 0.0038 Epoch: 32 Global Step: 183150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:44:24,111-Speed 11105.79 samples/sec Loss 
4.3652 LearningRate 0.0038 Epoch: 32 Global Step: 183160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:44:25,022-Speed 11247.01 samples/sec Loss 4.3742 LearningRate 0.0038 Epoch: 32 Global Step: 183170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:44:25,961-Speed 10915.65 samples/sec Loss 4.4434 LearningRate 0.0038 Epoch: 32 Global Step: 183180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:44:26,919-Speed 10697.07 samples/sec Loss 4.4146 LearningRate 0.0038 Epoch: 32 Global Step: 183190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:44:27,893-Speed 10518.11 samples/sec Loss 4.5353 LearningRate 0.0038 Epoch: 32 Global Step: 183200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:44:28,851-Speed 10700.88 samples/sec Loss 4.4449 LearningRate 0.0038 Epoch: 32 Global Step: 183210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:44:29,761-Speed 11256.43 samples/sec Loss 4.3057 LearningRate 0.0038 Epoch: 32 Global Step: 183220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:44:30,747-Speed 10389.30 samples/sec Loss 4.4791 LearningRate 0.0038 Epoch: 32 Global Step: 183230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:44:31,664-Speed 11177.10 samples/sec Loss 4.4775 LearningRate 0.0038 Epoch: 32 Global Step: 183240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:44:32,586-Speed 11109.64 samples/sec Loss 4.3406 LearningRate 0.0038 Epoch: 32 Global Step: 183250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:44:33,483-Speed 11432.62 samples/sec Loss 4.4913 LearningRate 0.0038 Epoch: 32 Global Step: 183260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:44:34,388-Speed 11322.12 samples/sec Loss 4.4572 LearningRate 0.0038 Epoch: 32 Global Step: 183270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:44:35,327-Speed 10907.98 samples/sec Loss 4.4564 LearningRate 0.0038 
Epoch: 32 Global Step: 183280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:44:36,272-Speed 10845.09 samples/sec Loss 4.4796 LearningRate 0.0038 Epoch: 32 Global Step: 183290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:44:37,250-Speed 10484.99 samples/sec Loss 4.6203 LearningRate 0.0038 Epoch: 32 Global Step: 183300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:44:38,170-Speed 11130.16 samples/sec Loss 4.5236 LearningRate 0.0038 Epoch: 32 Global Step: 183310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:44:39,081-Speed 11257.40 samples/sec Loss 4.3958 LearningRate 0.0038 Epoch: 32 Global Step: 183320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:44:39,989-Speed 11274.87 samples/sec Loss 4.4037 LearningRate 0.0038 Epoch: 32 Global Step: 183330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:44:40,933-Speed 10866.26 samples/sec Loss 4.3987 LearningRate 0.0038 Epoch: 32 Global Step: 183340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:44:41,851-Speed 11154.06 samples/sec Loss 4.4094 LearningRate 0.0038 Epoch: 32 Global Step: 183350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:44:42,818-Speed 10599.72 samples/sec Loss 4.4040 LearningRate 0.0038 Epoch: 32 Global Step: 183360 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:44:43,755-Speed 10932.80 samples/sec Loss 4.3855 LearningRate 0.0038 Epoch: 32 Global Step: 183370 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:44:44,670-Speed 11200.17 samples/sec Loss 4.3931 LearningRate 0.0038 Epoch: 32 Global Step: 183380 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:44:45,563-Speed 11483.72 samples/sec Loss 4.4750 LearningRate 0.0038 Epoch: 32 Global Step: 183390 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:44:46,475-Speed 11236.56 samples/sec Loss 4.4543 LearningRate 0.0037 Epoch: 32 Global Step: 183400 
Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:44:47,431-Speed 10717.96 samples/sec Loss 4.4295 LearningRate 0.0037 Epoch: 32 Global Step: 183410 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:44:48,350-Speed 11147.38 samples/sec Loss 4.3989 LearningRate 0.0037 Epoch: 32 Global Step: 183420 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:44:49,323-Speed 10534.45 samples/sec Loss 4.4962 LearningRate 0.0037 Epoch: 32 Global Step: 183430 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:44:50,257-Speed 10968.64 samples/sec Loss 4.3932 LearningRate 0.0037 Epoch: 32 Global Step: 183440 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:44:51,224-Speed 10599.75 samples/sec Loss 4.4138 LearningRate 0.0037 Epoch: 32 Global Step: 183450 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-11 05:44:52,136-Speed 11232.41 samples/sec Loss 4.3986 LearningRate 0.0037 Epoch: 32 Global Step: 183460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:44:53,048-Speed 11242.78 samples/sec Loss 4.5058 LearningRate 0.0037 Epoch: 32 Global Step: 183470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:44:53,967-Speed 11141.32 samples/sec Loss 4.4359 LearningRate 0.0037 Epoch: 32 Global Step: 183480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:44:54,929-Speed 10656.48 samples/sec Loss 4.3100 LearningRate 0.0037 Epoch: 32 Global Step: 183490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:44:55,910-Speed 10448.64 samples/sec Loss 4.5467 LearningRate 0.0037 Epoch: 32 Global Step: 183500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:44:56,818-Speed 11279.10 samples/sec Loss 4.5899 LearningRate 0.0037 Epoch: 32 Global Step: 183510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:44:57,719-Speed 11375.03 samples/sec Loss 4.5120 LearningRate 0.0037 Epoch: 32 Global Step: 183520 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-04-11 05:44:58,708-Speed 10365.51 samples/sec Loss 4.4682 LearningRate 0.0037 Epoch: 32 Global Step: 183530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:44:59,666-Speed 10688.69 samples/sec Loss 4.5150 LearningRate 0.0037 Epoch: 32 Global Step: 183540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:45:00,634-Speed 10593.87 samples/sec Loss 4.3799 LearningRate 0.0037 Epoch: 32 Global Step: 183550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:45:01,577-Speed 10856.24 samples/sec Loss 4.3554 LearningRate 0.0037 Epoch: 32 Global Step: 183560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:45:02,578-Speed 10236.32 samples/sec Loss 4.4178 LearningRate 0.0037 Epoch: 32 Global Step: 183570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:45:03,548-Speed 10569.47 samples/sec Loss 4.3495 LearningRate 0.0037 Epoch: 32 Global Step: 183580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:45:04,470-Speed 11111.13 samples/sec Loss 4.4190 LearningRate 0.0037 Epoch: 32 Global Step: 183590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:45:05,388-Speed 11175.05 samples/sec Loss 4.4357 LearningRate 0.0037 Epoch: 32 Global Step: 183600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:45:06,358-Speed 10565.64 samples/sec Loss 4.4479 LearningRate 0.0037 Epoch: 32 Global Step: 183610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:45:07,305-Speed 10815.66 samples/sec Loss 4.4820 LearningRate 0.0037 Epoch: 32 Global Step: 183620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:45:08,272-Speed 10591.00 samples/sec Loss 4.4897 LearningRate 0.0037 Epoch: 32 Global Step: 183630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:45:09,182-Speed 11262.19 samples/sec Loss 4.4131 LearningRate 0.0037 Epoch: 32 Global Step: 183640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-11 05:45:10,070-Speed 11547.47 samples/sec Loss 4.5272 LearningRate 0.0037 Epoch: 32 Global Step: 183650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:45:10,992-Speed 11114.82 samples/sec Loss 4.4498 LearningRate 0.0037 Epoch: 32 Global Step: 183660 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 05:45:11,940-Speed 10806.08 samples/sec Loss 4.3808 LearningRate 0.0037 Epoch: 32 Global Step: 183670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:45:12,833-Speed 11480.34 samples/sec Loss 4.4118 LearningRate 0.0037 Epoch: 32 Global Step: 183680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:45:13,753-Speed 11135.10 samples/sec Loss 4.4652 LearningRate 0.0037 Epoch: 32 Global Step: 183690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:45:14,723-Speed 10564.52 samples/sec Loss 4.3803 LearningRate 0.0037 Epoch: 32 Global Step: 183700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:45:15,640-Speed 11167.02 samples/sec Loss 4.5399 LearningRate 0.0037 Epoch: 32 Global Step: 183710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:45:16,547-Speed 11300.84 samples/sec Loss 4.4426 LearningRate 0.0037 Epoch: 32 Global Step: 183720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:45:17,477-Speed 11016.93 samples/sec Loss 4.4088 LearningRate 0.0037 Epoch: 32 Global Step: 183730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:45:18,392-Speed 11195.99 samples/sec Loss 4.4035 LearningRate 0.0037 Epoch: 32 Global Step: 183740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:45:19,329-Speed 10945.72 samples/sec Loss 4.4576 LearningRate 0.0037 Epoch: 32 Global Step: 183750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:45:20,251-Speed 11109.92 samples/sec Loss 4.5971 LearningRate 0.0037 Epoch: 32 Global Step: 183760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:45:21,140-Speed 
11521.68 samples/sec Loss 4.4151 LearningRate 0.0037 Epoch: 32 Global Step: 183770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:45:22,049-Speed 11271.66 samples/sec Loss 4.3189 LearningRate 0.0037 Epoch: 32 Global Step: 183780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:45:22,965-Speed 11193.02 samples/sec Loss 4.4460 LearningRate 0.0037 Epoch: 32 Global Step: 183790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:45:23,870-Speed 11323.79 samples/sec Loss 4.5463 LearningRate 0.0037 Epoch: 32 Global Step: 183800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:45:24,850-Speed 10454.92 samples/sec Loss 4.4439 LearningRate 0.0037 Epoch: 32 Global Step: 183810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:45:25,781-Speed 11004.63 samples/sec Loss 4.4261 LearningRate 0.0037 Epoch: 32 Global Step: 183820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:45:26,742-Speed 10665.10 samples/sec Loss 4.4715 LearningRate 0.0037 Epoch: 32 Global Step: 183830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:45:27,668-Speed 11069.24 samples/sec Loss 4.3864 LearningRate 0.0037 Epoch: 32 Global Step: 183840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:45:28,621-Speed 10757.19 samples/sec Loss 4.4621 LearningRate 0.0037 Epoch: 32 Global Step: 183850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:45:29,578-Speed 10703.34 samples/sec Loss 4.5394 LearningRate 0.0037 Epoch: 32 Global Step: 183860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:45:30,501-Speed 11110.82 samples/sec Loss 4.4130 LearningRate 0.0037 Epoch: 32 Global Step: 183870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:45:31,412-Speed 11247.20 samples/sec Loss 4.4308 LearningRate 0.0037 Epoch: 32 Global Step: 183880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:45:32,329-Speed 11172.20 samples/sec Loss 
4.4627 LearningRate 0.0037 Epoch: 32 Global Step: 183890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:45:33,238-Speed 11272.37 samples/sec Loss 4.4495 LearningRate 0.0037 Epoch: 32 Global Step: 183900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:45:34,139-Speed 11367.83 samples/sec Loss 4.4412 LearningRate 0.0037 Epoch: 32 Global Step: 183910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:45:35,045-Speed 11310.93 samples/sec Loss 4.4555 LearningRate 0.0037 Epoch: 32 Global Step: 183920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:45:35,956-Speed 11250.87 samples/sec Loss 4.5076 LearningRate 0.0037 Epoch: 32 Global Step: 183930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:45:36,888-Speed 10988.88 samples/sec Loss 4.3975 LearningRate 0.0037 Epoch: 32 Global Step: 183940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:45:37,835-Speed 10819.56 samples/sec Loss 4.4713 LearningRate 0.0037 Epoch: 32 Global Step: 183950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:45:38,779-Speed 10855.73 samples/sec Loss 4.4806 LearningRate 0.0037 Epoch: 32 Global Step: 183960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:45:39,757-Speed 10474.28 samples/sec Loss 4.4283 LearningRate 0.0037 Epoch: 32 Global Step: 183970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:45:40,679-Speed 11123.78 samples/sec Loss 4.5289 LearningRate 0.0037 Epoch: 32 Global Step: 183980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:45:41,652-Speed 10525.85 samples/sec Loss 4.4726 LearningRate 0.0036 Epoch: 32 Global Step: 183990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:45:42,589-Speed 10944.03 samples/sec Loss 4.5739 LearningRate 0.0036 Epoch: 32 Global Step: 184000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:46:03,824-[lfw][184000]XNorm: 7.953078 Training: 2022-04-11 
05:46:03,825-[lfw][184000]Accuracy-Flip: 0.99683+-0.00283 Training: 2022-04-11 05:46:03,825-[lfw][184000]Accuracy-Highest: 0.99733 Training: 2022-04-11 05:46:28,453-[cfp_fp][184000]XNorm: 6.787826 Training: 2022-04-11 05:46:28,453-[cfp_fp][184000]Accuracy-Flip: 0.95500+-0.01037 Training: 2022-04-11 05:46:28,454-[cfp_fp][184000]Accuracy-Highest: 0.95500 Training: 2022-04-11 05:46:49,692-[agedb_30][184000]XNorm: 7.786164 Training: 2022-04-11 05:46:49,692-[agedb_30][184000]Accuracy-Flip: 0.96933+-0.00688 Training: 2022-04-11 05:46:49,693-[agedb_30][184000]Accuracy-Highest: 0.97067 Training: 2022-04-11 05:46:50,652-Speed 150.45 samples/sec Loss 4.4721 LearningRate 0.0036 Epoch: 32 Global Step: 184010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:46:51,549-Speed 11420.22 samples/sec Loss 4.3899 LearningRate 0.0036 Epoch: 32 Global Step: 184020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:46:52,457-Speed 11283.09 samples/sec Loss 4.5273 LearningRate 0.0036 Epoch: 32 Global Step: 184030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:46:53,386-Speed 11034.07 samples/sec Loss 4.4465 LearningRate 0.0036 Epoch: 32 Global Step: 184040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:46:54,300-Speed 11211.53 samples/sec Loss 4.3977 LearningRate 0.0036 Epoch: 32 Global Step: 184050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:46:55,271-Speed 10552.00 samples/sec Loss 4.5058 LearningRate 0.0036 Epoch: 32 Global Step: 184060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:46:56,200-Speed 11030.63 samples/sec Loss 4.5567 LearningRate 0.0036 Epoch: 32 Global Step: 184070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:46:57,112-Speed 11236.70 samples/sec Loss 4.3216 LearningRate 0.0036 Epoch: 32 Global Step: 184080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:46:58,052-Speed 10896.36 samples/sec Loss 4.4458 LearningRate 0.0036 Epoch: 32 
Global Step: 184090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:46:59,009-Speed 10713.63 samples/sec Loss 4.5803 LearningRate 0.0036 Epoch: 32 Global Step: 184100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:46:59,920-Speed 11240.98 samples/sec Loss 4.5006 LearningRate 0.0036 Epoch: 32 Global Step: 184110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:00,869-Speed 10800.12 samples/sec Loss 4.4366 LearningRate 0.0036 Epoch: 32 Global Step: 184120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:47:01,837-Speed 10586.07 samples/sec Loss 4.6114 LearningRate 0.0036 Epoch: 32 Global Step: 184130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:47:02,782-Speed 10845.43 samples/sec Loss 4.4968 LearningRate 0.0036 Epoch: 32 Global Step: 184140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:03,754-Speed 10543.69 samples/sec Loss 4.4826 LearningRate 0.0036 Epoch: 32 Global Step: 184150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:04,698-Speed 10854.95 samples/sec Loss 4.5451 LearningRate 0.0036 Epoch: 32 Global Step: 184160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:05,656-Speed 10688.41 samples/sec Loss 4.5129 LearningRate 0.0036 Epoch: 32 Global Step: 184170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:06,617-Speed 10667.44 samples/sec Loss 4.5086 LearningRate 0.0036 Epoch: 32 Global Step: 184180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:07,551-Speed 10964.77 samples/sec Loss 4.5391 LearningRate 0.0036 Epoch: 32 Global Step: 184190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:08,536-Speed 10406.08 samples/sec Loss 4.3208 LearningRate 0.0036 Epoch: 32 Global Step: 184200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:09,439-Speed 11349.41 samples/sec Loss 4.4190 LearningRate 0.0036 Epoch: 32 Global Step: 184210 Fp16 Grad 
Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:10,339-Speed 11396.03 samples/sec Loss 4.5109 LearningRate 0.0036 Epoch: 32 Global Step: 184220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:11,237-Speed 11404.27 samples/sec Loss 4.4273 LearningRate 0.0036 Epoch: 32 Global Step: 184230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:12,124-Speed 11551.97 samples/sec Loss 4.5237 LearningRate 0.0036 Epoch: 32 Global Step: 184240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:47:13,030-Speed 11316.36 samples/sec Loss 4.5659 LearningRate 0.0036 Epoch: 32 Global Step: 184250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:13,990-Speed 10669.51 samples/sec Loss 4.4532 LearningRate 0.0036 Epoch: 32 Global Step: 184260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:14,924-Speed 10974.00 samples/sec Loss 4.5726 LearningRate 0.0036 Epoch: 32 Global Step: 184270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:15,845-Speed 11126.09 samples/sec Loss 4.5223 LearningRate 0.0036 Epoch: 32 Global Step: 184280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:16,815-Speed 10562.57 samples/sec Loss 4.6396 LearningRate 0.0036 Epoch: 32 Global Step: 184290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:17,740-Speed 11077.43 samples/sec Loss 4.5368 LearningRate 0.0036 Epoch: 32 Global Step: 184300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:18,665-Speed 11071.14 samples/sec Loss 4.4071 LearningRate 0.0036 Epoch: 32 Global Step: 184310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:19,623-Speed 10705.51 samples/sec Loss 4.4292 LearningRate 0.0036 Epoch: 32 Global Step: 184320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:20,568-Speed 10833.57 samples/sec Loss 4.4281 LearningRate 0.0036 Epoch: 32 Global Step: 184330 Fp16 Grad Scale: 32768 Required: 2 hours 
Training: 2022-04-11 05:47:21,465-Speed 11426.31 samples/sec Loss 4.6024 LearningRate 0.0036 Epoch: 32 Global Step: 184340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:22,362-Speed 11430.64 samples/sec Loss 4.6168 LearningRate 0.0036 Epoch: 32 Global Step: 184350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:47:23,271-Speed 11263.06 samples/sec Loss 4.4680 LearningRate 0.0036 Epoch: 32 Global Step: 184360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:47:24,192-Speed 11127.46 samples/sec Loss 4.5183 LearningRate 0.0036 Epoch: 32 Global Step: 184370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:47:25,085-Speed 11475.89 samples/sec Loss 4.4455 LearningRate 0.0036 Epoch: 32 Global Step: 184380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:26,059-Speed 10525.23 samples/sec Loss 4.3195 LearningRate 0.0036 Epoch: 32 Global Step: 184390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:27,022-Speed 10638.49 samples/sec Loss 4.4888 LearningRate 0.0036 Epoch: 32 Global Step: 184400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:27,974-Speed 10774.10 samples/sec Loss 4.4079 LearningRate 0.0036 Epoch: 32 Global Step: 184410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:28,931-Speed 10704.83 samples/sec Loss 4.3983 LearningRate 0.0036 Epoch: 32 Global Step: 184420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:29,945-Speed 10106.69 samples/sec Loss 4.5205 LearningRate 0.0036 Epoch: 32 Global Step: 184430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:30,929-Speed 10413.05 samples/sec Loss 4.4514 LearningRate 0.0036 Epoch: 32 Global Step: 184440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:31,872-Speed 10874.59 samples/sec Loss 4.4845 LearningRate 0.0036 Epoch: 32 Global Step: 184450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 
05:47:32,796-Speed 11086.92 samples/sec Loss 4.5961 LearningRate 0.0036 Epoch: 32 Global Step: 184460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:33,694-Speed 11408.75 samples/sec Loss 4.5051 LearningRate 0.0036 Epoch: 32 Global Step: 184470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:34,596-Speed 11358.51 samples/sec Loss 4.5004 LearningRate 0.0036 Epoch: 32 Global Step: 184480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:47:35,490-Speed 11463.92 samples/sec Loss 4.4012 LearningRate 0.0036 Epoch: 32 Global Step: 184490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:47:36,408-Speed 11157.86 samples/sec Loss 4.4774 LearningRate 0.0036 Epoch: 32 Global Step: 184500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:47:37,370-Speed 10655.43 samples/sec Loss 4.5484 LearningRate 0.0036 Epoch: 32 Global Step: 184510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:47:38,285-Speed 11198.45 samples/sec Loss 4.4487 LearningRate 0.0036 Epoch: 32 Global Step: 184520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:39,192-Speed 11295.94 samples/sec Loss 4.4007 LearningRate 0.0036 Epoch: 32 Global Step: 184530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:40,095-Speed 11356.61 samples/sec Loss 4.5037 LearningRate 0.0036 Epoch: 32 Global Step: 184540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:41,012-Speed 11165.82 samples/sec Loss 4.5048 LearningRate 0.0036 Epoch: 32 Global Step: 184550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:41,965-Speed 10759.74 samples/sec Loss 4.4439 LearningRate 0.0036 Epoch: 32 Global Step: 184560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:42,932-Speed 10588.26 samples/sec Loss 4.5792 LearningRate 0.0036 Epoch: 32 Global Step: 184570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:43,867-Speed 10967.65 
samples/sec Loss 4.4822 LearningRate 0.0036 Epoch: 32 Global Step: 184580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:44,782-Speed 11196.22 samples/sec Loss 4.3933 LearningRate 0.0035 Epoch: 32 Global Step: 184590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:45,695-Speed 11234.57 samples/sec Loss 4.4675 LearningRate 0.0035 Epoch: 32 Global Step: 184600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:46,620-Speed 11081.82 samples/sec Loss 4.5045 LearningRate 0.0035 Epoch: 32 Global Step: 184610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:47,560-Speed 10891.35 samples/sec Loss 4.5121 LearningRate 0.0035 Epoch: 32 Global Step: 184620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:47:48,502-Speed 10877.19 samples/sec Loss 4.4825 LearningRate 0.0035 Epoch: 32 Global Step: 184630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:47:49,411-Speed 11278.18 samples/sec Loss 4.4124 LearningRate 0.0035 Epoch: 32 Global Step: 184640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:47:50,358-Speed 10824.41 samples/sec Loss 4.4717 LearningRate 0.0035 Epoch: 32 Global Step: 184650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:47:51,286-Speed 11039.82 samples/sec Loss 4.5240 LearningRate 0.0035 Epoch: 32 Global Step: 184660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:47:52,220-Speed 10964.93 samples/sec Loss 4.5382 LearningRate 0.0035 Epoch: 32 Global Step: 184670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:47:53,164-Speed 10860.09 samples/sec Loss 4.4526 LearningRate 0.0035 Epoch: 32 Global Step: 184680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:54,138-Speed 10522.24 samples/sec Loss 4.4712 LearningRate 0.0035 Epoch: 32 Global Step: 184690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:55,067-Speed 11031.26 samples/sec Loss 4.4933 
LearningRate 0.0035 Epoch: 32 Global Step: 184700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:56,057-Speed 10348.47 samples/sec Loss 4.4601 LearningRate 0.0035 Epoch: 32 Global Step: 184710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:57,023-Speed 10606.73 samples/sec Loss 4.4159 LearningRate 0.0035 Epoch: 32 Global Step: 184720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:57,932-Speed 11264.39 samples/sec Loss 4.4234 LearningRate 0.0035 Epoch: 32 Global Step: 184730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:58,866-Speed 10984.20 samples/sec Loss 4.5354 LearningRate 0.0035 Epoch: 32 Global Step: 184740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:47:59,787-Speed 11115.86 samples/sec Loss 4.4619 LearningRate 0.0035 Epoch: 32 Global Step: 184750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:48:00,680-Speed 11478.92 samples/sec Loss 4.5328 LearningRate 0.0035 Epoch: 32 Global Step: 184760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:48:01,633-Speed 10756.71 samples/sec Loss 4.5002 LearningRate 0.0035 Epoch: 32 Global Step: 184770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:48:02,536-Speed 11346.20 samples/sec Loss 4.4202 LearningRate 0.0035 Epoch: 32 Global Step: 184780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:48:03,469-Speed 10977.39 samples/sec Loss 4.5339 LearningRate 0.0035 Epoch: 32 Global Step: 184790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:48:04,399-Speed 11018.75 samples/sec Loss 4.4003 LearningRate 0.0035 Epoch: 32 Global Step: 184800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:48:05,305-Speed 11309.51 samples/sec Loss 4.5228 LearningRate 0.0035 Epoch: 32 Global Step: 184810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:48:06,253-Speed 10812.84 samples/sec Loss 4.5667 LearningRate 0.0035 Epoch: 32 
Global Step: 184820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:48:07,207-Speed 10749.57 samples/sec Loss 4.4990 LearningRate 0.0035 Epoch: 32 Global Step: 184830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:48:08,105-Speed 11415.86 samples/sec Loss 4.5770 LearningRate 0.0035 Epoch: 32 Global Step: 184840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:48:09,021-Speed 11188.86 samples/sec Loss 4.5118 LearningRate 0.0035 Epoch: 32 Global Step: 184850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:48:09,959-Speed 10914.15 samples/sec Loss 4.5636 LearningRate 0.0035 Epoch: 32 Global Step: 184860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:48:10,880-Speed 11131.17 samples/sec Loss 4.5076 LearningRate 0.0035 Epoch: 32 Global Step: 184870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:48:11,856-Speed 10501.30 samples/sec Loss 4.5597 LearningRate 0.0035 Epoch: 32 Global Step: 184880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:48:12,777-Speed 11119.03 samples/sec Loss 4.3229 LearningRate 0.0035 Epoch: 32 Global Step: 184890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:48:13,686-Speed 11278.49 samples/sec Loss 4.5410 LearningRate 0.0035 Epoch: 32 Global Step: 184900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:48:14,596-Speed 11260.88 samples/sec Loss 4.6251 LearningRate 0.0035 Epoch: 32 Global Step: 184910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:48:15,492-Speed 11441.02 samples/sec Loss 4.5416 LearningRate 0.0035 Epoch: 32 Global Step: 184920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:48:16,405-Speed 11217.48 samples/sec Loss 4.4942 LearningRate 0.0035 Epoch: 32 Global Step: 184930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:48:17,309-Speed 11333.66 samples/sec Loss 4.4452 LearningRate 0.0035 Epoch: 32 Global Step: 184940 Fp16 Grad 
Scale: 65536 Required: 2 hours Training: 2022-04-11 05:48:18,198-Speed 11529.58 samples/sec Loss 4.4743 LearningRate 0.0035 Epoch: 32 Global Step: 184950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:48:19,128-Speed 11022.17 samples/sec Loss 4.4393 LearningRate 0.0035 Epoch: 32 Global Step: 184960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:48:20,037-Speed 11271.85 samples/sec Loss 4.4526 LearningRate 0.0035 Epoch: 32 Global Step: 184970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:48:20,948-Speed 11246.86 samples/sec Loss 4.5064 LearningRate 0.0035 Epoch: 32 Global Step: 184980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:48:21,886-Speed 10922.57 samples/sec Loss 4.4641 LearningRate 0.0035 Epoch: 32 Global Step: 184990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:48:22,817-Speed 11001.92 samples/sec Loss 4.5116 LearningRate 0.0035 Epoch: 32 Global Step: 185000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:48:23,766-Speed 10803.97 samples/sec Loss 4.4354 LearningRate 0.0035 Epoch: 32 Global Step: 185010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:48:24,664-Speed 11411.95 samples/sec Loss 4.4427 LearningRate 0.0035 Epoch: 32 Global Step: 185020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:48:25,617-Speed 10747.51 samples/sec Loss 4.5651 LearningRate 0.0035 Epoch: 32 Global Step: 185030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:48:26,546-Speed 11041.58 samples/sec Loss 4.5435 LearningRate 0.0035 Epoch: 32 Global Step: 185040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 05:48:27,530-Speed 10406.17 samples/sec Loss 4.5173 LearningRate 0.0035 Epoch: 32 Global Step: 185050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:48:28,488-Speed 10697.67 samples/sec Loss 4.4040 LearningRate 0.0035 Epoch: 32 Global Step: 185060 Fp16 Grad Scale: 65536 Required: 2 hours 
Training: 2022-04-11 05:48:29,470-Speed 10449.92 samples/sec Loss 4.4360 LearningRate 0.0035 Epoch: 32 Global Step: 185070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 05:48:30,479-Speed 10149.11 samples/sec Loss 4.5808 LearningRate 0.0035 Epoch: 32 Global Step: 185080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:48:31,426-Speed 10823.20 samples/sec Loss 4.6095 LearningRate 0.0035 Epoch: 32 Global Step: 185090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:48:32,353-Speed 11049.41 samples/sec Loss 4.3935 LearningRate 0.0035 Epoch: 32 Global Step: 185100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:48:33,286-Speed 10984.72 samples/sec Loss 4.5015 LearningRate 0.0035 Epoch: 32 Global Step: 185110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:48:34,183-Speed 11426.72 samples/sec Loss 4.3990 LearningRate 0.0035 Epoch: 32 Global Step: 185120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:48:35,091-Speed 11276.70 samples/sec Loss 4.4581 LearningRate 0.0035 Epoch: 32 Global Step: 185130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:48:36,027-Speed 10951.18 samples/sec Loss 4.5279 LearningRate 0.0035 Epoch: 32 Global Step: 185140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:48:36,984-Speed 10711.56 samples/sec Loss 4.5077 LearningRate 0.0035 Epoch: 32 Global Step: 185150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:48:37,906-Speed 11108.79 samples/sec Loss 4.5797 LearningRate 0.0035 Epoch: 32 Global Step: 185160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:48:38,883-Speed 10490.83 samples/sec Loss 4.6418 LearningRate 0.0035 Epoch: 32 Global Step: 185170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:48:39,818-Speed 10954.89 samples/sec Loss 4.5235 LearningRate 0.0035 Epoch: 32 Global Step: 185180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 
05:48:40,781-Speed 10645.67 samples/sec Loss 4.4051 LearningRate 0.0035 Epoch: 32 Global Step: 185190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:48:41,718-Speed 10935.62 samples/sec Loss 4.5794 LearningRate 0.0034 Epoch: 32 Global Step: 185200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:48:42,670-Speed 10761.16 samples/sec Loss 4.5342 LearningRate 0.0034 Epoch: 32 Global Step: 185210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:48:43,606-Speed 10940.61 samples/sec Loss 4.5244 LearningRate 0.0034 Epoch: 32 Global Step: 185220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:48:44,526-Speed 11147.75 samples/sec Loss 4.5094 LearningRate 0.0034 Epoch: 32 Global Step: 185230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:48:45,425-Speed 11397.37 samples/sec Loss 4.4879 LearningRate 0.0034 Epoch: 32 Global Step: 185240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:48:46,367-Speed 10870.57 samples/sec Loss 4.5821 LearningRate 0.0034 Epoch: 32 Global Step: 185250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:48:47,329-Speed 10653.47 samples/sec Loss 4.5153 LearningRate 0.0034 Epoch: 32 Global Step: 185260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:48:48,250-Speed 11130.66 samples/sec Loss 4.3728 LearningRate 0.0034 Epoch: 32 Global Step: 185270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:48:49,149-Speed 11399.01 samples/sec Loss 4.6607 LearningRate 0.0034 Epoch: 32 Global Step: 185280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:48:50,043-Speed 11461.83 samples/sec Loss 4.4734 LearningRate 0.0034 Epoch: 32 Global Step: 185290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:48:50,966-Speed 11108.72 samples/sec Loss 4.4953 LearningRate 0.0034 Epoch: 32 Global Step: 185300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:48:51,889-Speed 11100.78 
samples/sec Loss 4.4571 LearningRate 0.0034 Epoch: 32 Global Step: 185310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:48:52,801-Speed 11238.05 samples/sec Loss 4.5955 LearningRate 0.0034 Epoch: 32 Global Step: 185320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:48:53,725-Speed 11085.69 samples/sec Loss 4.4615 LearningRate 0.0034 Epoch: 32 Global Step: 185330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:48:54,677-Speed 10761.45 samples/sec Loss 4.5001 LearningRate 0.0034 Epoch: 32 Global Step: 185340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:48:55,615-Speed 10929.87 samples/sec Loss 4.6602 LearningRate 0.0034 Epoch: 32 Global Step: 185350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:48:56,558-Speed 10864.02 samples/sec Loss 4.5204 LearningRate 0.0034 Epoch: 32 Global Step: 185360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:48:57,493-Speed 10953.23 samples/sec Loss 4.4607 LearningRate 0.0034 Epoch: 32 Global Step: 185370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:48:58,470-Speed 10489.17 samples/sec Loss 4.5923 LearningRate 0.0034 Epoch: 32 Global Step: 185380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:48:59,439-Speed 10582.80 samples/sec Loss 4.4949 LearningRate 0.0034 Epoch: 32 Global Step: 185390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:49:00,384-Speed 10840.15 samples/sec Loss 4.4823 LearningRate 0.0034 Epoch: 32 Global Step: 185400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:49:01,311-Speed 11054.21 samples/sec Loss 4.6716 LearningRate 0.0034 Epoch: 32 Global Step: 185410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:49:02,276-Speed 10621.10 samples/sec Loss 4.4814 LearningRate 0.0034 Epoch: 32 Global Step: 185420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:49:03,229-Speed 10751.25 samples/sec Loss 4.4458 
LearningRate 0.0034 Epoch: 32 Global Step: 185430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:49:04,189-Speed 10675.17 samples/sec Loss 4.5840 LearningRate 0.0034 Epoch: 32 Global Step: 185440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:49:05,138-Speed 10792.29 samples/sec Loss 4.7050 LearningRate 0.0034 Epoch: 32 Global Step: 185450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:49:06,069-Speed 11010.03 samples/sec Loss 4.5676 LearningRate 0.0034 Epoch: 32 Global Step: 185460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:49:07,013-Speed 10849.61 samples/sec Loss 4.4306 LearningRate 0.0034 Epoch: 32 Global Step: 185470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:49:07,967-Speed 10737.90 samples/sec Loss 4.5040 LearningRate 0.0034 Epoch: 32 Global Step: 185480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:49:08,934-Speed 10597.78 samples/sec Loss 4.6112 LearningRate 0.0034 Epoch: 32 Global Step: 185490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:49:09,864-Speed 11019.81 samples/sec Loss 4.5317 LearningRate 0.0034 Epoch: 32 Global Step: 185500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:49:10,784-Speed 11145.96 samples/sec Loss 4.6695 LearningRate 0.0034 Epoch: 32 Global Step: 185510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:49:11,733-Speed 10794.41 samples/sec Loss 4.6565 LearningRate 0.0034 Epoch: 32 Global Step: 185520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:49:12,629-Speed 11437.10 samples/sec Loss 4.6127 LearningRate 0.0034 Epoch: 32 Global Step: 185530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:49:13,547-Speed 11164.10 samples/sec Loss 4.5124 LearningRate 0.0034 Epoch: 32 Global Step: 185540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:49:14,450-Speed 11342.25 samples/sec Loss 4.4281 LearningRate 0.0034 Epoch: 32 
Global Step: 185550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:49:15,337-Speed 11554.50 samples/sec Loss 4.5205 LearningRate 0.0034 Epoch: 32 Global Step: 185560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:49:16,281-Speed 10850.59 samples/sec Loss 4.5947 LearningRate 0.0034 Epoch: 32 Global Step: 185570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:49:17,208-Speed 11053.05 samples/sec Loss 4.8003 LearningRate 0.0034 Epoch: 32 Global Step: 185580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:49:18,115-Speed 11301.73 samples/sec Loss 4.3861 LearningRate 0.0034 Epoch: 32 Global Step: 185590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:49:19,022-Speed 11303.19 samples/sec Loss 4.6146 LearningRate 0.0034 Epoch: 32 Global Step: 185600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:49:19,943-Speed 11121.98 samples/sec Loss 4.5753 LearningRate 0.0034 Epoch: 32 Global Step: 185610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:49:20,861-Speed 11159.40 samples/sec Loss 4.5972 LearningRate 0.0034 Epoch: 32 Global Step: 185620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:49:21,788-Speed 11052.70 samples/sec Loss 4.4463 LearningRate 0.0034 Epoch: 32 Global Step: 185630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:49:22,716-Speed 11050.58 samples/sec Loss 4.6394 LearningRate 0.0034 Epoch: 32 Global Step: 185640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:49:23,672-Speed 10713.03 samples/sec Loss 4.6448 LearningRate 0.0034 Epoch: 32 Global Step: 185650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:49:24,588-Speed 11183.33 samples/sec Loss 4.6134 LearningRate 0.0034 Epoch: 32 Global Step: 185660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:49:25,521-Speed 10984.30 samples/sec Loss 4.6179 LearningRate 0.0034 Epoch: 32 Global Step: 185670 Fp16 Grad 
Scale: 32768 Required: 1 hours Training: 2022-04-11 05:49:26,483-Speed 10653.84 samples/sec Loss 4.5150 LearningRate 0.0034 Epoch: 32 Global Step: 185680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:49:27,369-Speed 11560.41 samples/sec Loss 4.3849 LearningRate 0.0034 Epoch: 32 Global Step: 185690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:49:28,300-Speed 11013.35 samples/sec Loss 4.6113 LearningRate 0.0034 Epoch: 32 Global Step: 185700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:49:29,236-Speed 10951.53 samples/sec Loss 4.5453 LearningRate 0.0034 Epoch: 32 Global Step: 185710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:49:30,235-Speed 10247.49 samples/sec Loss 4.4893 LearningRate 0.0034 Epoch: 32 Global Step: 185720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:49:31,201-Speed 10618.70 samples/sec Loss 4.5622 LearningRate 0.0034 Epoch: 32 Global Step: 185730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:49:32,203-Speed 10229.04 samples/sec Loss 4.5385 LearningRate 0.0034 Epoch: 32 Global Step: 185740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:49:33,169-Speed 10604.81 samples/sec Loss 4.5882 LearningRate 0.0034 Epoch: 32 Global Step: 185750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:49:34,102-Speed 10993.59 samples/sec Loss 4.6343 LearningRate 0.0034 Epoch: 32 Global Step: 185760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:49:34,995-Speed 11473.10 samples/sec Loss 4.5320 LearningRate 0.0034 Epoch: 32 Global Step: 185770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:49:35,914-Speed 11151.45 samples/sec Loss 4.4488 LearningRate 0.0034 Epoch: 32 Global Step: 185780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:49:36,866-Speed 10758.19 samples/sec Loss 4.4688 LearningRate 0.0034 Epoch: 32 Global Step: 185790 Fp16 Grad Scale: 65536 Required: 1 hours 
Training: 2022-04-11 05:49:37,801-Speed 10954.43 samples/sec Loss 4.4368 LearningRate 0.0034 Epoch: 32 Global Step: 185800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:49:38,728-Speed 11056.39 samples/sec Loss 4.5127 LearningRate 0.0034 Epoch: 32 Global Step: 185810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:49:39,692-Speed 10629.03 samples/sec Loss 4.5079 LearningRate 0.0033 Epoch: 32 Global Step: 185820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:49:40,605-Speed 11226.30 samples/sec Loss 4.5051 LearningRate 0.0033 Epoch: 32 Global Step: 185830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:49:41,534-Speed 11036.10 samples/sec Loss 4.4711 LearningRate 0.0033 Epoch: 32 Global Step: 185840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:49:42,501-Speed 10589.93 samples/sec Loss 4.4102 LearningRate 0.0033 Epoch: 32 Global Step: 185850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:49:43,450-Speed 10794.28 samples/sec Loss 4.4088 LearningRate 0.0033 Epoch: 32 Global Step: 185860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:49:44,411-Speed 10666.66 samples/sec Loss 4.7908 LearningRate 0.0033 Epoch: 32 Global Step: 185870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:49:45,319-Speed 11290.52 samples/sec Loss 4.3770 LearningRate 0.0033 Epoch: 32 Global Step: 185880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:49:46,282-Speed 10640.70 samples/sec Loss 4.5905 LearningRate 0.0033 Epoch: 32 Global Step: 185890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:49:47,214-Speed 10984.92 samples/sec Loss 4.4466 LearningRate 0.0033 Epoch: 32 Global Step: 185900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:49:48,149-Speed 10963.45 samples/sec Loss 4.6133 LearningRate 0.0033 Epoch: 32 Global Step: 185910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 
05:49:49,094-Speed 10846.30 samples/sec Loss 4.5287 LearningRate 0.0033 Epoch: 32 Global Step: 185920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:49:49,987-Speed 11470.08 samples/sec Loss 4.3835 LearningRate 0.0033 Epoch: 32 Global Step: 185930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:49:50,942-Speed 10727.61 samples/sec Loss 4.5772 LearningRate 0.0033 Epoch: 32 Global Step: 185940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:49:51,884-Speed 10881.03 samples/sec Loss 4.6934 LearningRate 0.0033 Epoch: 32 Global Step: 185950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:49:52,814-Speed 11024.76 samples/sec Loss 4.4800 LearningRate 0.0033 Epoch: 32 Global Step: 185960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:49:53,790-Speed 10492.69 samples/sec Loss 4.6366 LearningRate 0.0033 Epoch: 32 Global Step: 185970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:49:54,722-Speed 11006.27 samples/sec Loss 4.6614 LearningRate 0.0033 Epoch: 32 Global Step: 185980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:49:55,631-Speed 11274.19 samples/sec Loss 4.5266 LearningRate 0.0033 Epoch: 32 Global Step: 185990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:49:56,543-Speed 11236.20 samples/sec Loss 4.6216 LearningRate 0.0033 Epoch: 32 Global Step: 186000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:50:17,987-[lfw][186000]XNorm: 7.964523 Training: 2022-04-11 05:50:17,987-[lfw][186000]Accuracy-Flip: 0.99700+-0.00323 Training: 2022-04-11 05:50:17,987-[lfw][186000]Accuracy-Highest: 0.99733 Training: 2022-04-11 05:50:42,817-[cfp_fp][186000]XNorm: 6.796163 Training: 2022-04-11 05:50:42,817-[cfp_fp][186000]Accuracy-Flip: 0.95386+-0.01014 Training: 2022-04-11 05:50:42,818-[cfp_fp][186000]Accuracy-Highest: 0.95500 Training: 2022-04-11 05:51:04,059-[agedb_30][186000]XNorm: 7.799655 Training: 2022-04-11 
05:51:04,060-[agedb_30][186000]Accuracy-Flip: 0.96833+-0.00645 Training: 2022-04-11 05:51:04,060-[agedb_30][186000]Accuracy-Highest: 0.97067 Training: 2022-04-11 05:51:05,005-Speed 149.57 samples/sec Loss 4.6698 LearningRate 0.0033 Epoch: 32 Global Step: 186010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:51:05,932-Speed 11063.24 samples/sec Loss 4.4544 LearningRate 0.0033 Epoch: 32 Global Step: 186020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:51:06,823-Speed 11490.31 samples/sec Loss 4.5159 LearningRate 0.0033 Epoch: 32 Global Step: 186030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:51:07,799-Speed 10503.63 samples/sec Loss 4.4913 LearningRate 0.0033 Epoch: 32 Global Step: 186040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:51:08,765-Speed 10605.58 samples/sec Loss 4.5406 LearningRate 0.0033 Epoch: 32 Global Step: 186050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:51:09,674-Speed 11268.81 samples/sec Loss 4.5381 LearningRate 0.0033 Epoch: 32 Global Step: 186060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:51:10,636-Speed 10649.14 samples/sec Loss 4.6344 LearningRate 0.0033 Epoch: 32 Global Step: 186070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:51:11,604-Speed 10592.80 samples/sec Loss 4.5762 LearningRate 0.0033 Epoch: 32 Global Step: 186080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:51:12,560-Speed 10718.43 samples/sec Loss 4.6890 LearningRate 0.0033 Epoch: 32 Global Step: 186090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:51:13,474-Speed 11213.02 samples/sec Loss 4.4967 LearningRate 0.0033 Epoch: 32 Global Step: 186100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:51:14,375-Speed 11372.90 samples/sec Loss 4.5630 LearningRate 0.0033 Epoch: 32 Global Step: 186110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:51:15,284-Speed 11270.88 
samples/sec Loss 4.4276 LearningRate 0.0033 Epoch: 32 Global Step: 186120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:51:16,193-Speed 11266.96 samples/sec Loss 4.4883 LearningRate 0.0033 Epoch: 32 Global Step: 186130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:51:17,134-Speed 10896.66 samples/sec Loss 4.5337 LearningRate 0.0033 Epoch: 32 Global Step: 186140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:51:18,089-Speed 10722.71 samples/sec Loss 4.5233 LearningRate 0.0033 Epoch: 32 Global Step: 186150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:51:19,016-Speed 11059.46 samples/sec Loss 4.5762 LearningRate 0.0033 Epoch: 32 Global Step: 186160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:51:19,939-Speed 11101.07 samples/sec Loss 4.5135 LearningRate 0.0033 Epoch: 32 Global Step: 186170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:51:20,915-Speed 10493.80 samples/sec Loss 4.6348 LearningRate 0.0033 Epoch: 32 Global Step: 186180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:51:21,841-Speed 11071.12 samples/sec Loss 4.5351 LearningRate 0.0033 Epoch: 32 Global Step: 186190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:51:22,761-Speed 11138.26 samples/sec Loss 4.5941 LearningRate 0.0033 Epoch: 32 Global Step: 186200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:51:23,693-Speed 10991.35 samples/sec Loss 4.6005 LearningRate 0.0033 Epoch: 32 Global Step: 186210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:51:24,630-Speed 10941.51 samples/sec Loss 4.6568 LearningRate 0.0033 Epoch: 32 Global Step: 186220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:51:25,539-Speed 11274.39 samples/sec Loss 4.4261 LearningRate 0.0033 Epoch: 32 Global Step: 186230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:51:26,440-Speed 11360.87 samples/sec Loss 4.7096 
LearningRate 0.0033 Epoch: 32 Global Step: 186240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:51:27,400-Speed 10676.80 samples/sec Loss 4.4176 LearningRate 0.0033 Epoch: 32 Global Step: 186250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:51:28,322-Speed 11112.89 samples/sec Loss 4.6087 LearningRate 0.0033 Epoch: 32 Global Step: 186260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:51:29,278-Speed 10724.35 samples/sec Loss 4.6394 LearningRate 0.0033 Epoch: 32 Global Step: 186270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:51:30,210-Speed 10989.97 samples/sec Loss 4.5711 LearningRate 0.0033 Epoch: 32 Global Step: 186280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:51:31,184-Speed 10516.77 samples/sec Loss 4.5574 LearningRate 0.0033 Epoch: 32 Global Step: 186290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:51:32,162-Speed 10486.92 samples/sec Loss 4.5188 LearningRate 0.0033 Epoch: 32 Global Step: 186300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:51:33,124-Speed 10646.01 samples/sec Loss 4.5649 LearningRate 0.0033 Epoch: 32 Global Step: 186310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:51:34,061-Speed 10944.32 samples/sec Loss 4.4748 LearningRate 0.0033 Epoch: 32 Global Step: 186320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:51:34,991-Speed 11011.77 samples/sec Loss 4.5825 LearningRate 0.0033 Epoch: 32 Global Step: 186330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:51:35,959-Speed 10584.60 samples/sec Loss 4.4638 LearningRate 0.0033 Epoch: 32 Global Step: 186340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:51:36,944-Speed 10406.95 samples/sec Loss 4.4834 LearningRate 0.0033 Epoch: 32 Global Step: 186350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:51:37,890-Speed 10829.89 samples/sec Loss 4.6241 LearningRate 0.0033 Epoch: 32 
Global Step: 186360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:51:38,873-Speed 10420.24 samples/sec Loss 4.5082 LearningRate 0.0033 Epoch: 32 Global Step: 186370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:51:39,826-Speed 10751.32 samples/sec Loss 4.6319 LearningRate 0.0033 Epoch: 32 Global Step: 186380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:51:40,727-Speed 11372.61 samples/sec Loss 4.6289 LearningRate 0.0033 Epoch: 32 Global Step: 186390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:51:41,655-Speed 11047.80 samples/sec Loss 4.5329 LearningRate 0.0033 Epoch: 32 Global Step: 186400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:51:42,563-Speed 11286.98 samples/sec Loss 4.6562 LearningRate 0.0033 Epoch: 32 Global Step: 186410 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:51:43,524-Speed 10664.41 samples/sec Loss 4.5796 LearningRate 0.0033 Epoch: 32 Global Step: 186420 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:51:44,484-Speed 10676.15 samples/sec Loss 4.6218 LearningRate 0.0033 Epoch: 32 Global Step: 186430 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:51:45,380-Speed 11432.53 samples/sec Loss 4.5954 LearningRate 0.0032 Epoch: 32 Global Step: 186440 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:51:46,322-Speed 10877.38 samples/sec Loss 4.5240 LearningRate 0.0032 Epoch: 32 Global Step: 186450 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:51:47,266-Speed 10849.80 samples/sec Loss 4.6505 LearningRate 0.0032 Epoch: 32 Global Step: 186460 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:51:48,193-Speed 11060.86 samples/sec Loss 4.5020 LearningRate 0.0032 Epoch: 32 Global Step: 186470 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:51:49,118-Speed 11073.32 samples/sec Loss 4.6203 LearningRate 0.0032 Epoch: 32 Global Step: 186480 Fp16 Grad 
Scale: 16384 Required: 1 hours Training: 2022-04-11 05:51:50,077-Speed 10683.03 samples/sec Loss 4.5330 LearningRate 0.0032 Epoch: 32 Global Step: 186490 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:51:51,003-Speed 11069.00 samples/sec Loss 4.5732 LearningRate 0.0032 Epoch: 32 Global Step: 186500 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:51:51,923-Speed 11139.97 samples/sec Loss 4.6549 LearningRate 0.0032 Epoch: 32 Global Step: 186510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:51:52,836-Speed 11216.71 samples/sec Loss 4.5094 LearningRate 0.0032 Epoch: 32 Global Step: 186520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:51:53,761-Speed 11091.15 samples/sec Loss 4.5052 LearningRate 0.0032 Epoch: 32 Global Step: 186530 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:51:54,707-Speed 10831.19 samples/sec Loss 4.5515 LearningRate 0.0032 Epoch: 32 Global Step: 186540 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:51:55,639-Speed 10984.07 samples/sec Loss 4.6022 LearningRate 0.0032 Epoch: 32 Global Step: 186550 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:51:56,566-Speed 11059.18 samples/sec Loss 4.6083 LearningRate 0.0032 Epoch: 32 Global Step: 186560 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:51:57,480-Speed 11213.86 samples/sec Loss 4.4947 LearningRate 0.0032 Epoch: 32 Global Step: 186570 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:51:58,412-Speed 10995.82 samples/sec Loss 4.5566 LearningRate 0.0032 Epoch: 32 Global Step: 186580 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:51:59,351-Speed 10909.19 samples/sec Loss 4.5702 LearningRate 0.0032 Epoch: 32 Global Step: 186590 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:52:00,271-Speed 11139.61 samples/sec Loss 4.6923 LearningRate 0.0032 Epoch: 32 Global Step: 186600 Fp16 Grad Scale: 16384 Required: 1 hours 
Training: 2022-04-11 05:52:01,169-Speed 11410.89 samples/sec Loss 4.6466 LearningRate 0.0032 Epoch: 32 Global Step: 186610 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:52:02,090-Speed 11123.92 samples/sec Loss 4.4793 LearningRate 0.0032 Epoch: 32 Global Step: 186620 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:52:03,045-Speed 10726.79 samples/sec Loss 4.6902 LearningRate 0.0032 Epoch: 32 Global Step: 186630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:52:03,970-Speed 11077.40 samples/sec Loss 4.6869 LearningRate 0.0032 Epoch: 32 Global Step: 186640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:52:04,879-Speed 11282.92 samples/sec Loss 4.5236 LearningRate 0.0032 Epoch: 32 Global Step: 186650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:52:05,790-Speed 11240.44 samples/sec Loss 4.6076 LearningRate 0.0032 Epoch: 32 Global Step: 186660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:52:06,706-Speed 11186.96 samples/sec Loss 4.4364 LearningRate 0.0032 Epoch: 32 Global Step: 186670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:52:07,654-Speed 10807.92 samples/sec Loss 4.5707 LearningRate 0.0032 Epoch: 32 Global Step: 186680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:52:08,609-Speed 10736.66 samples/sec Loss 4.5976 LearningRate 0.0032 Epoch: 32 Global Step: 186690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:52:09,576-Speed 10587.45 samples/sec Loss 4.6260 LearningRate 0.0032 Epoch: 32 Global Step: 186700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:52:10,496-Speed 11136.93 samples/sec Loss 4.6697 LearningRate 0.0032 Epoch: 32 Global Step: 186710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:52:11,410-Speed 11216.79 samples/sec Loss 4.5816 LearningRate 0.0032 Epoch: 32 Global Step: 186720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 
05:52:12,347-Speed 10935.74 samples/sec Loss 4.5182 LearningRate 0.0032 Epoch: 32 Global Step: 186730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:52:13,321-Speed 10522.13 samples/sec Loss 4.5543 LearningRate 0.0032 Epoch: 32 Global Step: 186740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:52:14,296-Speed 10509.23 samples/sec Loss 4.5505 LearningRate 0.0032 Epoch: 32 Global Step: 186750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:52:15,241-Speed 10842.40 samples/sec Loss 4.5437 LearningRate 0.0032 Epoch: 32 Global Step: 186760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:52:16,172-Speed 11013.85 samples/sec Loss 4.6963 LearningRate 0.0032 Epoch: 32 Global Step: 186770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:52:17,095-Speed 11101.13 samples/sec Loss 4.5281 LearningRate 0.0032 Epoch: 32 Global Step: 186780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:52:18,032-Speed 10938.15 samples/sec Loss 4.5473 LearningRate 0.0032 Epoch: 32 Global Step: 186790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:52:18,960-Speed 11037.44 samples/sec Loss 4.6503 LearningRate 0.0032 Epoch: 32 Global Step: 186800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:52:19,876-Speed 11191.00 samples/sec Loss 4.4366 LearningRate 0.0032 Epoch: 32 Global Step: 186810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:52:20,807-Speed 11003.56 samples/sec Loss 4.6117 LearningRate 0.0032 Epoch: 32 Global Step: 186820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:52:21,723-Speed 11188.19 samples/sec Loss 4.6059 LearningRate 0.0032 Epoch: 32 Global Step: 186830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:52:22,636-Speed 11222.73 samples/sec Loss 4.6991 LearningRate 0.0032 Epoch: 32 Global Step: 186840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:52:23,530-Speed 11457.19 
samples/sec Loss 4.4357 LearningRate 0.0032 Epoch: 32 Global Step: 186850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:52:24,440-Speed 11260.96 samples/sec Loss 4.5725 LearningRate 0.0032 Epoch: 32 Global Step: 186860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:52:25,367-Speed 11055.97 samples/sec Loss 4.5660 LearningRate 0.0032 Epoch: 32 Global Step: 186870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:52:26,316-Speed 10796.60 samples/sec Loss 4.5189 LearningRate 0.0032 Epoch: 32 Global Step: 186880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:52:27,257-Speed 10892.17 samples/sec Loss 4.5183 LearningRate 0.0032 Epoch: 32 Global Step: 186890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:52:28,164-Speed 11296.70 samples/sec Loss 4.4723 LearningRate 0.0032 Epoch: 32 Global Step: 186900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:52:29,075-Speed 11242.85 samples/sec Loss 4.7160 LearningRate 0.0032 Epoch: 32 Global Step: 186910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:52:29,999-Speed 11096.64 samples/sec Loss 4.5419 LearningRate 0.0032 Epoch: 32 Global Step: 186920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:52:30,930-Speed 11006.02 samples/sec Loss 4.4621 LearningRate 0.0032 Epoch: 32 Global Step: 186930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:52:31,855-Speed 11074.62 samples/sec Loss 4.7090 LearningRate 0.0032 Epoch: 32 Global Step: 186940 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:52:32,770-Speed 11206.90 samples/sec Loss 4.5491 LearningRate 0.0032 Epoch: 32 Global Step: 186950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:52:33,678-Speed 11283.68 samples/sec Loss 4.5036 LearningRate 0.0032 Epoch: 32 Global Step: 186960 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:52:34,611-Speed 10984.64 samples/sec Loss 4.6550 
LearningRate 0.0032 Epoch: 32 Global Step: 186970 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:52:35,541-Speed 11011.45 samples/sec Loss 4.5297 LearningRate 0.0032 Epoch: 32 Global Step: 186980 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:52:36,467-Speed 11070.09 samples/sec Loss 4.6792 LearningRate 0.0032 Epoch: 32 Global Step: 186990 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:52:37,437-Speed 10570.19 samples/sec Loss 4.5269 LearningRate 0.0032 Epoch: 32 Global Step: 187000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:52:38,360-Speed 11106.33 samples/sec Loss 4.4589 LearningRate 0.0032 Epoch: 32 Global Step: 187010 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:52:39,300-Speed 10901.80 samples/sec Loss 4.6472 LearningRate 0.0032 Epoch: 32 Global Step: 187020 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:52:40,223-Speed 11100.03 samples/sec Loss 4.6259 LearningRate 0.0032 Epoch: 32 Global Step: 187030 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:52:41,136-Speed 11220.30 samples/sec Loss 4.5349 LearningRate 0.0032 Epoch: 32 Global Step: 187040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:52:42,053-Speed 11171.09 samples/sec Loss 4.6708 LearningRate 0.0032 Epoch: 32 Global Step: 187050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:52:42,987-Speed 10974.92 samples/sec Loss 4.6696 LearningRate 0.0032 Epoch: 32 Global Step: 187060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:52:43,903-Speed 11187.40 samples/sec Loss 4.5164 LearningRate 0.0032 Epoch: 32 Global Step: 187070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:52:44,808-Speed 11325.90 samples/sec Loss 4.5019 LearningRate 0.0031 Epoch: 32 Global Step: 187080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:52:45,760-Speed 10763.37 samples/sec Loss 4.6792 LearningRate 0.0031 Epoch: 32 
Global Step: 187090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:52:46,702-Speed 10872.07 samples/sec Loss 4.5732 LearningRate 0.0031 Epoch: 32 Global Step: 187100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:52:47,599-Speed 11426.10 samples/sec Loss 4.7005 LearningRate 0.0031 Epoch: 32 Global Step: 187110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:52:48,517-Speed 11166.94 samples/sec Loss 4.6593 LearningRate 0.0031 Epoch: 32 Global Step: 187120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:52:49,431-Speed 11213.56 samples/sec Loss 4.6627 LearningRate 0.0031 Epoch: 32 Global Step: 187130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:52:50,378-Speed 10812.22 samples/sec Loss 4.5069 LearningRate 0.0031 Epoch: 32 Global Step: 187140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:52:51,357-Speed 10469.93 samples/sec Loss 4.6260 LearningRate 0.0031 Epoch: 32 Global Step: 187150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:52:52,267-Speed 11258.85 samples/sec Loss 4.6434 LearningRate 0.0031 Epoch: 32 Global Step: 187160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:52:53,267-Speed 10251.12 samples/sec Loss 4.5688 LearningRate 0.0031 Epoch: 32 Global Step: 187170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:52:54,205-Speed 10923.73 samples/sec Loss 4.5538 LearningRate 0.0031 Epoch: 32 Global Step: 187180 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:52:55,197-Speed 10327.34 samples/sec Loss 4.4502 LearningRate 0.0031 Epoch: 32 Global Step: 187190 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:52:56,133-Speed 10943.92 samples/sec Loss 4.5750 LearningRate 0.0031 Epoch: 32 Global Step: 187200 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:52:57,061-Speed 11039.72 samples/sec Loss 4.4542 LearningRate 0.0031 Epoch: 32 Global Step: 187210 Fp16 Grad 
Scale: 16384 Required: 1 hours Training: 2022-04-11 05:52:57,977-Speed 11196.64 samples/sec Loss 4.5642 LearningRate 0.0031 Epoch: 32 Global Step: 187220 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:52:58,913-Speed 10944.91 samples/sec Loss 4.6172 LearningRate 0.0031 Epoch: 32 Global Step: 187230 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:52:59,848-Speed 10965.93 samples/sec Loss 4.5298 LearningRate 0.0031 Epoch: 32 Global Step: 187240 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:53:00,755-Speed 11289.00 samples/sec Loss 4.6161 LearningRate 0.0031 Epoch: 32 Global Step: 187250 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:53:01,651-Speed 11440.56 samples/sec Loss 4.5454 LearningRate 0.0031 Epoch: 32 Global Step: 187260 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:53:02,559-Speed 11290.72 samples/sec Loss 4.6079 LearningRate 0.0031 Epoch: 32 Global Step: 187270 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:53:03,468-Speed 11276.45 samples/sec Loss 4.5273 LearningRate 0.0031 Epoch: 32 Global Step: 187280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:53:04,392-Speed 11088.44 samples/sec Loss 4.5503 LearningRate 0.0031 Epoch: 32 Global Step: 187290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:53:05,379-Speed 10384.20 samples/sec Loss 4.5889 LearningRate 0.0031 Epoch: 32 Global Step: 187300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:53:06,294-Speed 11194.76 samples/sec Loss 4.6259 LearningRate 0.0031 Epoch: 32 Global Step: 187310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:53:07,233-Speed 10906.81 samples/sec Loss 4.6660 LearningRate 0.0031 Epoch: 32 Global Step: 187320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:53:08,180-Speed 10819.75 samples/sec Loss 4.5946 LearningRate 0.0031 Epoch: 32 Global Step: 187330 Fp16 Grad Scale: 32768 Required: 1 hours 
Training: 2022-04-11 05:53:09,119-Speed 10921.51 samples/sec Loss 4.6577 LearningRate 0.0031 Epoch: 32 Global Step: 187340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:53:10,052-Speed 10977.61 samples/sec Loss 4.6083 LearningRate 0.0031 Epoch: 32 Global Step: 187350 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:53:10,977-Speed 11078.69 samples/sec Loss 4.5287 LearningRate 0.0031 Epoch: 32 Global Step: 187360 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:53:11,877-Speed 11385.20 samples/sec Loss 4.5476 LearningRate 0.0031 Epoch: 32 Global Step: 187370 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:53:12,812-Speed 10960.73 samples/sec Loss 4.5325 LearningRate 0.0031 Epoch: 32 Global Step: 187380 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:53:13,748-Speed 10954.16 samples/sec Loss 4.6005 LearningRate 0.0031 Epoch: 32 Global Step: 187390 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:53:14,652-Speed 11327.44 samples/sec Loss 4.6398 LearningRate 0.0031 Epoch: 32 Global Step: 187400 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:53:15,556-Speed 11334.35 samples/sec Loss 4.6162 LearningRate 0.0031 Epoch: 32 Global Step: 187410 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:53:16,484-Speed 11047.43 samples/sec Loss 4.5456 LearningRate 0.0031 Epoch: 32 Global Step: 187420 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:53:17,396-Speed 11227.25 samples/sec Loss 4.4898 LearningRate 0.0031 Epoch: 32 Global Step: 187430 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:53:18,305-Speed 11275.59 samples/sec Loss 4.6789 LearningRate 0.0031 Epoch: 32 Global Step: 187440 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:53:19,271-Speed 10611.37 samples/sec Loss 4.6655 LearningRate 0.0031 Epoch: 32 Global Step: 187450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 
05:53:20,178-Speed 11294.81 samples/sec Loss 4.4558 LearningRate 0.0031 Epoch: 32 Global Step: 187460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:53:21,132-Speed 10746.68 samples/sec Loss 4.5419 LearningRate 0.0031 Epoch: 32 Global Step: 187470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:53:22,065-Speed 10978.79 samples/sec Loss 4.6296 LearningRate 0.0031 Epoch: 32 Global Step: 187480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:53:23,002-Speed 10941.74 samples/sec Loss 4.6038 LearningRate 0.0031 Epoch: 32 Global Step: 187490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:53:23,898-Speed 11436.91 samples/sec Loss 4.6428 LearningRate 0.0031 Epoch: 32 Global Step: 187500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:53:24,803-Speed 11326.45 samples/sec Loss 4.5442 LearningRate 0.0031 Epoch: 32 Global Step: 187510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:53:25,759-Speed 10711.22 samples/sec Loss 4.5348 LearningRate 0.0031 Epoch: 32 Global Step: 187520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:53:26,727-Speed 10582.52 samples/sec Loss 4.5340 LearningRate 0.0031 Epoch: 32 Global Step: 187530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:53:27,705-Speed 10480.14 samples/sec Loss 4.5113 LearningRate 0.0031 Epoch: 32 Global Step: 187540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:53:28,695-Speed 10352.31 samples/sec Loss 4.6238 LearningRate 0.0031 Epoch: 32 Global Step: 187550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:53:29,649-Speed 10745.66 samples/sec Loss 4.5513 LearningRate 0.0031 Epoch: 32 Global Step: 187560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:53:30,594-Speed 10839.45 samples/sec Loss 4.6631 LearningRate 0.0031 Epoch: 32 Global Step: 187570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:53:31,524-Speed 11012.65 
samples/sec Loss 4.4596 LearningRate 0.0031 Epoch: 32 Global Step: 187580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:53:32,479-Speed 10738.58 samples/sec Loss 4.6093 LearningRate 0.0031 Epoch: 32 Global Step: 187590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:53:33,436-Speed 10706.63 samples/sec Loss 4.6060 LearningRate 0.0031 Epoch: 32 Global Step: 187600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:53:34,397-Speed 10658.00 samples/sec Loss 4.5372 LearningRate 0.0031 Epoch: 32 Global Step: 187610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:53:35,338-Speed 10886.63 samples/sec Loss 4.5858 LearningRate 0.0031 Epoch: 32 Global Step: 187620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:53:36,311-Speed 10534.02 samples/sec Loss 4.4876 LearningRate 0.0031 Epoch: 32 Global Step: 187630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:53:47,573-Speed 909.32 samples/sec Loss 4.4847 LearningRate 0.0031 Epoch: 33 Global Step: 187640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:53:48,604-Speed 9946.90 samples/sec Loss 4.0477 LearningRate 0.0031 Epoch: 33 Global Step: 187650 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 05:53:49,549-Speed 10841.17 samples/sec Loss 4.2430 LearningRate 0.0031 Epoch: 33 Global Step: 187660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:53:50,874-Speed 7728.86 samples/sec Loss 4.3748 LearningRate 0.0031 Epoch: 33 Global Step: 187670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:53:51,777-Speed 11353.64 samples/sec Loss 4.1084 LearningRate 0.0031 Epoch: 33 Global Step: 187680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:53:52,871-Speed 9371.85 samples/sec Loss 4.1993 LearningRate 0.0031 Epoch: 33 Global Step: 187690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:53:53,859-Speed 10369.21 samples/sec Loss 4.1671 
LearningRate 0.0031 Epoch: 33 Global Step: 187700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:53:54,813-Speed 10749.71 samples/sec Loss 4.2808 LearningRate 0.0031 Epoch: 33 Global Step: 187710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:53:55,734-Speed 11116.05 samples/sec Loss 4.2121 LearningRate 0.0030 Epoch: 33 Global Step: 187720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:53:56,703-Speed 10579.81 samples/sec Loss 4.2440 LearningRate 0.0030 Epoch: 33 Global Step: 187730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:53:57,668-Speed 10618.46 samples/sec Loss 4.2965 LearningRate 0.0030 Epoch: 33 Global Step: 187740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:53:58,621-Speed 10747.25 samples/sec Loss 4.3388 LearningRate 0.0030 Epoch: 33 Global Step: 187750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:53:59,555-Speed 10976.56 samples/sec Loss 4.1267 LearningRate 0.0030 Epoch: 33 Global Step: 187760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:54:00,484-Speed 11030.40 samples/sec Loss 4.1542 LearningRate 0.0030 Epoch: 33 Global Step: 187770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:54:01,436-Speed 10758.72 samples/sec Loss 4.1811 LearningRate 0.0030 Epoch: 33 Global Step: 187780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:54:02,405-Speed 10577.91 samples/sec Loss 4.2038 LearningRate 0.0030 Epoch: 33 Global Step: 187790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:54:03,343-Speed 10926.43 samples/sec Loss 4.1090 LearningRate 0.0030 Epoch: 33 Global Step: 187800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:54:04,293-Speed 10780.78 samples/sec Loss 4.1936 LearningRate 0.0030 Epoch: 33 Global Step: 187810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:54:05,239-Speed 10859.04 samples/sec Loss 4.2072 LearningRate 0.0030 Epoch: 33 
Global Step: 187820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:54:06,210-Speed 10549.82 samples/sec Loss 4.3219 LearningRate 0.0030 Epoch: 33 Global Step: 187830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:54:07,171-Speed 10658.77 samples/sec Loss 4.2666 LearningRate 0.0030 Epoch: 33 Global Step: 187840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:54:08,128-Speed 10716.17 samples/sec Loss 4.2323 LearningRate 0.0030 Epoch: 33 Global Step: 187850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:54:09,088-Speed 10676.12 samples/sec Loss 4.2056 LearningRate 0.0030 Epoch: 33 Global Step: 187860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:54:10,009-Speed 11124.90 samples/sec Loss 4.1589 LearningRate 0.0030 Epoch: 33 Global Step: 187870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:54:10,969-Speed 10666.24 samples/sec Loss 4.2254 LearningRate 0.0030 Epoch: 33 Global Step: 187880 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 05:54:11,932-Speed 10643.75 samples/sec Loss 4.2324 LearningRate 0.0030 Epoch: 33 Global Step: 187890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:54:12,893-Speed 10691.91 samples/sec Loss 4.2316 LearningRate 0.0030 Epoch: 33 Global Step: 187900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:54:13,845-Speed 10767.00 samples/sec Loss 4.1979 LearningRate 0.0030 Epoch: 33 Global Step: 187910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:54:14,839-Speed 10299.79 samples/sec Loss 4.1340 LearningRate 0.0030 Epoch: 33 Global Step: 187920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:54:15,765-Speed 11069.88 samples/sec Loss 4.2144 LearningRate 0.0030 Epoch: 33 Global Step: 187930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:54:16,762-Speed 10276.34 samples/sec Loss 4.2807 LearningRate 0.0030 Epoch: 33 Global Step: 187940 Fp16 Grad 
Scale: 32768 Required: 1 hours
Training: 2022-04-11 05:54:17,763-Speed 10231.53 samples/sec Loss 4.2007 LearningRate 0.0030 Epoch: 33 Global Step: 187950 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-11 05:54:18,714-Speed 10785.00 samples/sec Loss 4.3434 LearningRate 0.0030 Epoch: 33 Global Step: 187960 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-11 05:54:19,664-Speed 10780.41 samples/sec Loss 4.3150 LearningRate 0.0030 Epoch: 33 Global Step: 187970 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-11 05:54:20,634-Speed 10568.82 samples/sec Loss 4.2123 LearningRate 0.0030 Epoch: 33 Global Step: 187980 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-11 05:54:21,560-Speed 11063.05 samples/sec Loss 4.2440 LearningRate 0.0030 Epoch: 33 Global Step: 187990 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-11 05:54:22,485-Speed 11077.99 samples/sec Loss 4.3102 LearningRate 0.0030 Epoch: 33 Global Step: 188000 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-11 05:54:43,741-[lfw][188000]XNorm: 7.858134
Training: 2022-04-11 05:54:43,742-[lfw][188000]Accuracy-Flip: 0.99717+-0.00279
Training: 2022-04-11 05:54:43,742-[lfw][188000]Accuracy-Highest: 0.99733
Training: 2022-04-11 05:55:08,387-[cfp_fp][188000]XNorm: 6.707240
Training: 2022-04-11 05:55:08,387-[cfp_fp][188000]Accuracy-Flip: 0.95686+-0.00905
Training: 2022-04-11 05:55:08,388-[cfp_fp][188000]Accuracy-Highest: 0.95686
Training: 2022-04-11 05:55:29,699-[agedb_30][188000]XNorm: 7.708471
Training: 2022-04-11 05:55:29,699-[agedb_30][188000]Accuracy-Flip: 0.96917+-0.00821
Training: 2022-04-11 05:55:29,700-[agedb_30][188000]Accuracy-Highest: 0.97067
Training: 2022-04-11 05:55:30,667-Speed 150.19 samples/sec Loss 4.3095 LearningRate 0.0030 Epoch: 33 Global Step: 188010 Fp16 Grad Scale: 32768 Required: 1 hours
Training: 2022-04-11 05:55:31,618-Speed 10783.07 samples/sec Loss 4.2691 LearningRate 0.0030 Epoch: 33 Global Step: 188020 Fp16 Grad Scale: 65536
Required: 1 hours Training: 2022-04-11 05:55:32,581-Speed 10632.72 samples/sec Loss 4.1954 LearningRate 0.0030 Epoch: 33 Global Step: 188030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:55:33,522-Speed 10893.59 samples/sec Loss 4.2546 LearningRate 0.0030 Epoch: 33 Global Step: 188040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:55:34,444-Speed 11116.65 samples/sec Loss 4.2812 LearningRate 0.0030 Epoch: 33 Global Step: 188050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:55:35,357-Speed 11217.58 samples/sec Loss 4.2568 LearningRate 0.0030 Epoch: 33 Global Step: 188060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:55:36,324-Speed 10593.28 samples/sec Loss 4.2619 LearningRate 0.0030 Epoch: 33 Global Step: 188070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:55:37,260-Speed 10953.37 samples/sec Loss 4.1716 LearningRate 0.0030 Epoch: 33 Global Step: 188080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:55:38,220-Speed 10672.06 samples/sec Loss 4.2593 LearningRate 0.0030 Epoch: 33 Global Step: 188090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:55:39,134-Speed 11215.42 samples/sec Loss 4.2605 LearningRate 0.0030 Epoch: 33 Global Step: 188100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:55:40,095-Speed 10665.64 samples/sec Loss 4.3634 LearningRate 0.0030 Epoch: 33 Global Step: 188110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:55:41,013-Speed 11162.07 samples/sec Loss 4.2345 LearningRate 0.0030 Epoch: 33 Global Step: 188120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:55:41,948-Speed 10955.09 samples/sec Loss 4.2249 LearningRate 0.0030 Epoch: 33 Global Step: 188130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:55:42,934-Speed 10396.91 samples/sec Loss 4.2361 LearningRate 0.0030 Epoch: 33 Global Step: 188140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-04-11 05:55:43,911-Speed 10483.92 samples/sec Loss 4.1695 LearningRate 0.0030 Epoch: 33 Global Step: 188150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:55:44,868-Speed 10714.23 samples/sec Loss 4.2590 LearningRate 0.0030 Epoch: 33 Global Step: 188160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:55:45,817-Speed 10786.88 samples/sec Loss 4.2708 LearningRate 0.0030 Epoch: 33 Global Step: 188170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:55:46,781-Speed 10635.02 samples/sec Loss 4.2286 LearningRate 0.0030 Epoch: 33 Global Step: 188180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:55:47,720-Speed 10913.41 samples/sec Loss 4.2302 LearningRate 0.0030 Epoch: 33 Global Step: 188190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:55:48,653-Speed 10986.46 samples/sec Loss 4.3360 LearningRate 0.0030 Epoch: 33 Global Step: 188200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:55:49,591-Speed 10916.65 samples/sec Loss 4.2227 LearningRate 0.0030 Epoch: 33 Global Step: 188210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:55:50,516-Speed 11088.73 samples/sec Loss 4.2806 LearningRate 0.0030 Epoch: 33 Global Step: 188220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:55:51,451-Speed 10952.32 samples/sec Loss 4.2326 LearningRate 0.0030 Epoch: 33 Global Step: 188230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:55:52,391-Speed 10901.70 samples/sec Loss 4.2359 LearningRate 0.0030 Epoch: 33 Global Step: 188240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:55:53,314-Speed 11097.57 samples/sec Loss 4.2289 LearningRate 0.0030 Epoch: 33 Global Step: 188250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:55:54,260-Speed 10833.63 samples/sec Loss 4.4010 LearningRate 0.0030 Epoch: 33 Global Step: 188260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:55:55,181-Speed 
11134.58 samples/sec Loss 4.2850 LearningRate 0.0030 Epoch: 33 Global Step: 188270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:55:56,124-Speed 10863.07 samples/sec Loss 4.2599 LearningRate 0.0030 Epoch: 33 Global Step: 188280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:55:57,082-Speed 10696.62 samples/sec Loss 4.2423 LearningRate 0.0030 Epoch: 33 Global Step: 188290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:55:58,003-Speed 11123.29 samples/sec Loss 4.2246 LearningRate 0.0030 Epoch: 33 Global Step: 188300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:55:58,955-Speed 10769.02 samples/sec Loss 4.2798 LearningRate 0.0030 Epoch: 33 Global Step: 188310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:55:59,948-Speed 10312.69 samples/sec Loss 4.1754 LearningRate 0.0030 Epoch: 33 Global Step: 188320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:56:00,975-Speed 9978.36 samples/sec Loss 4.2401 LearningRate 0.0030 Epoch: 33 Global Step: 188330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:56:01,952-Speed 10491.77 samples/sec Loss 4.2912 LearningRate 0.0030 Epoch: 33 Global Step: 188340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:56:02,911-Speed 10685.65 samples/sec Loss 4.0546 LearningRate 0.0030 Epoch: 33 Global Step: 188350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:56:03,837-Speed 11061.80 samples/sec Loss 4.3499 LearningRate 0.0030 Epoch: 33 Global Step: 188360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:56:04,846-Speed 10161.49 samples/sec Loss 4.3067 LearningRate 0.0030 Epoch: 33 Global Step: 188370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:56:05,816-Speed 10558.75 samples/sec Loss 4.3634 LearningRate 0.0029 Epoch: 33 Global Step: 188380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:56:06,812-Speed 10285.03 samples/sec Loss 4.3183 
LearningRate 0.0029 Epoch: 33 Global Step: 188390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:56:07,760-Speed 10811.04 samples/sec Loss 4.3579 LearningRate 0.0029 Epoch: 33 Global Step: 188400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:56:08,712-Speed 10761.69 samples/sec Loss 4.2988 LearningRate 0.0029 Epoch: 33 Global Step: 188410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:56:09,657-Speed 10850.13 samples/sec Loss 4.2047 LearningRate 0.0029 Epoch: 33 Global Step: 188420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:56:10,580-Speed 11104.41 samples/sec Loss 4.2724 LearningRate 0.0029 Epoch: 33 Global Step: 188430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:56:11,549-Speed 10564.80 samples/sec Loss 4.3082 LearningRate 0.0029 Epoch: 33 Global Step: 188440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:56:12,509-Speed 10674.20 samples/sec Loss 4.3059 LearningRate 0.0029 Epoch: 33 Global Step: 188450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:56:13,491-Speed 10440.20 samples/sec Loss 4.2703 LearningRate 0.0029 Epoch: 33 Global Step: 188460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:56:14,443-Speed 10770.46 samples/sec Loss 4.2949 LearningRate 0.0029 Epoch: 33 Global Step: 188470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:56:15,363-Speed 11146.33 samples/sec Loss 4.2565 LearningRate 0.0029 Epoch: 33 Global Step: 188480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:56:16,330-Speed 10594.43 samples/sec Loss 4.1024 LearningRate 0.0029 Epoch: 33 Global Step: 188490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:56:17,284-Speed 10747.42 samples/sec Loss 4.2797 LearningRate 0.0029 Epoch: 33 Global Step: 188500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:56:18,217-Speed 10978.16 samples/sec Loss 4.2319 LearningRate 0.0029 Epoch: 33 
Global Step: 188510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:56:19,173-Speed 10718.92 samples/sec Loss 4.2952 LearningRate 0.0029 Epoch: 33 Global Step: 188520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:56:20,096-Speed 11104.11 samples/sec Loss 4.3517 LearningRate 0.0029 Epoch: 33 Global Step: 188530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:56:21,052-Speed 10711.62 samples/sec Loss 4.2398 LearningRate 0.0029 Epoch: 33 Global Step: 188540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:56:21,975-Speed 11102.09 samples/sec Loss 4.3093 LearningRate 0.0029 Epoch: 33 Global Step: 188550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:56:22,918-Speed 10871.36 samples/sec Loss 4.2477 LearningRate 0.0029 Epoch: 33 Global Step: 188560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:56:23,831-Speed 11227.68 samples/sec Loss 4.2021 LearningRate 0.0029 Epoch: 33 Global Step: 188570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:56:24,785-Speed 10733.00 samples/sec Loss 4.3050 LearningRate 0.0029 Epoch: 33 Global Step: 188580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:56:25,709-Speed 11089.64 samples/sec Loss 4.2814 LearningRate 0.0029 Epoch: 33 Global Step: 188590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:56:26,687-Speed 10483.51 samples/sec Loss 4.4479 LearningRate 0.0029 Epoch: 33 Global Step: 188600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:56:27,614-Speed 11044.95 samples/sec Loss 4.3370 LearningRate 0.0029 Epoch: 33 Global Step: 188610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:56:28,564-Speed 10789.39 samples/sec Loss 4.2542 LearningRate 0.0029 Epoch: 33 Global Step: 188620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:56:29,496-Speed 10993.74 samples/sec Loss 4.1837 LearningRate 0.0029 Epoch: 33 Global Step: 188630 Fp16 Grad 
Scale: 131072 Required: 1 hours Training: 2022-04-11 05:56:30,460-Speed 10628.50 samples/sec Loss 4.3546 LearningRate 0.0029 Epoch: 33 Global Step: 188640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:56:31,443-Speed 10431.23 samples/sec Loss 4.3374 LearningRate 0.0029 Epoch: 33 Global Step: 188650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:56:32,384-Speed 10886.45 samples/sec Loss 4.3103 LearningRate 0.0029 Epoch: 33 Global Step: 188660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:56:33,336-Speed 10762.56 samples/sec Loss 4.2651 LearningRate 0.0029 Epoch: 33 Global Step: 188670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:56:34,257-Speed 11128.53 samples/sec Loss 4.3417 LearningRate 0.0029 Epoch: 33 Global Step: 188680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:56:35,197-Speed 10900.90 samples/sec Loss 4.2311 LearningRate 0.0029 Epoch: 33 Global Step: 188690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:56:36,112-Speed 11195.01 samples/sec Loss 4.1098 LearningRate 0.0029 Epoch: 33 Global Step: 188700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:56:37,054-Speed 10884.69 samples/sec Loss 4.3401 LearningRate 0.0029 Epoch: 33 Global Step: 188710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:56:38,000-Speed 10836.07 samples/sec Loss 4.2877 LearningRate 0.0029 Epoch: 33 Global Step: 188720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:56:38,979-Speed 10460.52 samples/sec Loss 4.3010 LearningRate 0.0029 Epoch: 33 Global Step: 188730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:56:39,936-Speed 10714.75 samples/sec Loss 4.3146 LearningRate 0.0029 Epoch: 33 Global Step: 188740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:56:40,893-Speed 10697.76 samples/sec Loss 4.3939 LearningRate 0.0029 Epoch: 33 Global Step: 188750 Fp16 Grad Scale: 32768 Required: 1 hours 
Training: 2022-04-11 05:56:41,815-Speed 11115.23 samples/sec Loss 4.3029 LearningRate 0.0029 Epoch: 33 Global Step: 188760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:56:42,770-Speed 10734.39 samples/sec Loss 4.3563 LearningRate 0.0029 Epoch: 33 Global Step: 188770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:56:43,710-Speed 10893.12 samples/sec Loss 4.3076 LearningRate 0.0029 Epoch: 33 Global Step: 188780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:56:44,659-Speed 10805.15 samples/sec Loss 4.2970 LearningRate 0.0029 Epoch: 33 Global Step: 188790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:56:45,614-Speed 10726.75 samples/sec Loss 4.3240 LearningRate 0.0029 Epoch: 33 Global Step: 188800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:56:46,519-Speed 11328.93 samples/sec Loss 4.3449 LearningRate 0.0029 Epoch: 33 Global Step: 188810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:56:47,445-Speed 11059.17 samples/sec Loss 4.3292 LearningRate 0.0029 Epoch: 33 Global Step: 188820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:56:48,398-Speed 10749.40 samples/sec Loss 4.3593 LearningRate 0.0029 Epoch: 33 Global Step: 188830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:56:49,345-Speed 10826.38 samples/sec Loss 4.2694 LearningRate 0.0029 Epoch: 33 Global Step: 188840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:56:50,309-Speed 10631.02 samples/sec Loss 4.2647 LearningRate 0.0029 Epoch: 33 Global Step: 188850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:56:51,258-Speed 10788.88 samples/sec Loss 4.2244 LearningRate 0.0029 Epoch: 33 Global Step: 188860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:56:52,242-Speed 10413.38 samples/sec Loss 4.2893 LearningRate 0.0029 Epoch: 33 Global Step: 188870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 
05:56:53,204-Speed 10652.64 samples/sec Loss 4.3347 LearningRate 0.0029 Epoch: 33 Global Step: 188880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:56:54,106-Speed 11364.12 samples/sec Loss 4.2250 LearningRate 0.0029 Epoch: 33 Global Step: 188890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:56:55,071-Speed 10617.96 samples/sec Loss 4.2896 LearningRate 0.0029 Epoch: 33 Global Step: 188900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:56:56,006-Speed 10960.76 samples/sec Loss 4.2659 LearningRate 0.0029 Epoch: 33 Global Step: 188910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:56:57,004-Speed 10270.61 samples/sec Loss 4.2803 LearningRate 0.0029 Epoch: 33 Global Step: 188920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:56:57,975-Speed 10556.17 samples/sec Loss 4.3237 LearningRate 0.0029 Epoch: 33 Global Step: 188930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:56:58,890-Speed 11203.35 samples/sec Loss 4.3172 LearningRate 0.0029 Epoch: 33 Global Step: 188940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:56:59,830-Speed 10902.95 samples/sec Loss 4.3095 LearningRate 0.0029 Epoch: 33 Global Step: 188950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:57:00,798-Speed 10585.30 samples/sec Loss 4.3087 LearningRate 0.0029 Epoch: 33 Global Step: 188960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:57:01,743-Speed 10832.97 samples/sec Loss 4.2354 LearningRate 0.0029 Epoch: 33 Global Step: 188970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:57:02,681-Speed 10929.80 samples/sec Loss 4.3250 LearningRate 0.0029 Epoch: 33 Global Step: 188980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:57:03,608-Speed 11052.08 samples/sec Loss 4.4198 LearningRate 0.0029 Epoch: 33 Global Step: 188990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:57:04,523-Speed 11204.32 
samples/sec Loss 4.3544 LearningRate 0.0029 Epoch: 33 Global Step: 189000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:57:05,487-Speed 10629.33 samples/sec Loss 4.3274 LearningRate 0.0029 Epoch: 33 Global Step: 189010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:57:06,405-Speed 11161.61 samples/sec Loss 4.3436 LearningRate 0.0029 Epoch: 33 Global Step: 189020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:57:07,413-Speed 10163.88 samples/sec Loss 4.3046 LearningRate 0.0029 Epoch: 33 Global Step: 189030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:57:08,421-Speed 10169.79 samples/sec Loss 4.2738 LearningRate 0.0029 Epoch: 33 Global Step: 189040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:57:09,340-Speed 11146.77 samples/sec Loss 4.2686 LearningRate 0.0028 Epoch: 33 Global Step: 189050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:57:10,293-Speed 10750.70 samples/sec Loss 4.3179 LearningRate 0.0028 Epoch: 33 Global Step: 189060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:57:11,230-Speed 10935.90 samples/sec Loss 4.2693 LearningRate 0.0028 Epoch: 33 Global Step: 189070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:57:12,167-Speed 10934.04 samples/sec Loss 4.2344 LearningRate 0.0028 Epoch: 33 Global Step: 189080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:57:13,124-Speed 10708.59 samples/sec Loss 4.3455 LearningRate 0.0028 Epoch: 33 Global Step: 189090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:57:14,039-Speed 11200.92 samples/sec Loss 4.2761 LearningRate 0.0028 Epoch: 33 Global Step: 189100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:57:15,041-Speed 10233.02 samples/sec Loss 4.2494 LearningRate 0.0028 Epoch: 33 Global Step: 189110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:57:15,988-Speed 10812.78 samples/sec Loss 4.2885 
LearningRate 0.0028 Epoch: 33 Global Step: 189120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:57:16,944-Speed 10718.41 samples/sec Loss 4.3024 LearningRate 0.0028 Epoch: 33 Global Step: 189130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:57:17,867-Speed 11103.29 samples/sec Loss 4.3464 LearningRate 0.0028 Epoch: 33 Global Step: 189140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:57:18,802-Speed 10973.24 samples/sec Loss 4.2849 LearningRate 0.0028 Epoch: 33 Global Step: 189150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:57:19,725-Speed 11104.10 samples/sec Loss 4.4245 LearningRate 0.0028 Epoch: 33 Global Step: 189160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:57:20,645-Speed 11133.57 samples/sec Loss 4.2505 LearningRate 0.0028 Epoch: 33 Global Step: 189170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:57:21,585-Speed 10895.78 samples/sec Loss 4.2133 LearningRate 0.0028 Epoch: 33 Global Step: 189180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:57:22,496-Speed 11253.87 samples/sec Loss 4.2608 LearningRate 0.0028 Epoch: 33 Global Step: 189190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:57:23,414-Speed 11165.63 samples/sec Loss 4.2236 LearningRate 0.0028 Epoch: 33 Global Step: 189200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:57:24,331-Speed 11170.84 samples/sec Loss 4.2812 LearningRate 0.0028 Epoch: 33 Global Step: 189210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:57:25,253-Speed 11111.41 samples/sec Loss 4.3152 LearningRate 0.0028 Epoch: 33 Global Step: 189220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:57:26,174-Speed 11123.63 samples/sec Loss 4.3467 LearningRate 0.0028 Epoch: 33 Global Step: 189230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:57:27,101-Speed 11053.12 samples/sec Loss 4.2856 LearningRate 0.0028 Epoch: 33 
Global Step: 189240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:57:28,021-Speed 11140.57 samples/sec Loss 4.4658 LearningRate 0.0028 Epoch: 33 Global Step: 189250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:57:28,991-Speed 10568.36 samples/sec Loss 4.4556 LearningRate 0.0028 Epoch: 33 Global Step: 189260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:57:29,909-Speed 11158.47 samples/sec Loss 4.3569 LearningRate 0.0028 Epoch: 33 Global Step: 189270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:57:30,889-Speed 10457.45 samples/sec Loss 4.2667 LearningRate 0.0028 Epoch: 33 Global Step: 189280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:57:31,869-Speed 10452.29 samples/sec Loss 4.2548 LearningRate 0.0028 Epoch: 33 Global Step: 189290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:57:32,823-Speed 10739.00 samples/sec Loss 4.3181 LearningRate 0.0028 Epoch: 33 Global Step: 189300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:57:33,750-Speed 11052.90 samples/sec Loss 4.3253 LearningRate 0.0028 Epoch: 33 Global Step: 189310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:57:34,671-Speed 11129.48 samples/sec Loss 4.3227 LearningRate 0.0028 Epoch: 33 Global Step: 189320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:57:35,591-Speed 11142.38 samples/sec Loss 4.2433 LearningRate 0.0028 Epoch: 33 Global Step: 189330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:57:36,520-Speed 11032.82 samples/sec Loss 4.3201 LearningRate 0.0028 Epoch: 33 Global Step: 189340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:57:37,471-Speed 10772.40 samples/sec Loss 4.3903 LearningRate 0.0028 Epoch: 33 Global Step: 189350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:57:38,431-Speed 10672.70 samples/sec Loss 4.3368 LearningRate 0.0028 Epoch: 33 Global Step: 189360 Fp16 Grad 
Scale: 32768 Required: 1 hours Training: 2022-04-11 05:57:39,365-Speed 10973.73 samples/sec Loss 4.3377 LearningRate 0.0028 Epoch: 33 Global Step: 189370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:57:40,293-Speed 11048.69 samples/sec Loss 4.4404 LearningRate 0.0028 Epoch: 33 Global Step: 189380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:57:41,216-Speed 11107.37 samples/sec Loss 4.3610 LearningRate 0.0028 Epoch: 33 Global Step: 189390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:57:42,134-Speed 11160.71 samples/sec Loss 4.4967 LearningRate 0.0028 Epoch: 33 Global Step: 189400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:57:43,089-Speed 10729.88 samples/sec Loss 4.2412 LearningRate 0.0028 Epoch: 33 Global Step: 189410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:57:44,074-Speed 10407.49 samples/sec Loss 4.3145 LearningRate 0.0028 Epoch: 33 Global Step: 189420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:57:45,017-Speed 10861.69 samples/sec Loss 4.3667 LearningRate 0.0028 Epoch: 33 Global Step: 189430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:57:45,950-Speed 10989.07 samples/sec Loss 4.3515 LearningRate 0.0028 Epoch: 33 Global Step: 189440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:57:46,874-Speed 11079.47 samples/sec Loss 4.3857 LearningRate 0.0028 Epoch: 33 Global Step: 189450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:57:47,832-Speed 10694.68 samples/sec Loss 4.3773 LearningRate 0.0028 Epoch: 33 Global Step: 189460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:57:48,760-Speed 11051.76 samples/sec Loss 4.3307 LearningRate 0.0028 Epoch: 33 Global Step: 189470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:57:49,759-Speed 10253.73 samples/sec Loss 4.3643 LearningRate 0.0028 Epoch: 33 Global Step: 189480 Fp16 Grad Scale: 65536 Required: 1 hours 
Training: 2022-04-11 05:57:50,695-Speed 10942.73 samples/sec Loss 4.3290 LearningRate 0.0028 Epoch: 33 Global Step: 189490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:57:51,632-Speed 10941.09 samples/sec Loss 4.2692 LearningRate 0.0028 Epoch: 33 Global Step: 189500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:57:52,545-Speed 11225.39 samples/sec Loss 4.3047 LearningRate 0.0028 Epoch: 33 Global Step: 189510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:57:53,465-Speed 11135.45 samples/sec Loss 4.3058 LearningRate 0.0028 Epoch: 33 Global Step: 189520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:57:54,392-Speed 11053.73 samples/sec Loss 4.2527 LearningRate 0.0028 Epoch: 33 Global Step: 189530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:57:55,339-Speed 10822.25 samples/sec Loss 4.3376 LearningRate 0.0028 Epoch: 33 Global Step: 189540 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 05:57:56,238-Speed 11394.34 samples/sec Loss 4.4065 LearningRate 0.0028 Epoch: 33 Global Step: 189550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:57:57,166-Speed 11046.09 samples/sec Loss 4.3077 LearningRate 0.0028 Epoch: 33 Global Step: 189560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:57:58,097-Speed 11009.18 samples/sec Loss 4.2501 LearningRate 0.0028 Epoch: 33 Global Step: 189570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:57:59,046-Speed 10797.10 samples/sec Loss 4.4220 LearningRate 0.0028 Epoch: 33 Global Step: 189580 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:58:00,001-Speed 10722.84 samples/sec Loss 4.2225 LearningRate 0.0028 Epoch: 33 Global Step: 189590 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:58:00,930-Speed 11032.46 samples/sec Loss 4.4391 LearningRate 0.0028 Epoch: 33 Global Step: 189600 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 
05:58:01,879-Speed 10806.12 samples/sec Loss 4.3802 LearningRate 0.0028 Epoch: 33 Global Step: 189610 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:58:02,861-Speed 10425.31 samples/sec Loss 4.4023 LearningRate 0.0028 Epoch: 33 Global Step: 189620 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:58:03,827-Speed 10617.88 samples/sec Loss 4.2691 LearningRate 0.0028 Epoch: 33 Global Step: 189630 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:58:04,736-Speed 11274.66 samples/sec Loss 4.3112 LearningRate 0.0028 Epoch: 33 Global Step: 189640 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:58:05,688-Speed 10765.34 samples/sec Loss 4.4704 LearningRate 0.0028 Epoch: 33 Global Step: 189650 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:58:06,611-Speed 11099.86 samples/sec Loss 4.3111 LearningRate 0.0028 Epoch: 33 Global Step: 189660 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:58:07,536-Speed 11072.04 samples/sec Loss 4.3949 LearningRate 0.0028 Epoch: 33 Global Step: 189670 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:58:08,463-Speed 11059.93 samples/sec Loss 4.3400 LearningRate 0.0028 Epoch: 33 Global Step: 189680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:58:09,449-Speed 10387.44 samples/sec Loss 4.3905 LearningRate 0.0028 Epoch: 33 Global Step: 189690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:58:10,393-Speed 10855.03 samples/sec Loss 4.3017 LearningRate 0.0028 Epoch: 33 Global Step: 189700 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:58:11,368-Speed 10510.50 samples/sec Loss 4.3982 LearningRate 0.0028 Epoch: 33 Global Step: 189710 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:58:12,305-Speed 10932.12 samples/sec Loss 4.3431 LearningRate 0.0028 Epoch: 33 Global Step: 189720 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:58:13,265-Speed 10673.88 
samples/sec Loss 4.3389 LearningRate 0.0027 Epoch: 33 Global Step: 189730 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:58:14,189-Speed 11101.18 samples/sec Loss 4.4226 LearningRate 0.0027 Epoch: 33 Global Step: 189740 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:58:15,173-Speed 10414.28 samples/sec Loss 4.2694 LearningRate 0.0027 Epoch: 33 Global Step: 189750 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:58:16,125-Speed 10761.99 samples/sec Loss 4.3872 LearningRate 0.0027 Epoch: 33 Global Step: 189760 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:58:17,047-Speed 11112.67 samples/sec Loss 4.5021 LearningRate 0.0027 Epoch: 33 Global Step: 189770 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:58:17,958-Speed 11251.87 samples/sec Loss 4.2916 LearningRate 0.0027 Epoch: 33 Global Step: 189780 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:58:18,890-Speed 10994.73 samples/sec Loss 4.3700 LearningRate 0.0027 Epoch: 33 Global Step: 189790 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:58:19,858-Speed 10583.78 samples/sec Loss 4.3869 LearningRate 0.0027 Epoch: 33 Global Step: 189800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:58:20,790-Speed 10993.64 samples/sec Loss 4.3669 LearningRate 0.0027 Epoch: 33 Global Step: 189810 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:58:21,726-Speed 10944.97 samples/sec Loss 4.4481 LearningRate 0.0027 Epoch: 33 Global Step: 189820 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:58:22,685-Speed 10684.43 samples/sec Loss 4.2703 LearningRate 0.0027 Epoch: 33 Global Step: 189830 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:58:23,638-Speed 10762.28 samples/sec Loss 4.4292 LearningRate 0.0027 Epoch: 33 Global Step: 189840 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:58:24,586-Speed 10803.65 samples/sec Loss 4.3279 
LearningRate 0.0027 Epoch: 33 Global Step: 189850 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:58:25,519-Speed 10989.51 samples/sec Loss 4.3232 LearningRate 0.0027 Epoch: 33 Global Step: 189860 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:58:26,488-Speed 10572.85 samples/sec Loss 4.3197 LearningRate 0.0027 Epoch: 33 Global Step: 189870 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:58:27,429-Speed 10884.24 samples/sec Loss 4.3311 LearningRate 0.0027 Epoch: 33 Global Step: 189880 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:58:28,361-Speed 10998.42 samples/sec Loss 4.4766 LearningRate 0.0027 Epoch: 33 Global Step: 189890 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:58:29,332-Speed 10562.42 samples/sec Loss 4.3780 LearningRate 0.0027 Epoch: 33 Global Step: 189900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 05:58:30,283-Speed 10772.97 samples/sec Loss 4.2921 LearningRate 0.0027 Epoch: 33 Global Step: 189910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:58:31,237-Speed 10743.91 samples/sec Loss 4.3850 LearningRate 0.0027 Epoch: 33 Global Step: 189920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:58:32,180-Speed 10865.50 samples/sec Loss 4.3516 LearningRate 0.0027 Epoch: 33 Global Step: 189930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:58:33,092-Speed 11234.44 samples/sec Loss 4.3169 LearningRate 0.0027 Epoch: 33 Global Step: 189940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:58:34,032-Speed 10903.15 samples/sec Loss 4.2477 LearningRate 0.0027 Epoch: 33 Global Step: 189950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:58:34,971-Speed 10913.97 samples/sec Loss 4.4402 LearningRate 0.0027 Epoch: 33 Global Step: 189960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:58:35,884-Speed 11219.21 samples/sec Loss 4.5359 LearningRate 0.0027 Epoch: 33 
Global Step: 189970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:58:36,826-Speed 10880.54 samples/sec Loss 4.3684 LearningRate 0.0027 Epoch: 33 Global Step: 189980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:58:37,763-Speed 10931.03 samples/sec Loss 4.3287 LearningRate 0.0027 Epoch: 33 Global Step: 189990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:58:38,741-Speed 10478.80 samples/sec Loss 4.5207 LearningRate 0.0027 Epoch: 33 Global Step: 190000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:59:00,032-[lfw][190000]XNorm: 7.817635 Training: 2022-04-11 05:59:00,033-[lfw][190000]Accuracy-Flip: 0.99717+-0.00248 Training: 2022-04-11 05:59:00,033-[lfw][190000]Accuracy-Highest: 0.99733 Training: 2022-04-11 05:59:24,697-[cfp_fp][190000]XNorm: 6.682699 Training: 2022-04-11 05:59:24,697-[cfp_fp][190000]Accuracy-Flip: 0.95500+-0.01145 Training: 2022-04-11 05:59:24,698-[cfp_fp][190000]Accuracy-Highest: 0.95686 Training: 2022-04-11 05:59:45,979-[agedb_30][190000]XNorm: 7.642092 Training: 2022-04-11 05:59:45,980-[agedb_30][190000]Accuracy-Flip: 0.96850+-0.00762 Training: 2022-04-11 05:59:45,980-[agedb_30][190000]Accuracy-Highest: 0.97067 Training: 2022-04-11 05:59:46,905-Speed 150.23 samples/sec Loss 4.3331 LearningRate 0.0027 Epoch: 33 Global Step: 190010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:59:47,840-Speed 10948.65 samples/sec Loss 4.2481 LearningRate 0.0027 Epoch: 33 Global Step: 190020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:59:48,801-Speed 10666.69 samples/sec Loss 4.3269 LearningRate 0.0027 Epoch: 33 Global Step: 190030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 05:59:49,730-Speed 11032.40 samples/sec Loss 4.3309 LearningRate 0.0027 Epoch: 33 Global Step: 190040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:59:50,654-Speed 11091.28 samples/sec Loss 4.3385 LearningRate 0.0027 Epoch: 33 Global Step: 190050 
Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:59:51,643-Speed 10353.69 samples/sec Loss 4.3348 LearningRate 0.0027 Epoch: 33 Global Step: 190060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:59:52,630-Speed 10382.89 samples/sec Loss 4.3080 LearningRate 0.0027 Epoch: 33 Global Step: 190070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:59:53,566-Speed 10954.32 samples/sec Loss 4.3284 LearningRate 0.0027 Epoch: 33 Global Step: 190080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:59:54,494-Speed 11036.20 samples/sec Loss 4.3232 LearningRate 0.0027 Epoch: 33 Global Step: 190090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:59:55,415-Speed 11130.71 samples/sec Loss 4.3320 LearningRate 0.0027 Epoch: 33 Global Step: 190100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:59:56,335-Speed 11130.22 samples/sec Loss 4.3297 LearningRate 0.0027 Epoch: 33 Global Step: 190110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:59:57,241-Speed 11316.37 samples/sec Loss 4.4963 LearningRate 0.0027 Epoch: 33 Global Step: 190120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:59:58,193-Speed 10759.37 samples/sec Loss 4.3890 LearningRate 0.0027 Epoch: 33 Global Step: 190130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 05:59:59,106-Speed 11247.01 samples/sec Loss 4.4348 LearningRate 0.0027 Epoch: 33 Global Step: 190140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:00:00,044-Speed 10924.72 samples/sec Loss 4.3213 LearningRate 0.0027 Epoch: 33 Global Step: 190150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:00:00,970-Speed 11057.35 samples/sec Loss 4.2867 LearningRate 0.0027 Epoch: 33 Global Step: 190160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:00:01,908-Speed 10931.65 samples/sec Loss 4.4009 LearningRate 0.0027 Epoch: 33 Global Step: 190170 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-04-11 06:00:02,833-Speed 11070.22 samples/sec Loss 4.2167 LearningRate 0.0027 Epoch: 33 Global Step: 190180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:00:03,800-Speed 10603.40 samples/sec Loss 4.2117 LearningRate 0.0027 Epoch: 33 Global Step: 190190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:00:04,738-Speed 10916.48 samples/sec Loss 4.4281 LearningRate 0.0027 Epoch: 33 Global Step: 190200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:00:05,687-Speed 10804.01 samples/sec Loss 4.4329 LearningRate 0.0027 Epoch: 33 Global Step: 190210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:00:06,635-Speed 10813.33 samples/sec Loss 4.4272 LearningRate 0.0027 Epoch: 33 Global Step: 190220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:00:07,565-Speed 11019.50 samples/sec Loss 4.2539 LearningRate 0.0027 Epoch: 33 Global Step: 190230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:00:08,571-Speed 10187.56 samples/sec Loss 4.3620 LearningRate 0.0027 Epoch: 33 Global Step: 190240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:00:09,522-Speed 10773.67 samples/sec Loss 4.4013 LearningRate 0.0027 Epoch: 33 Global Step: 190250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:00:10,480-Speed 10691.41 samples/sec Loss 4.3025 LearningRate 0.0027 Epoch: 33 Global Step: 190260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:00:11,431-Speed 10774.33 samples/sec Loss 4.3004 LearningRate 0.0027 Epoch: 33 Global Step: 190270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:00:12,340-Speed 11284.20 samples/sec Loss 4.3908 LearningRate 0.0027 Epoch: 33 Global Step: 190280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:00:13,270-Speed 11008.25 samples/sec Loss 4.4296 LearningRate 0.0027 Epoch: 33 Global Step: 190290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-11 06:00:14,217-Speed 10826.25 samples/sec Loss 4.3608 LearningRate 0.0027 Epoch: 33 Global Step: 190300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:00:15,137-Speed 11136.18 samples/sec Loss 4.2764 LearningRate 0.0027 Epoch: 33 Global Step: 190310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:00:16,077-Speed 10904.07 samples/sec Loss 4.3801 LearningRate 0.0027 Epoch: 33 Global Step: 190320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:00:16,999-Speed 11113.36 samples/sec Loss 4.3592 LearningRate 0.0027 Epoch: 33 Global Step: 190330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:00:17,972-Speed 10532.79 samples/sec Loss 4.3333 LearningRate 0.0027 Epoch: 33 Global Step: 190340 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 06:00:18,919-Speed 10813.16 samples/sec Loss 4.3791 LearningRate 0.0027 Epoch: 33 Global Step: 190350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:00:19,887-Speed 10586.78 samples/sec Loss 4.3514 LearningRate 0.0027 Epoch: 33 Global Step: 190360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:00:20,826-Speed 10920.52 samples/sec Loss 4.3826 LearningRate 0.0027 Epoch: 33 Global Step: 190370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:00:21,739-Speed 11226.49 samples/sec Loss 4.3497 LearningRate 0.0027 Epoch: 33 Global Step: 190380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:00:22,638-Speed 11397.73 samples/sec Loss 4.4084 LearningRate 0.0027 Epoch: 33 Global Step: 190390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:00:23,563-Speed 11077.55 samples/sec Loss 4.4403 LearningRate 0.0027 Epoch: 33 Global Step: 190400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:00:24,526-Speed 10632.62 samples/sec Loss 4.4355 LearningRate 0.0027 Epoch: 33 Global Step: 190410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:00:25,456-Speed 
11029.60 samples/sec Loss 4.4181 LearningRate 0.0026 Epoch: 33 Global Step: 190420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:00:26,386-Speed 11016.91 samples/sec Loss 4.4774 LearningRate 0.0026 Epoch: 33 Global Step: 190430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:00:27,308-Speed 11109.50 samples/sec Loss 4.4115 LearningRate 0.0026 Epoch: 33 Global Step: 190440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:00:28,234-Speed 11072.03 samples/sec Loss 4.4843 LearningRate 0.0026 Epoch: 33 Global Step: 190450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:00:29,198-Speed 10622.11 samples/sec Loss 4.5007 LearningRate 0.0026 Epoch: 33 Global Step: 190460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:00:30,120-Speed 11119.81 samples/sec Loss 4.2759 LearningRate 0.0026 Epoch: 33 Global Step: 190470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:00:31,118-Speed 10270.62 samples/sec Loss 4.3912 LearningRate 0.0026 Epoch: 33 Global Step: 190480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:00:32,039-Speed 11119.85 samples/sec Loss 4.3639 LearningRate 0.0026 Epoch: 33 Global Step: 190490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:00:33,013-Speed 10523.51 samples/sec Loss 4.3695 LearningRate 0.0026 Epoch: 33 Global Step: 190500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:00:33,948-Speed 10958.35 samples/sec Loss 4.4519 LearningRate 0.0026 Epoch: 33 Global Step: 190510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:00:34,877-Speed 11030.16 samples/sec Loss 4.4049 LearningRate 0.0026 Epoch: 33 Global Step: 190520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:00:35,783-Speed 11309.09 samples/sec Loss 4.4188 LearningRate 0.0026 Epoch: 33 Global Step: 190530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:00:36,731-Speed 10813.31 samples/sec Loss 
4.4243 LearningRate 0.0026 Epoch: 33 Global Step: 190540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:00:37,694-Speed 10641.63 samples/sec Loss 4.3250 LearningRate 0.0026 Epoch: 33 Global Step: 190550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:00:38,651-Speed 10702.71 samples/sec Loss 4.3888 LearningRate 0.0026 Epoch: 33 Global Step: 190560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:00:39,589-Speed 10929.56 samples/sec Loss 4.3869 LearningRate 0.0026 Epoch: 33 Global Step: 190570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:00:40,520-Speed 11005.79 samples/sec Loss 4.3066 LearningRate 0.0026 Epoch: 33 Global Step: 190580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:00:41,508-Speed 10366.37 samples/sec Loss 4.3010 LearningRate 0.0026 Epoch: 33 Global Step: 190590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:00:42,436-Speed 11042.11 samples/sec Loss 4.2682 LearningRate 0.0026 Epoch: 33 Global Step: 190600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:00:43,368-Speed 10995.75 samples/sec Loss 4.4298 LearningRate 0.0026 Epoch: 33 Global Step: 190610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:00:44,333-Speed 10619.94 samples/sec Loss 4.3277 LearningRate 0.0026 Epoch: 33 Global Step: 190620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:00:45,268-Speed 10961.60 samples/sec Loss 4.4448 LearningRate 0.0026 Epoch: 33 Global Step: 190630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:00:46,204-Speed 10945.97 samples/sec Loss 4.3900 LearningRate 0.0026 Epoch: 33 Global Step: 190640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:00:47,157-Speed 10749.68 samples/sec Loss 4.3701 LearningRate 0.0026 Epoch: 33 Global Step: 190650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:00:48,123-Speed 10610.32 samples/sec Loss 4.3596 LearningRate 0.0026 
Epoch: 33 Global Step: 190660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:00:49,108-Speed 10404.51 samples/sec Loss 4.3800 LearningRate 0.0026 Epoch: 33 Global Step: 190670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:00:50,054-Speed 10826.48 samples/sec Loss 4.5095 LearningRate 0.0026 Epoch: 33 Global Step: 190680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:00:50,987-Speed 10980.40 samples/sec Loss 4.3707 LearningRate 0.0026 Epoch: 33 Global Step: 190690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:00:51,949-Speed 10660.98 samples/sec Loss 4.2805 LearningRate 0.0026 Epoch: 33 Global Step: 190700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:00:52,873-Speed 11089.03 samples/sec Loss 4.3873 LearningRate 0.0026 Epoch: 33 Global Step: 190710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:00:53,853-Speed 10458.90 samples/sec Loss 4.3874 LearningRate 0.0026 Epoch: 33 Global Step: 190720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:00:54,796-Speed 10862.17 samples/sec Loss 4.3459 LearningRate 0.0026 Epoch: 33 Global Step: 190730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:00:55,744-Speed 10807.84 samples/sec Loss 4.4771 LearningRate 0.0026 Epoch: 33 Global Step: 190740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:00:56,706-Speed 10653.64 samples/sec Loss 4.2913 LearningRate 0.0026 Epoch: 33 Global Step: 190750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:00:57,638-Speed 10989.01 samples/sec Loss 4.4227 LearningRate 0.0026 Epoch: 33 Global Step: 190760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:00:58,563-Speed 11083.07 samples/sec Loss 4.5072 LearningRate 0.0026 Epoch: 33 Global Step: 190770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:00:59,452-Speed 11531.20 samples/sec Loss 4.3674 LearningRate 0.0026 Epoch: 33 Global Step: 190780 
Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:01:00,411-Speed 10683.15 samples/sec Loss 4.4358 LearningRate 0.0026 Epoch: 33 Global Step: 190790 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:01:01,398-Speed 10387.57 samples/sec Loss 4.4101 LearningRate 0.0026 Epoch: 33 Global Step: 190800 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:01:02,395-Speed 10276.46 samples/sec Loss 4.4226 LearningRate 0.0026 Epoch: 33 Global Step: 190810 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:01:03,359-Speed 10632.53 samples/sec Loss 4.3931 LearningRate 0.0026 Epoch: 33 Global Step: 190820 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:01:04,285-Speed 11059.46 samples/sec Loss 4.5127 LearningRate 0.0026 Epoch: 33 Global Step: 190830 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:01:05,216-Speed 11018.60 samples/sec Loss 4.4324 LearningRate 0.0026 Epoch: 33 Global Step: 190840 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:01:06,136-Speed 11129.02 samples/sec Loss 4.3667 LearningRate 0.0026 Epoch: 33 Global Step: 190850 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:01:07,104-Speed 10591.92 samples/sec Loss 4.4391 LearningRate 0.0026 Epoch: 33 Global Step: 190860 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:01:08,049-Speed 10849.66 samples/sec Loss 4.4500 LearningRate 0.0026 Epoch: 33 Global Step: 190870 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:01:09,023-Speed 10512.87 samples/sec Loss 4.3736 LearningRate 0.0026 Epoch: 33 Global Step: 190880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:01:09,957-Speed 10971.84 samples/sec Loss 4.4969 LearningRate 0.0026 Epoch: 33 Global Step: 190890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:01:10,881-Speed 11098.01 samples/sec Loss 4.4413 LearningRate 0.0026 Epoch: 33 Global Step: 190900 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-04-11 06:01:11,794-Speed 11219.36 samples/sec Loss 4.3091 LearningRate 0.0026 Epoch: 33 Global Step: 190910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:01:12,733-Speed 10915.97 samples/sec Loss 4.4460 LearningRate 0.0026 Epoch: 33 Global Step: 190920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:01:13,647-Speed 11210.03 samples/sec Loss 4.3805 LearningRate 0.0026 Epoch: 33 Global Step: 190930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:01:14,558-Speed 11245.68 samples/sec Loss 4.3017 LearningRate 0.0026 Epoch: 33 Global Step: 190940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:01:15,497-Speed 10919.99 samples/sec Loss 4.4875 LearningRate 0.0026 Epoch: 33 Global Step: 190950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:01:16,434-Speed 10938.70 samples/sec Loss 4.4510 LearningRate 0.0026 Epoch: 33 Global Step: 190960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:01:17,376-Speed 10872.67 samples/sec Loss 4.4552 LearningRate 0.0026 Epoch: 33 Global Step: 190970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:01:18,313-Speed 10944.03 samples/sec Loss 4.4565 LearningRate 0.0026 Epoch: 33 Global Step: 190980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:01:19,236-Speed 11091.62 samples/sec Loss 4.4256 LearningRate 0.0026 Epoch: 33 Global Step: 190990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:01:20,164-Speed 11050.52 samples/sec Loss 4.3774 LearningRate 0.0026 Epoch: 33 Global Step: 191000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:01:21,116-Speed 10767.71 samples/sec Loss 4.4867 LearningRate 0.0026 Epoch: 33 Global Step: 191010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:01:22,056-Speed 10899.36 samples/sec Loss 4.4415 LearningRate 0.0026 Epoch: 33 Global Step: 191020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-04-11 06:01:22,965-Speed 11270.32 samples/sec Loss 4.4276 LearningRate 0.0026 Epoch: 33 Global Step: 191030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:01:23,878-Speed 11223.43 samples/sec Loss 4.5302 LearningRate 0.0026 Epoch: 33 Global Step: 191040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:01:24,816-Speed 10925.57 samples/sec Loss 4.3221 LearningRate 0.0026 Epoch: 33 Global Step: 191050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:01:25,754-Speed 10922.71 samples/sec Loss 4.3729 LearningRate 0.0026 Epoch: 33 Global Step: 191060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:01:26,675-Speed 11127.08 samples/sec Loss 4.3971 LearningRate 0.0026 Epoch: 33 Global Step: 191070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:01:27,620-Speed 10838.48 samples/sec Loss 4.5428 LearningRate 0.0026 Epoch: 33 Global Step: 191080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:01:28,606-Speed 10391.71 samples/sec Loss 4.4657 LearningRate 0.0026 Epoch: 33 Global Step: 191090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:01:29,557-Speed 10773.21 samples/sec Loss 4.4695 LearningRate 0.0026 Epoch: 33 Global Step: 191100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:01:30,485-Speed 11042.99 samples/sec Loss 4.2632 LearningRate 0.0026 Epoch: 33 Global Step: 191110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:01:31,383-Speed 11410.98 samples/sec Loss 4.3588 LearningRate 0.0026 Epoch: 33 Global Step: 191120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:01:32,313-Speed 11022.05 samples/sec Loss 4.4608 LearningRate 0.0025 Epoch: 33 Global Step: 191130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:01:33,231-Speed 11164.02 samples/sec Loss 4.3006 LearningRate 0.0025 Epoch: 33 Global Step: 191140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:01:34,148-Speed 
11175.08 samples/sec Loss 4.3002 LearningRate 0.0025 Epoch: 33 Global Step: 191150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:01:35,085-Speed 10931.85 samples/sec Loss 4.3788 LearningRate 0.0025 Epoch: 33 Global Step: 191160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:01:36,001-Speed 11188.70 samples/sec Loss 4.3343 LearningRate 0.0025 Epoch: 33 Global Step: 191170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:01:36,931-Speed 11027.45 samples/sec Loss 4.3703 LearningRate 0.0025 Epoch: 33 Global Step: 191180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:01:37,851-Speed 11134.06 samples/sec Loss 4.4147 LearningRate 0.0025 Epoch: 33 Global Step: 191190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:01:38,799-Speed 10803.44 samples/sec Loss 4.4618 LearningRate 0.0025 Epoch: 33 Global Step: 191200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:01:39,757-Speed 10701.46 samples/sec Loss 4.4238 LearningRate 0.0025 Epoch: 33 Global Step: 191210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:01:40,701-Speed 10855.11 samples/sec Loss 4.3975 LearningRate 0.0025 Epoch: 33 Global Step: 191220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:01:41,671-Speed 10557.06 samples/sec Loss 4.3260 LearningRate 0.0025 Epoch: 33 Global Step: 191230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:01:42,600-Speed 11040.60 samples/sec Loss 4.3890 LearningRate 0.0025 Epoch: 33 Global Step: 191240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:01:43,530-Speed 11009.69 samples/sec Loss 4.2747 LearningRate 0.0025 Epoch: 33 Global Step: 191250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:01:44,468-Speed 10921.80 samples/sec Loss 4.5053 LearningRate 0.0025 Epoch: 33 Global Step: 191260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:01:45,399-Speed 11010.97 samples/sec Loss 
4.3239 LearningRate 0.0025 Epoch: 33 Global Step: 191270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:01:46,321-Speed 11115.09 samples/sec Loss 4.2639 LearningRate 0.0025 Epoch: 33 Global Step: 191280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:01:47,245-Speed 11084.30 samples/sec Loss 4.4903 LearningRate 0.0025 Epoch: 33 Global Step: 191290 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:01:48,189-Speed 10857.98 samples/sec Loss 4.4609 LearningRate 0.0025 Epoch: 33 Global Step: 191300 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:01:49,135-Speed 10834.71 samples/sec Loss 4.3679 LearningRate 0.0025 Epoch: 33 Global Step: 191310 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:01:50,075-Speed 10898.06 samples/sec Loss 4.4751 LearningRate 0.0025 Epoch: 33 Global Step: 191320 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:01:50,995-Speed 11141.87 samples/sec Loss 4.3489 LearningRate 0.0025 Epoch: 33 Global Step: 191330 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:01:51,918-Speed 11102.97 samples/sec Loss 4.3828 LearningRate 0.0025 Epoch: 33 Global Step: 191340 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:01:52,850-Speed 10996.88 samples/sec Loss 4.4071 LearningRate 0.0025 Epoch: 33 Global Step: 191350 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:01:53,800-Speed 10778.63 samples/sec Loss 4.4263 LearningRate 0.0025 Epoch: 33 Global Step: 191360 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:01:54,728-Speed 11048.31 samples/sec Loss 4.4701 LearningRate 0.0025 Epoch: 33 Global Step: 191370 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:01:55,657-Speed 11038.46 samples/sec Loss 4.3778 LearningRate 0.0025 Epoch: 33 Global Step: 191380 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:01:56,600-Speed 10860.17 samples/sec Loss 4.5197 LearningRate 0.0025 
Epoch: 33 Global Step: 191390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:01:57,556-Speed 10720.39 samples/sec Loss 4.5524 LearningRate 0.0025 Epoch: 33 Global Step: 191400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:01:58,471-Speed 11199.73 samples/sec Loss 4.3444 LearningRate 0.0025 Epoch: 33 Global Step: 191410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:01:59,402-Speed 11009.87 samples/sec Loss 4.4464 LearningRate 0.0025 Epoch: 33 Global Step: 191420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:02:00,331-Speed 11025.48 samples/sec Loss 4.3107 LearningRate 0.0025 Epoch: 33 Global Step: 191430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:02:01,243-Speed 11242.62 samples/sec Loss 4.3917 LearningRate 0.0025 Epoch: 33 Global Step: 191440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:02:02,179-Speed 10948.06 samples/sec Loss 4.3855 LearningRate 0.0025 Epoch: 33 Global Step: 191450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:02:03,130-Speed 10774.77 samples/sec Loss 4.4200 LearningRate 0.0025 Epoch: 33 Global Step: 191460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:02:04,082-Speed 10762.30 samples/sec Loss 4.4337 LearningRate 0.0025 Epoch: 33 Global Step: 191470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:02:04,988-Speed 11308.92 samples/sec Loss 4.5316 LearningRate 0.0025 Epoch: 33 Global Step: 191480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:02:05,949-Speed 10661.66 samples/sec Loss 4.4068 LearningRate 0.0025 Epoch: 33 Global Step: 191490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:02:06,935-Speed 10390.71 samples/sec Loss 4.4148 LearningRate 0.0025 Epoch: 33 Global Step: 191500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:02:07,895-Speed 10670.49 samples/sec Loss 4.6232 LearningRate 0.0025 Epoch: 33 Global Step: 191510 
Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:02:08,836-Speed 10896.88 samples/sec Loss 4.4979 LearningRate 0.0025 Epoch: 33 Global Step: 191520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:02:09,772-Speed 10938.72 samples/sec Loss 4.4741 LearningRate 0.0025 Epoch: 33 Global Step: 191530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:02:10,727-Speed 10731.51 samples/sec Loss 4.5449 LearningRate 0.0025 Epoch: 33 Global Step: 191540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:02:11,639-Speed 11238.72 samples/sec Loss 4.3390 LearningRate 0.0025 Epoch: 33 Global Step: 191550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:02:12,556-Speed 11176.00 samples/sec Loss 4.4645 LearningRate 0.0025 Epoch: 33 Global Step: 191560 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:02:13,483-Speed 11053.77 samples/sec Loss 4.4837 LearningRate 0.0025 Epoch: 33 Global Step: 191570 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:02:14,457-Speed 10522.35 samples/sec Loss 4.3765 LearningRate 0.0025 Epoch: 33 Global Step: 191580 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:02:15,413-Speed 10722.09 samples/sec Loss 4.4302 LearningRate 0.0025 Epoch: 33 Global Step: 191590 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:02:16,394-Speed 10443.73 samples/sec Loss 4.4193 LearningRate 0.0025 Epoch: 33 Global Step: 191600 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:02:17,332-Speed 10923.73 samples/sec Loss 4.3628 LearningRate 0.0025 Epoch: 33 Global Step: 191610 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:02:18,299-Speed 10596.94 samples/sec Loss 4.4209 LearningRate 0.0025 Epoch: 33 Global Step: 191620 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:02:19,243-Speed 10858.79 samples/sec Loss 4.3438 LearningRate 0.0025 Epoch: 33 Global Step: 191630 Fp16 Grad Scale: 16384 
Required: 1 hours Training: 2022-04-11 06:02:20,156-Speed 11226.98 samples/sec Loss 4.4586 LearningRate 0.0025 Epoch: 33 Global Step: 191640 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:02:21,075-Speed 11148.31 samples/sec Loss 4.3978 LearningRate 0.0025 Epoch: 33 Global Step: 191650 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:02:21,996-Speed 11124.06 samples/sec Loss 4.4455 LearningRate 0.0025 Epoch: 33 Global Step: 191660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:02:22,930-Speed 10972.45 samples/sec Loss 4.3641 LearningRate 0.0025 Epoch: 33 Global Step: 191670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:02:23,874-Speed 10846.42 samples/sec Loss 4.5099 LearningRate 0.0025 Epoch: 33 Global Step: 191680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:02:24,840-Speed 10608.62 samples/sec Loss 4.4663 LearningRate 0.0025 Epoch: 33 Global Step: 191690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:02:25,810-Speed 10568.50 samples/sec Loss 4.4764 LearningRate 0.0025 Epoch: 33 Global Step: 191700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:02:26,769-Speed 10678.98 samples/sec Loss 4.4720 LearningRate 0.0025 Epoch: 33 Global Step: 191710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:02:27,707-Speed 10925.08 samples/sec Loss 4.4651 LearningRate 0.0025 Epoch: 33 Global Step: 191720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:02:28,629-Speed 11116.97 samples/sec Loss 4.3873 LearningRate 0.0025 Epoch: 33 Global Step: 191730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:02:29,561-Speed 10991.21 samples/sec Loss 4.4777 LearningRate 0.0025 Epoch: 33 Global Step: 191740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:02:30,471-Speed 11269.10 samples/sec Loss 4.3775 LearningRate 0.0025 Epoch: 33 Global Step: 191750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-04-11 06:02:31,438-Speed 10594.30 samples/sec Loss 4.4382 LearningRate 0.0025 Epoch: 33 Global Step: 191760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:02:32,392-Speed 10739.09 samples/sec Loss 4.4653 LearningRate 0.0025 Epoch: 33 Global Step: 191770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:02:33,327-Speed 10965.21 samples/sec Loss 4.4070 LearningRate 0.0025 Epoch: 33 Global Step: 191780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:02:34,250-Speed 11109.48 samples/sec Loss 4.3431 LearningRate 0.0025 Epoch: 33 Global Step: 191790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:02:35,176-Speed 11067.72 samples/sec Loss 4.3831 LearningRate 0.0025 Epoch: 33 Global Step: 191800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:02:36,150-Speed 10516.18 samples/sec Loss 4.4534 LearningRate 0.0025 Epoch: 33 Global Step: 191810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:02:37,088-Speed 10925.60 samples/sec Loss 4.3282 LearningRate 0.0025 Epoch: 33 Global Step: 191820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:02:38,043-Speed 10732.08 samples/sec Loss 4.4881 LearningRate 0.0025 Epoch: 33 Global Step: 191830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:02:38,976-Speed 10977.19 samples/sec Loss 4.3987 LearningRate 0.0025 Epoch: 33 Global Step: 191840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:02:39,949-Speed 10530.86 samples/sec Loss 4.4851 LearningRate 0.0024 Epoch: 33 Global Step: 191850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:02:40,911-Speed 10648.80 samples/sec Loss 4.4162 LearningRate 0.0024 Epoch: 33 Global Step: 191860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:02:41,824-Speed 11231.59 samples/sec Loss 4.5480 LearningRate 0.0024 Epoch: 33 Global Step: 191870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:02:42,767-Speed 
10859.09 samples/sec Loss 4.4808 LearningRate 0.0024 Epoch: 33 Global Step: 191880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:02:43,737-Speed 10565.36 samples/sec Loss 4.4098 LearningRate 0.0024 Epoch: 33 Global Step: 191890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:02:44,667-Speed 11022.35 samples/sec Loss 4.5778 LearningRate 0.0024 Epoch: 33 Global Step: 191900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:02:45,600-Speed 10978.81 samples/sec Loss 4.5472 LearningRate 0.0024 Epoch: 33 Global Step: 191910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:02:46,553-Speed 10756.37 samples/sec Loss 4.3639 LearningRate 0.0024 Epoch: 33 Global Step: 191920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:02:47,472-Speed 11155.31 samples/sec Loss 4.4325 LearningRate 0.0024 Epoch: 33 Global Step: 191930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:02:48,442-Speed 10564.20 samples/sec Loss 4.3990 LearningRate 0.0024 Epoch: 33 Global Step: 191940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:02:49,394-Speed 10764.66 samples/sec Loss 4.4055 LearningRate 0.0024 Epoch: 33 Global Step: 191950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:02:50,328-Speed 10966.29 samples/sec Loss 4.5407 LearningRate 0.0024 Epoch: 33 Global Step: 191960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:02:51,250-Speed 11112.37 samples/sec Loss 4.3766 LearningRate 0.0024 Epoch: 33 Global Step: 191970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:02:52,201-Speed 10775.75 samples/sec Loss 4.3829 LearningRate 0.0024 Epoch: 33 Global Step: 191980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:02:53,121-Speed 11145.17 samples/sec Loss 4.4094 LearningRate 0.0024 Epoch: 33 Global Step: 191990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:02:54,054-Speed 10981.77 samples/sec Loss 
4.3432 LearningRate 0.0024 Epoch: 33 Global Step: 192000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:03:15,394-[lfw][192000]XNorm: 7.759878 Training: 2022-04-11 06:03:15,395-[lfw][192000]Accuracy-Flip: 0.99733+-0.00291 Training: 2022-04-11 06:03:15,395-[lfw][192000]Accuracy-Highest: 0.99733 Training: 2022-04-11 06:03:40,078-[cfp_fp][192000]XNorm: 6.630354 Training: 2022-04-11 06:03:40,078-[cfp_fp][192000]Accuracy-Flip: 0.95200+-0.01122 Training: 2022-04-11 06:03:40,079-[cfp_fp][192000]Accuracy-Highest: 0.95686 Training: 2022-04-11 06:04:01,234-[agedb_30][192000]XNorm: 7.590914 Training: 2022-04-11 06:04:01,234-[agedb_30][192000]Accuracy-Flip: 0.96767+-0.00800 Training: 2022-04-11 06:04:01,235-[agedb_30][192000]Accuracy-Highest: 0.97067 Training: 2022-04-11 06:04:02,194-Speed 150.28 samples/sec Loss 4.4131 LearningRate 0.0024 Epoch: 33 Global Step: 192010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:04:03,132-Speed 10931.27 samples/sec Loss 4.5436 LearningRate 0.0024 Epoch: 33 Global Step: 192020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:04:04,093-Speed 10657.21 samples/sec Loss 4.4351 LearningRate 0.0024 Epoch: 33 Global Step: 192030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:04:05,061-Speed 10584.96 samples/sec Loss 4.4475 LearningRate 0.0024 Epoch: 33 Global Step: 192040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:04:06,034-Speed 10534.16 samples/sec Loss 4.3782 LearningRate 0.0024 Epoch: 33 Global Step: 192050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:04:06,947-Speed 11223.99 samples/sec Loss 4.4149 LearningRate 0.0024 Epoch: 33 Global Step: 192060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:04:07,876-Speed 11028.92 samples/sec Loss 4.3748 LearningRate 0.0024 Epoch: 33 Global Step: 192070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:04:08,810-Speed 10965.45 samples/sec Loss 4.5086 LearningRate 
0.0024 Epoch: 33 Global Step: 192080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:04:09,709-Speed 11403.36 samples/sec Loss 4.2552 LearningRate 0.0024 Epoch: 33 Global Step: 192090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:04:10,632-Speed 11101.89 samples/sec Loss 4.5980 LearningRate 0.0024 Epoch: 33 Global Step: 192100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:04:11,563-Speed 10999.30 samples/sec Loss 4.3967 LearningRate 0.0024 Epoch: 33 Global Step: 192110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:04:12,491-Speed 11041.40 samples/sec Loss 4.5222 LearningRate 0.0024 Epoch: 33 Global Step: 192120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:04:13,436-Speed 10848.42 samples/sec Loss 4.4279 LearningRate 0.0024 Epoch: 33 Global Step: 192130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:04:14,380-Speed 10855.08 samples/sec Loss 4.4922 LearningRate 0.0024 Epoch: 33 Global Step: 192140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:04:15,314-Speed 10974.57 samples/sec Loss 4.4326 LearningRate 0.0024 Epoch: 33 Global Step: 192150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:04:16,256-Speed 10872.63 samples/sec Loss 4.4540 LearningRate 0.0024 Epoch: 33 Global Step: 192160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:04:17,185-Speed 11029.50 samples/sec Loss 4.3103 LearningRate 0.0024 Epoch: 33 Global Step: 192170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:04:18,132-Speed 10826.08 samples/sec Loss 4.3185 LearningRate 0.0024 Epoch: 33 Global Step: 192180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:04:19,082-Speed 10777.38 samples/sec Loss 4.2171 LearningRate 0.0024 Epoch: 33 Global Step: 192190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:04:20,023-Speed 10889.87 samples/sec Loss 4.3546 LearningRate 0.0024 Epoch: 33 Global Step: 
192200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:04:20,952-Speed 11032.63 samples/sec Loss 4.4883 LearningRate 0.0024 Epoch: 33 Global Step: 192210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:04:21,890-Speed 10923.25 samples/sec Loss 4.4824 LearningRate 0.0024 Epoch: 33 Global Step: 192220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:04:22,840-Speed 10785.90 samples/sec Loss 4.4284 LearningRate 0.0024 Epoch: 33 Global Step: 192230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:04:23,782-Speed 10882.75 samples/sec Loss 4.4547 LearningRate 0.0024 Epoch: 33 Global Step: 192240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:04:24,705-Speed 11108.31 samples/sec Loss 4.5804 LearningRate 0.0024 Epoch: 33 Global Step: 192250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:04:25,647-Speed 10877.07 samples/sec Loss 4.4938 LearningRate 0.0024 Epoch: 33 Global Step: 192260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:04:26,597-Speed 10786.67 samples/sec Loss 4.3532 LearningRate 0.0024 Epoch: 33 Global Step: 192270 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 06:04:27,537-Speed 10902.07 samples/sec Loss 4.3495 LearningRate 0.0024 Epoch: 33 Global Step: 192280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:04:28,507-Speed 10560.13 samples/sec Loss 4.3385 LearningRate 0.0024 Epoch: 33 Global Step: 192290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:04:29,461-Speed 10739.60 samples/sec Loss 4.4376 LearningRate 0.0024 Epoch: 33 Global Step: 192300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:04:30,369-Speed 11290.43 samples/sec Loss 4.4561 LearningRate 0.0024 Epoch: 33 Global Step: 192310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:04:31,270-Speed 11373.10 samples/sec Loss 4.4235 LearningRate 0.0024 Epoch: 33 Global Step: 192320 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-04-11 06:04:32,186-Speed 11190.50 samples/sec Loss 4.4706 LearningRate 0.0024 Epoch: 33 Global Step: 192330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:04:33,110-Speed 11093.93 samples/sec Loss 4.4292 LearningRate 0.0024 Epoch: 33 Global Step: 192340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:04:34,023-Speed 11225.59 samples/sec Loss 4.4981 LearningRate 0.0024 Epoch: 33 Global Step: 192350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:04:34,920-Speed 11417.02 samples/sec Loss 4.4970 LearningRate 0.0024 Epoch: 33 Global Step: 192360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:04:35,854-Speed 10970.22 samples/sec Loss 4.4548 LearningRate 0.0024 Epoch: 33 Global Step: 192370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:04:36,816-Speed 10654.35 samples/sec Loss 4.3523 LearningRate 0.0024 Epoch: 33 Global Step: 192380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:04:37,763-Speed 10823.06 samples/sec Loss 4.3982 LearningRate 0.0024 Epoch: 33 Global Step: 192390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:04:38,679-Speed 11183.79 samples/sec Loss 4.3791 LearningRate 0.0024 Epoch: 33 Global Step: 192400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:04:39,640-Speed 10667.23 samples/sec Loss 4.5125 LearningRate 0.0024 Epoch: 33 Global Step: 192410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:04:40,595-Speed 10738.71 samples/sec Loss 4.3949 LearningRate 0.0024 Epoch: 33 Global Step: 192420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:04:41,493-Speed 11404.98 samples/sec Loss 4.4926 LearningRate 0.0024 Epoch: 33 Global Step: 192430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:04:42,470-Speed 10487.53 samples/sec Loss 4.4341 LearningRate 0.0024 Epoch: 33 Global Step: 192440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-04-11 06:04:43,452-Speed 10439.38 samples/sec Loss 4.6149 LearningRate 0.0024 Epoch: 33 Global Step: 192450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:04:44,425-Speed 10522.09 samples/sec Loss 4.3159 LearningRate 0.0024 Epoch: 33 Global Step: 192460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:04:45,354-Speed 11036.22 samples/sec Loss 4.5052 LearningRate 0.0024 Epoch: 33 Global Step: 192470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:04:46,299-Speed 10840.22 samples/sec Loss 4.6782 LearningRate 0.0024 Epoch: 33 Global Step: 192480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:04:47,216-Speed 11187.13 samples/sec Loss 4.5655 LearningRate 0.0024 Epoch: 33 Global Step: 192490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:04:48,190-Speed 10517.36 samples/sec Loss 4.4585 LearningRate 0.0024 Epoch: 33 Global Step: 192500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:04:49,125-Speed 10966.09 samples/sec Loss 4.4709 LearningRate 0.0024 Epoch: 33 Global Step: 192510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:04:50,061-Speed 10940.38 samples/sec Loss 4.4218 LearningRate 0.0024 Epoch: 33 Global Step: 192520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:04:50,973-Speed 11241.79 samples/sec Loss 4.3556 LearningRate 0.0024 Epoch: 33 Global Step: 192530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:04:51,886-Speed 11217.07 samples/sec Loss 4.4169 LearningRate 0.0024 Epoch: 33 Global Step: 192540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:04:52,858-Speed 10547.67 samples/sec Loss 4.4503 LearningRate 0.0024 Epoch: 33 Global Step: 192550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:04:53,805-Speed 10812.45 samples/sec Loss 4.4741 LearningRate 0.0024 Epoch: 33 Global Step: 192560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:04:54,733-Speed 
11044.43 samples/sec Loss 4.4718 LearningRate 0.0024 Epoch: 33 Global Step: 192570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:04:55,700-Speed 10606.77 samples/sec Loss 4.4386 LearningRate 0.0023 Epoch: 33 Global Step: 192580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:04:56,646-Speed 10828.99 samples/sec Loss 4.5711 LearningRate 0.0023 Epoch: 33 Global Step: 192590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:04:57,605-Speed 10686.40 samples/sec Loss 4.3657 LearningRate 0.0023 Epoch: 33 Global Step: 192600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:04:58,549-Speed 10848.10 samples/sec Loss 4.4514 LearningRate 0.0023 Epoch: 33 Global Step: 192610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:04:59,514-Speed 10617.31 samples/sec Loss 4.4623 LearningRate 0.0023 Epoch: 33 Global Step: 192620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:05:00,418-Speed 11342.42 samples/sec Loss 4.4119 LearningRate 0.0023 Epoch: 33 Global Step: 192630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:05:01,328-Speed 11262.80 samples/sec Loss 4.4538 LearningRate 0.0023 Epoch: 33 Global Step: 192640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:05:02,293-Speed 10616.27 samples/sec Loss 4.4657 LearningRate 0.0023 Epoch: 33 Global Step: 192650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:05:03,224-Speed 11008.02 samples/sec Loss 4.5582 LearningRate 0.0023 Epoch: 33 Global Step: 192660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:05:04,197-Speed 10529.80 samples/sec Loss 4.4476 LearningRate 0.0023 Epoch: 33 Global Step: 192670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:05:05,154-Speed 10707.36 samples/sec Loss 4.4462 LearningRate 0.0023 Epoch: 33 Global Step: 192680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:05:06,106-Speed 10772.80 samples/sec Loss 
4.5454 LearningRate 0.0023 Epoch: 33 Global Step: 192690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:05:07,057-Speed 10778.58 samples/sec Loss 4.4956 LearningRate 0.0023 Epoch: 33 Global Step: 192700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:05:07,975-Speed 11152.18 samples/sec Loss 4.5533 LearningRate 0.0023 Epoch: 33 Global Step: 192710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:05:08,880-Speed 11324.33 samples/sec Loss 4.2729 LearningRate 0.0023 Epoch: 33 Global Step: 192720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:05:09,793-Speed 11226.43 samples/sec Loss 4.4295 LearningRate 0.0023 Epoch: 33 Global Step: 192730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:05:10,730-Speed 10938.28 samples/sec Loss 4.3723 LearningRate 0.0023 Epoch: 33 Global Step: 192740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:05:11,649-Speed 11154.89 samples/sec Loss 4.4803 LearningRate 0.0023 Epoch: 33 Global Step: 192750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:05:12,570-Speed 11120.10 samples/sec Loss 4.5076 LearningRate 0.0023 Epoch: 33 Global Step: 192760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:05:13,478-Speed 11288.02 samples/sec Loss 4.4733 LearningRate 0.0023 Epoch: 33 Global Step: 192770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:05:14,382-Speed 11332.96 samples/sec Loss 4.4991 LearningRate 0.0023 Epoch: 33 Global Step: 192780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:05:15,325-Speed 10867.67 samples/sec Loss 4.5506 LearningRate 0.0023 Epoch: 33 Global Step: 192790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:05:16,240-Speed 11205.92 samples/sec Loss 4.4821 LearningRate 0.0023 Epoch: 33 Global Step: 192800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:05:17,210-Speed 10555.40 samples/sec Loss 4.4641 LearningRate 0.0023 
Epoch: 33 Global Step: 192810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:05:18,161-Speed 10776.56 samples/sec Loss 4.5794 LearningRate 0.0023 Epoch: 33 Global Step: 192820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:05:19,089-Speed 11041.35 samples/sec Loss 4.6558 LearningRate 0.0023 Epoch: 33 Global Step: 192830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:05:19,991-Speed 11358.71 samples/sec Loss 4.4440 LearningRate 0.0023 Epoch: 33 Global Step: 192840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:05:20,897-Speed 11312.37 samples/sec Loss 4.4695 LearningRate 0.0023 Epoch: 33 Global Step: 192850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:05:21,808-Speed 11257.92 samples/sec Loss 4.2799 LearningRate 0.0023 Epoch: 33 Global Step: 192860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:05:22,721-Speed 11213.46 samples/sec Loss 4.3998 LearningRate 0.0023 Epoch: 33 Global Step: 192870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:05:23,644-Speed 11109.86 samples/sec Loss 4.3716 LearningRate 0.0023 Epoch: 33 Global Step: 192880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:05:24,567-Speed 11098.36 samples/sec Loss 4.3517 LearningRate 0.0023 Epoch: 33 Global Step: 192890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:05:25,504-Speed 10936.02 samples/sec Loss 4.3873 LearningRate 0.0023 Epoch: 33 Global Step: 192900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:05:26,449-Speed 10849.32 samples/sec Loss 4.4225 LearningRate 0.0023 Epoch: 33 Global Step: 192910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:05:27,402-Speed 10756.99 samples/sec Loss 4.4115 LearningRate 0.0023 Epoch: 33 Global Step: 192920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:05:28,380-Speed 10479.44 samples/sec Loss 4.5193 LearningRate 0.0023 Epoch: 33 Global Step: 192930 
Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:05:29,322-Speed 10875.37 samples/sec Loss 4.5351 LearningRate 0.0023 Epoch: 33 Global Step: 192940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:05:30,261-Speed 10913.29 samples/sec Loss 4.4448 LearningRate 0.0023 Epoch: 33 Global Step: 192950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:05:31,216-Speed 10740.42 samples/sec Loss 4.3948 LearningRate 0.0023 Epoch: 33 Global Step: 192960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:05:32,195-Speed 10461.26 samples/sec Loss 4.5381 LearningRate 0.0023 Epoch: 33 Global Step: 192970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:05:33,112-Speed 11179.31 samples/sec Loss 4.5228 LearningRate 0.0023 Epoch: 33 Global Step: 192980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:05:34,016-Speed 11335.72 samples/sec Loss 4.5353 LearningRate 0.0023 Epoch: 33 Global Step: 192990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:05:34,924-Speed 11274.94 samples/sec Loss 4.5286 LearningRate 0.0023 Epoch: 33 Global Step: 193000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:05:35,847-Speed 11107.51 samples/sec Loss 4.4109 LearningRate 0.0023 Epoch: 33 Global Step: 193010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:05:36,753-Speed 11316.44 samples/sec Loss 4.5286 LearningRate 0.0023 Epoch: 33 Global Step: 193020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:05:37,651-Speed 11408.73 samples/sec Loss 4.3398 LearningRate 0.0023 Epoch: 33 Global Step: 193030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:05:38,605-Speed 10734.88 samples/sec Loss 4.4930 LearningRate 0.0023 Epoch: 33 Global Step: 193040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:05:39,558-Speed 10760.25 samples/sec Loss 4.5237 LearningRate 0.0023 Epoch: 33 Global Step: 193050 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-04-11 06:05:40,495-Speed 10936.42 samples/sec Loss 4.5164 LearningRate 0.0023 Epoch: 33 Global Step: 193060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:05:41,455-Speed 10665.67 samples/sec Loss 4.5410 LearningRate 0.0023 Epoch: 33 Global Step: 193070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:05:42,439-Speed 10420.13 samples/sec Loss 4.3949 LearningRate 0.0023 Epoch: 33 Global Step: 193080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:05:43,362-Speed 11103.77 samples/sec Loss 4.4515 LearningRate 0.0023 Epoch: 33 Global Step: 193090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:05:44,335-Speed 10530.14 samples/sec Loss 4.3016 LearningRate 0.0023 Epoch: 33 Global Step: 193100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:05:45,237-Speed 11355.12 samples/sec Loss 4.4107 LearningRate 0.0023 Epoch: 33 Global Step: 193110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:05:46,157-Speed 11151.84 samples/sec Loss 4.4457 LearningRate 0.0023 Epoch: 33 Global Step: 193120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:05:47,070-Speed 11216.53 samples/sec Loss 4.4531 LearningRate 0.0023 Epoch: 33 Global Step: 193130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:05:47,991-Speed 11127.73 samples/sec Loss 4.5038 LearningRate 0.0023 Epoch: 33 Global Step: 193140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:05:48,944-Speed 10747.56 samples/sec Loss 4.4253 LearningRate 0.0023 Epoch: 33 Global Step: 193150 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:05:49,871-Speed 11058.02 samples/sec Loss 4.4782 LearningRate 0.0023 Epoch: 33 Global Step: 193160 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:05:50,786-Speed 11206.04 samples/sec Loss 4.4400 LearningRate 0.0023 Epoch: 33 Global Step: 193170 Fp16 Grad Scale: 16384 Required: 1 hours Training: 
2022-04-11 06:05:51,705-Speed 11143.67 samples/sec Loss 4.4882 LearningRate 0.0023 Epoch: 33 Global Step: 193180 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:05:52,648-Speed 10868.89 samples/sec Loss 4.5277 LearningRate 0.0023 Epoch: 33 Global Step: 193190 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:05:53,568-Speed 11141.94 samples/sec Loss 4.3868 LearningRate 0.0023 Epoch: 33 Global Step: 193200 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:05:54,564-Speed 10288.70 samples/sec Loss 4.3253 LearningRate 0.0023 Epoch: 33 Global Step: 193210 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:05:55,527-Speed 10642.43 samples/sec Loss 4.4269 LearningRate 0.0023 Epoch: 33 Global Step: 193220 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:05:56,456-Speed 11028.21 samples/sec Loss 4.4429 LearningRate 0.0023 Epoch: 33 Global Step: 193230 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:05:57,367-Speed 11240.00 samples/sec Loss 4.4855 LearningRate 0.0023 Epoch: 33 Global Step: 193240 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:05:58,268-Speed 11372.09 samples/sec Loss 4.5496 LearningRate 0.0023 Epoch: 33 Global Step: 193250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:05:59,193-Speed 11079.82 samples/sec Loss 4.4492 LearningRate 0.0023 Epoch: 33 Global Step: 193260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:06:00,162-Speed 10570.32 samples/sec Loss 4.5730 LearningRate 0.0023 Epoch: 33 Global Step: 193270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:06:01,100-Speed 10929.33 samples/sec Loss 4.3477 LearningRate 0.0023 Epoch: 33 Global Step: 193280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:06:02,063-Speed 10650.71 samples/sec Loss 4.4943 LearningRate 0.0023 Epoch: 33 Global Step: 193290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:06:03,051-Speed 
10363.57 samples/sec Loss 4.5138 LearningRate 0.0023 Epoch: 33 Global Step: 193300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:06:03,995-Speed 10860.75 samples/sec Loss 4.4579 LearningRate 0.0023 Epoch: 33 Global Step: 193310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:06:05,005-Speed 10147.07 samples/sec Loss 4.5324 LearningRate 0.0023 Epoch: 33 Global Step: 193320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:06:15,679-Speed 959.38 samples/sec Loss 4.1537 LearningRate 0.0022 Epoch: 34 Global Step: 193330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:06:16,663-Speed 10423.04 samples/sec Loss 4.1537 LearningRate 0.0022 Epoch: 34 Global Step: 193340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:06:17,958-Speed 7913.01 samples/sec Loss 4.0360 LearningRate 0.0022 Epoch: 34 Global Step: 193350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:06:18,942-Speed 10416.46 samples/sec Loss 4.2091 LearningRate 0.0022 Epoch: 34 Global Step: 193360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:06:19,924-Speed 10429.17 samples/sec Loss 4.1523 LearningRate 0.0022 Epoch: 34 Global Step: 193370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:06:20,996-Speed 9563.83 samples/sec Loss 4.2083 LearningRate 0.0022 Epoch: 34 Global Step: 193380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:06:22,011-Speed 10092.41 samples/sec Loss 4.1525 LearningRate 0.0022 Epoch: 34 Global Step: 193390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:06:22,982-Speed 10567.45 samples/sec Loss 4.2722 LearningRate 0.0022 Epoch: 34 Global Step: 193400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:06:23,924-Speed 10879.38 samples/sec Loss 3.9161 LearningRate 0.0022 Epoch: 34 Global Step: 193410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:06:24,903-Speed 10472.90 samples/sec Loss 4.0132 
LearningRate 0.0022 Epoch: 34 Global Step: 193420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:06:25,897-Speed 10301.17 samples/sec Loss 4.1933 LearningRate 0.0022 Epoch: 34 Global Step: 193430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:06:26,859-Speed 10653.39 samples/sec Loss 4.1314 LearningRate 0.0022 Epoch: 34 Global Step: 193440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:06:27,841-Speed 10439.77 samples/sec Loss 4.0653 LearningRate 0.0022 Epoch: 34 Global Step: 193450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:06:28,789-Speed 10812.61 samples/sec Loss 4.0482 LearningRate 0.0022 Epoch: 34 Global Step: 193460 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:06:29,778-Speed 10354.00 samples/sec Loss 4.1749 LearningRate 0.0022 Epoch: 34 Global Step: 193470 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:06:30,770-Speed 10335.56 samples/sec Loss 4.1302 LearningRate 0.0022 Epoch: 34 Global Step: 193480 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:06:31,776-Speed 10188.72 samples/sec Loss 4.1486 LearningRate 0.0022 Epoch: 34 Global Step: 193490 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:06:32,717-Speed 10886.68 samples/sec Loss 4.2295 LearningRate 0.0022 Epoch: 34 Global Step: 193500 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:06:33,637-Speed 11139.34 samples/sec Loss 4.1490 LearningRate 0.0022 Epoch: 34 Global Step: 193510 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:06:34,569-Speed 10986.50 samples/sec Loss 4.2121 LearningRate 0.0022 Epoch: 34 Global Step: 193520 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:06:35,509-Speed 10906.60 samples/sec Loss 4.1258 LearningRate 0.0022 Epoch: 34 Global Step: 193530 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:06:36,458-Speed 10793.68 samples/sec Loss 4.1966 LearningRate 0.0022 Epoch: 34 
Global Step: 193540 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:06:37,429-Speed 10568.36 samples/sec Loss 4.1342 LearningRate 0.0022 Epoch: 34 Global Step: 193550 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:06:38,390-Speed 10660.43 samples/sec Loss 4.1706 LearningRate 0.0022 Epoch: 34 Global Step: 193560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:06:39,405-Speed 10104.85 samples/sec Loss 4.1523 LearningRate 0.0022 Epoch: 34 Global Step: 193570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:06:40,368-Speed 10634.92 samples/sec Loss 4.1918 LearningRate 0.0022 Epoch: 34 Global Step: 193580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:06:41,349-Speed 10450.90 samples/sec Loss 4.1691 LearningRate 0.0022 Epoch: 34 Global Step: 193590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:06:42,261-Speed 11235.53 samples/sec Loss 4.2040 LearningRate 0.0022 Epoch: 34 Global Step: 193600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:06:43,200-Speed 10920.02 samples/sec Loss 4.2071 LearningRate 0.0022 Epoch: 34 Global Step: 193610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:06:44,178-Speed 10474.44 samples/sec Loss 4.1663 LearningRate 0.0022 Epoch: 34 Global Step: 193620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:06:45,109-Speed 11011.04 samples/sec Loss 4.1053 LearningRate 0.0022 Epoch: 34 Global Step: 193630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:06:46,039-Speed 11022.77 samples/sec Loss 4.1716 LearningRate 0.0022 Epoch: 34 Global Step: 193640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:06:46,970-Speed 11000.83 samples/sec Loss 4.0678 LearningRate 0.0022 Epoch: 34 Global Step: 193650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:06:47,904-Speed 10970.51 samples/sec Loss 4.1283 LearningRate 0.0022 Epoch: 34 Global Step: 193660 Fp16 Grad 
Scale: 65536 Required: 1 hours Training: 2022-04-11 06:06:48,882-Speed 10487.84 samples/sec Loss 4.2225 LearningRate 0.0022 Epoch: 34 Global Step: 193670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:06:49,887-Speed 10192.10 samples/sec Loss 4.1610 LearningRate 0.0022 Epoch: 34 Global Step: 193680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:06:50,894-Speed 10177.16 samples/sec Loss 4.2450 LearningRate 0.0022 Epoch: 34 Global Step: 193690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:06:51,923-Speed 9954.06 samples/sec Loss 4.2171 LearningRate 0.0022 Epoch: 34 Global Step: 193700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:06:52,909-Speed 10395.56 samples/sec Loss 4.1551 LearningRate 0.0022 Epoch: 34 Global Step: 193710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:06:53,867-Speed 10718.92 samples/sec Loss 4.2112 LearningRate 0.0022 Epoch: 34 Global Step: 193720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:06:54,898-Speed 9936.34 samples/sec Loss 4.2167 LearningRate 0.0022 Epoch: 34 Global Step: 193730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:06:55,849-Speed 10778.21 samples/sec Loss 4.2115 LearningRate 0.0022 Epoch: 34 Global Step: 193740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:06:56,793-Speed 10853.99 samples/sec Loss 4.3159 LearningRate 0.0022 Epoch: 34 Global Step: 193750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:06:57,786-Speed 10320.42 samples/sec Loss 4.0970 LearningRate 0.0022 Epoch: 34 Global Step: 193760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:06:58,779-Speed 10323.53 samples/sec Loss 4.1530 LearningRate 0.0022 Epoch: 34 Global Step: 193770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:06:59,836-Speed 9685.85 samples/sec Loss 4.1805 LearningRate 0.0022 Epoch: 34 Global Step: 193780 Fp16 Grad Scale: 32768 Required: 1 hours 
Training: 2022-04-11 06:07:00,812-Speed 10509.47 samples/sec Loss 4.1710 LearningRate 0.0022 Epoch: 34 Global Step: 193790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:07:01,829-Speed 10073.31 samples/sec Loss 4.1049 LearningRate 0.0022 Epoch: 34 Global Step: 193800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:07:02,834-Speed 10198.65 samples/sec Loss 4.2128 LearningRate 0.0022 Epoch: 34 Global Step: 193810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:07:03,841-Speed 10175.25 samples/sec Loss 4.1810 LearningRate 0.0022 Epoch: 34 Global Step: 193820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:07:04,899-Speed 9681.65 samples/sec Loss 4.2001 LearningRate 0.0022 Epoch: 34 Global Step: 193830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:07:05,917-Speed 10067.61 samples/sec Loss 4.2052 LearningRate 0.0022 Epoch: 34 Global Step: 193840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:07:06,919-Speed 10234.38 samples/sec Loss 4.1554 LearningRate 0.0022 Epoch: 34 Global Step: 193850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:07:07,864-Speed 10838.27 samples/sec Loss 4.2298 LearningRate 0.0022 Epoch: 34 Global Step: 193860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:07:08,786-Speed 11120.06 samples/sec Loss 4.2124 LearningRate 0.0022 Epoch: 34 Global Step: 193870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:07:09,727-Speed 10886.93 samples/sec Loss 4.2546 LearningRate 0.0022 Epoch: 34 Global Step: 193880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:07:10,683-Speed 10719.11 samples/sec Loss 4.3489 LearningRate 0.0022 Epoch: 34 Global Step: 193890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:07:11,639-Speed 10722.94 samples/sec Loss 4.0846 LearningRate 0.0022 Epoch: 34 Global Step: 193900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 
06:07:12,623-Speed 10413.92 samples/sec Loss 4.1933 LearningRate 0.0022 Epoch: 34 Global Step: 193910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:07:13,597-Speed 10523.83 samples/sec Loss 4.2428 LearningRate 0.0022 Epoch: 34 Global Step: 193920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:07:14,576-Speed 10462.87 samples/sec Loss 4.2020 LearningRate 0.0022 Epoch: 34 Global Step: 193930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:07:15,514-Speed 10930.64 samples/sec Loss 4.1765 LearningRate 0.0022 Epoch: 34 Global Step: 193940 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 06:07:16,446-Speed 10987.73 samples/sec Loss 4.1366 LearningRate 0.0022 Epoch: 34 Global Step: 193950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:07:17,410-Speed 10639.94 samples/sec Loss 4.2741 LearningRate 0.0022 Epoch: 34 Global Step: 193960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:07:18,347-Speed 10931.45 samples/sec Loss 4.1754 LearningRate 0.0022 Epoch: 34 Global Step: 193970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:07:19,303-Speed 10718.52 samples/sec Loss 4.1736 LearningRate 0.0022 Epoch: 34 Global Step: 193980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:07:20,245-Speed 10881.74 samples/sec Loss 4.1911 LearningRate 0.0022 Epoch: 34 Global Step: 193990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:07:21,154-Speed 11279.27 samples/sec Loss 4.2426 LearningRate 0.0022 Epoch: 34 Global Step: 194000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:07:42,665-[lfw][194000]XNorm: 7.836489 Training: 2022-04-11 06:07:42,665-[lfw][194000]Accuracy-Flip: 0.99683+-0.00302 Training: 2022-04-11 06:07:42,666-[lfw][194000]Accuracy-Highest: 0.99733 Training: 2022-04-11 06:08:07,317-[cfp_fp][194000]XNorm: 6.683061 Training: 2022-04-11 06:08:07,318-[cfp_fp][194000]Accuracy-Flip: 0.95343+-0.01213 Training: 
2022-04-11 06:08:07,319-[cfp_fp][194000]Accuracy-Highest: 0.95686 Training: 2022-04-11 06:08:28,560-[agedb_30][194000]XNorm: 7.663900 Training: 2022-04-11 06:08:28,561-[agedb_30][194000]Accuracy-Flip: 0.97000+-0.00792 Training: 2022-04-11 06:08:28,561-[agedb_30][194000]Accuracy-Highest: 0.97067 Training: 2022-04-11 06:08:29,485-Speed 149.86 samples/sec Loss 4.0232 LearningRate 0.0022 Epoch: 34 Global Step: 194010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:08:30,432-Speed 10823.15 samples/sec Loss 4.1615 LearningRate 0.0022 Epoch: 34 Global Step: 194020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:08:31,387-Speed 10724.26 samples/sec Loss 4.1320 LearningRate 0.0022 Epoch: 34 Global Step: 194030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:08:32,385-Speed 10273.57 samples/sec Loss 4.1086 LearningRate 0.0022 Epoch: 34 Global Step: 194040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:08:33,325-Speed 10897.48 samples/sec Loss 4.2946 LearningRate 0.0022 Epoch: 34 Global Step: 194050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:08:34,249-Speed 11087.97 samples/sec Loss 4.1015 LearningRate 0.0022 Epoch: 34 Global Step: 194060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:08:35,181-Speed 10993.24 samples/sec Loss 4.1487 LearningRate 0.0022 Epoch: 34 Global Step: 194070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:08:36,114-Speed 10984.11 samples/sec Loss 4.1947 LearningRate 0.0022 Epoch: 34 Global Step: 194080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:08:37,055-Speed 10892.22 samples/sec Loss 4.1914 LearningRate 0.0022 Epoch: 34 Global Step: 194090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:08:37,997-Speed 10881.59 samples/sec Loss 4.2448 LearningRate 0.0021 Epoch: 34 Global Step: 194100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:08:38,970-Speed 10530.90 samples/sec Loss 
4.2902 LearningRate 0.0021 Epoch: 34 Global Step: 194110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:08:39,892-Speed 11105.94 samples/sec Loss 4.3143 LearningRate 0.0021 Epoch: 34 Global Step: 194120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:08:40,894-Speed 10234.91 samples/sec Loss 4.2239 LearningRate 0.0021 Epoch: 34 Global Step: 194130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:08:41,906-Speed 10121.01 samples/sec Loss 4.0852 LearningRate 0.0021 Epoch: 34 Global Step: 194140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:08:42,858-Speed 10777.17 samples/sec Loss 4.2069 LearningRate 0.0021 Epoch: 34 Global Step: 194150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:08:43,792-Speed 10964.03 samples/sec Loss 4.1113 LearningRate 0.0021 Epoch: 34 Global Step: 194160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:08:44,732-Speed 10901.59 samples/sec Loss 4.2362 LearningRate 0.0021 Epoch: 34 Global Step: 194170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:08:45,662-Speed 11015.29 samples/sec Loss 4.1506 LearningRate 0.0021 Epoch: 34 Global Step: 194180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:08:46,585-Speed 11104.44 samples/sec Loss 4.1479 LearningRate 0.0021 Epoch: 34 Global Step: 194190 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:08:47,520-Speed 10959.91 samples/sec Loss 4.1699 LearningRate 0.0021 Epoch: 34 Global Step: 194200 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:08:48,489-Speed 10573.03 samples/sec Loss 4.1370 LearningRate 0.0021 Epoch: 34 Global Step: 194210 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:08:49,466-Speed 10490.94 samples/sec Loss 4.2513 LearningRate 0.0021 Epoch: 34 Global Step: 194220 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:08:50,455-Speed 10356.85 samples/sec Loss 4.2790 LearningRate 0.0021 
Epoch: 34 Global Step: 194230 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:08:51,439-Speed 10418.67 samples/sec Loss 4.1490 LearningRate 0.0021 Epoch: 34 Global Step: 194240 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:08:52,450-Speed 10132.32 samples/sec Loss 4.2169 LearningRate 0.0021 Epoch: 34 Global Step: 194250 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:08:53,409-Speed 10692.21 samples/sec Loss 4.2677 LearningRate 0.0021 Epoch: 34 Global Step: 194260 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:08:54,390-Speed 10445.05 samples/sec Loss 4.1086 LearningRate 0.0021 Epoch: 34 Global Step: 194270 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:08:55,357-Speed 10592.32 samples/sec Loss 4.1670 LearningRate 0.0021 Epoch: 34 Global Step: 194280 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:08:56,297-Speed 10901.41 samples/sec Loss 4.2225 LearningRate 0.0021 Epoch: 34 Global Step: 194290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:08:57,234-Speed 10945.64 samples/sec Loss 4.1191 LearningRate 0.0021 Epoch: 34 Global Step: 194300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:08:58,156-Speed 11111.62 samples/sec Loss 4.1440 LearningRate 0.0021 Epoch: 34 Global Step: 194310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:08:59,080-Speed 11088.42 samples/sec Loss 4.2390 LearningRate 0.0021 Epoch: 34 Global Step: 194320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:09:00,060-Speed 10450.57 samples/sec Loss 4.2466 LearningRate 0.0021 Epoch: 34 Global Step: 194330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:09:01,014-Speed 10740.40 samples/sec Loss 4.1594 LearningRate 0.0021 Epoch: 34 Global Step: 194340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:09:01,976-Speed 10654.27 samples/sec Loss 4.1918 LearningRate 0.0021 Epoch: 34 Global Step: 194350 
Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:09:02,951-Speed 10511.70 samples/sec Loss 4.2445 LearningRate 0.0021 Epoch: 34 Global Step: 194360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:09:03,922-Speed 10551.13 samples/sec Loss 4.2323 LearningRate 0.0021 Epoch: 34 Global Step: 194370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:09:04,888-Speed 10609.69 samples/sec Loss 4.2583 LearningRate 0.0021 Epoch: 34 Global Step: 194380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:09:05,854-Speed 10602.52 samples/sec Loss 4.1958 LearningRate 0.0021 Epoch: 34 Global Step: 194390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:09:06,797-Speed 10871.89 samples/sec Loss 4.2105 LearningRate 0.0021 Epoch: 34 Global Step: 194400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:09:07,735-Speed 10923.55 samples/sec Loss 4.2279 LearningRate 0.0021 Epoch: 34 Global Step: 194410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:09:08,651-Speed 11188.35 samples/sec Loss 4.3460 LearningRate 0.0021 Epoch: 34 Global Step: 194420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:09:09,577-Speed 11071.28 samples/sec Loss 4.2777 LearningRate 0.0021 Epoch: 34 Global Step: 194430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:09:10,542-Speed 10620.49 samples/sec Loss 4.2505 LearningRate 0.0021 Epoch: 34 Global Step: 194440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:09:11,527-Speed 10397.62 samples/sec Loss 4.2094 LearningRate 0.0021 Epoch: 34 Global Step: 194450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:09:12,491-Speed 10632.77 samples/sec Loss 4.1775 LearningRate 0.0021 Epoch: 34 Global Step: 194460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:09:13,440-Speed 10797.60 samples/sec Loss 4.1906 LearningRate 0.0021 Epoch: 34 Global Step: 194470 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-04-11 06:09:14,449-Speed 10158.27 samples/sec Loss 4.1294 LearningRate 0.0021 Epoch: 34 Global Step: 194480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:09:15,374-Speed 11073.14 samples/sec Loss 4.1660 LearningRate 0.0021 Epoch: 34 Global Step: 194490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:09:16,323-Speed 10798.81 samples/sec Loss 4.2551 LearningRate 0.0021 Epoch: 34 Global Step: 194500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:09:17,264-Speed 10882.77 samples/sec Loss 4.2103 LearningRate 0.0021 Epoch: 34 Global Step: 194510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:09:18,243-Speed 10471.50 samples/sec Loss 4.1964 LearningRate 0.0021 Epoch: 34 Global Step: 194520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:09:19,189-Speed 10828.79 samples/sec Loss 4.1615 LearningRate 0.0021 Epoch: 34 Global Step: 194530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:09:20,186-Speed 10278.82 samples/sec Loss 4.1862 LearningRate 0.0021 Epoch: 34 Global Step: 194540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:09:21,103-Speed 11170.34 samples/sec Loss 4.1677 LearningRate 0.0021 Epoch: 34 Global Step: 194550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:09:22,043-Speed 10907.93 samples/sec Loss 4.3353 LearningRate 0.0021 Epoch: 34 Global Step: 194560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:09:23,001-Speed 10691.82 samples/sec Loss 4.3202 LearningRate 0.0021 Epoch: 34 Global Step: 194570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:09:23,949-Speed 10810.80 samples/sec Loss 4.2471 LearningRate 0.0021 Epoch: 34 Global Step: 194580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:09:24,913-Speed 10632.18 samples/sec Loss 4.2539 LearningRate 0.0021 Epoch: 34 Global Step: 194590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-04-11 06:09:25,871-Speed 10697.12 samples/sec Loss 4.2367 LearningRate 0.0021 Epoch: 34 Global Step: 194600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:09:26,819-Speed 10803.45 samples/sec Loss 4.2243 LearningRate 0.0021 Epoch: 34 Global Step: 194610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:09:27,803-Speed 10416.57 samples/sec Loss 4.2071 LearningRate 0.0021 Epoch: 34 Global Step: 194620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:09:28,775-Speed 10543.82 samples/sec Loss 4.1054 LearningRate 0.0021 Epoch: 34 Global Step: 194630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:09:29,712-Speed 10933.65 samples/sec Loss 4.2117 LearningRate 0.0021 Epoch: 34 Global Step: 194640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:09:30,676-Speed 10635.22 samples/sec Loss 4.2544 LearningRate 0.0021 Epoch: 34 Global Step: 194650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:09:31,640-Speed 10624.34 samples/sec Loss 4.2070 LearningRate 0.0021 Epoch: 34 Global Step: 194660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:09:32,585-Speed 10843.31 samples/sec Loss 4.1004 LearningRate 0.0021 Epoch: 34 Global Step: 194670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:09:33,498-Speed 11222.80 samples/sec Loss 4.1856 LearningRate 0.0021 Epoch: 34 Global Step: 194680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:09:34,411-Speed 11223.95 samples/sec Loss 4.1786 LearningRate 0.0021 Epoch: 34 Global Step: 194690 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:09:35,330-Speed 11157.20 samples/sec Loss 4.1591 LearningRate 0.0021 Epoch: 34 Global Step: 194700 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:09:36,305-Speed 10501.46 samples/sec Loss 4.2254 LearningRate 0.0021 Epoch: 34 Global Step: 194710 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:09:37,244-Speed 
10918.14 samples/sec Loss 4.2855 LearningRate 0.0021 Epoch: 34 Global Step: 194720 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:09:38,243-Speed 10262.69 samples/sec Loss 4.1250 LearningRate 0.0021 Epoch: 34 Global Step: 194730 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:09:39,231-Speed 10370.81 samples/sec Loss 4.2907 LearningRate 0.0021 Epoch: 34 Global Step: 194740 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:09:40,217-Speed 10388.04 samples/sec Loss 4.2382 LearningRate 0.0021 Epoch: 34 Global Step: 194750 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:09:41,180-Speed 10633.00 samples/sec Loss 4.3270 LearningRate 0.0021 Epoch: 34 Global Step: 194760 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:09:42,151-Speed 10560.88 samples/sec Loss 4.2662 LearningRate 0.0021 Epoch: 34 Global Step: 194770 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:09:43,113-Speed 10644.99 samples/sec Loss 4.2308 LearningRate 0.0021 Epoch: 34 Global Step: 194780 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:09:44,126-Speed 10121.70 samples/sec Loss 4.1893 LearningRate 0.0021 Epoch: 34 Global Step: 194790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:09:45,058-Speed 10987.29 samples/sec Loss 4.2016 LearningRate 0.0021 Epoch: 34 Global Step: 194800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:09:45,979-Speed 11131.11 samples/sec Loss 4.1610 LearningRate 0.0021 Epoch: 34 Global Step: 194810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:09:46,901-Speed 11105.62 samples/sec Loss 4.3051 LearningRate 0.0021 Epoch: 34 Global Step: 194820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:09:47,855-Speed 10752.37 samples/sec Loss 4.2974 LearningRate 0.0021 Epoch: 34 Global Step: 194830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:09:48,781-Speed 11060.84 samples/sec Loss 
4.2057 LearningRate 0.0021 Epoch: 34 Global Step: 194840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:09:49,713-Speed 11002.88 samples/sec Loss 4.2282 LearningRate 0.0021 Epoch: 34 Global Step: 194850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:09:50,656-Speed 10863.24 samples/sec Loss 4.1366 LearningRate 0.0021 Epoch: 34 Global Step: 194860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:09:51,606-Speed 10786.44 samples/sec Loss 4.3062 LearningRate 0.0021 Epoch: 34 Global Step: 194870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:09:52,583-Speed 10484.48 samples/sec Loss 4.1709 LearningRate 0.0020 Epoch: 34 Global Step: 194880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:09:53,546-Speed 10647.61 samples/sec Loss 4.1620 LearningRate 0.0020 Epoch: 34 Global Step: 194890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:09:54,521-Speed 10511.51 samples/sec Loss 4.2474 LearningRate 0.0020 Epoch: 34 Global Step: 194900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:09:55,445-Speed 11083.28 samples/sec Loss 4.1580 LearningRate 0.0020 Epoch: 34 Global Step: 194910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:09:56,372-Speed 11063.20 samples/sec Loss 4.1871 LearningRate 0.0020 Epoch: 34 Global Step: 194920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:09:57,299-Speed 11048.40 samples/sec Loss 4.2578 LearningRate 0.0020 Epoch: 34 Global Step: 194930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:09:58,226-Speed 11056.90 samples/sec Loss 4.2044 LearningRate 0.0020 Epoch: 34 Global Step: 194940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:09:59,165-Speed 10918.47 samples/sec Loss 4.2241 LearningRate 0.0020 Epoch: 34 Global Step: 194950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:10:00,104-Speed 10912.04 samples/sec Loss 4.1800 LearningRate 0.0020 
Epoch: 34 Global Step: 194960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:10:01,100-Speed 10285.28 samples/sec Loss 4.0947 LearningRate 0.0020 Epoch: 34 Global Step: 194970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:10:02,054-Speed 10737.50 samples/sec Loss 4.2280 LearningRate 0.0020 Epoch: 34 Global Step: 194980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:10:03,068-Speed 10111.44 samples/sec Loss 4.1994 LearningRate 0.0020 Epoch: 34 Global Step: 194990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:10:04,003-Speed 10961.85 samples/sec Loss 4.4335 LearningRate 0.0020 Epoch: 34 Global Step: 195000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:10:04,983-Speed 10447.81 samples/sec Loss 4.2698 LearningRate 0.0020 Epoch: 34 Global Step: 195010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:10:05,954-Speed 10551.98 samples/sec Loss 4.1935 LearningRate 0.0020 Epoch: 34 Global Step: 195020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:10:06,916-Speed 10654.36 samples/sec Loss 4.2913 LearningRate 0.0020 Epoch: 34 Global Step: 195030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:10:07,864-Speed 10810.90 samples/sec Loss 4.1990 LearningRate 0.0020 Epoch: 34 Global Step: 195040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:10:08,788-Speed 11091.72 samples/sec Loss 4.2966 LearningRate 0.0020 Epoch: 34 Global Step: 195050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:10:09,721-Speed 10977.63 samples/sec Loss 4.2077 LearningRate 0.0020 Epoch: 34 Global Step: 195060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:10:10,676-Speed 10737.97 samples/sec Loss 4.3310 LearningRate 0.0020 Epoch: 34 Global Step: 195070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:10:11,625-Speed 10797.32 samples/sec Loss 4.2421 LearningRate 0.0020 Epoch: 34 Global Step: 195080 
Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:10:12,574-Speed 10790.72 samples/sec Loss 4.2318 LearningRate 0.0020 Epoch: 34 Global Step: 195090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:10:13,576-Speed 10236.52 samples/sec Loss 4.2282 LearningRate 0.0020 Epoch: 34 Global Step: 195100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:10:14,553-Speed 10481.85 samples/sec Loss 4.4195 LearningRate 0.0020 Epoch: 34 Global Step: 195110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:10:15,490-Speed 10936.27 samples/sec Loss 4.1128 LearningRate 0.0020 Epoch: 34 Global Step: 195120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:10:16,496-Speed 10193.12 samples/sec Loss 4.2040 LearningRate 0.0020 Epoch: 34 Global Step: 195130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:10:17,463-Speed 10590.32 samples/sec Loss 4.1589 LearningRate 0.0020 Epoch: 34 Global Step: 195140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:10:18,433-Speed 10568.98 samples/sec Loss 4.3963 LearningRate 0.0020 Epoch: 34 Global Step: 195150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:10:19,422-Speed 10358.06 samples/sec Loss 4.2765 LearningRate 0.0020 Epoch: 34 Global Step: 195160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:10:20,380-Speed 10700.97 samples/sec Loss 4.0958 LearningRate 0.0020 Epoch: 34 Global Step: 195170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:10:21,327-Speed 10824.37 samples/sec Loss 4.1926 LearningRate 0.0020 Epoch: 34 Global Step: 195180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:10:22,256-Speed 11028.20 samples/sec Loss 4.2007 LearningRate 0.0020 Epoch: 34 Global Step: 195190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:10:23,228-Speed 10545.78 samples/sec Loss 4.3747 LearningRate 0.0020 Epoch: 34 Global Step: 195200 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-04-11 06:10:24,189-Speed 10657.17 samples/sec Loss 4.1719 LearningRate 0.0020 Epoch: 34 Global Step: 195210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:10:25,160-Speed 10551.48 samples/sec Loss 4.2351 LearningRate 0.0020 Epoch: 34 Global Step: 195220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:10:26,104-Speed 10859.23 samples/sec Loss 4.0615 LearningRate 0.0020 Epoch: 34 Global Step: 195230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:10:27,070-Speed 10607.34 samples/sec Loss 4.3251 LearningRate 0.0020 Epoch: 34 Global Step: 195240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:10:28,016-Speed 10836.97 samples/sec Loss 4.2943 LearningRate 0.0020 Epoch: 34 Global Step: 195250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:10:29,006-Speed 10347.74 samples/sec Loss 4.2003 LearningRate 0.0020 Epoch: 34 Global Step: 195260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:10:30,004-Speed 10261.55 samples/sec Loss 4.1540 LearningRate 0.0020 Epoch: 34 Global Step: 195270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:10:30,953-Speed 10806.35 samples/sec Loss 4.1592 LearningRate 0.0020 Epoch: 34 Global Step: 195280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:10:31,936-Speed 10424.59 samples/sec Loss 4.2373 LearningRate 0.0020 Epoch: 34 Global Step: 195290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:10:32,893-Speed 10700.38 samples/sec Loss 4.2049 LearningRate 0.0020 Epoch: 34 Global Step: 195300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:10:33,811-Speed 11164.73 samples/sec Loss 4.2508 LearningRate 0.0020 Epoch: 34 Global Step: 195310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:10:34,762-Speed 10781.64 samples/sec Loss 4.2689 LearningRate 0.0020 Epoch: 34 Global Step: 195320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-04-11 06:10:35,703-Speed 10880.78 samples/sec Loss 4.3043 LearningRate 0.0020 Epoch: 34 Global Step: 195330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:10:36,643-Speed 10904.99 samples/sec Loss 4.1714 LearningRate 0.0020 Epoch: 34 Global Step: 195340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:10:37,594-Speed 10772.43 samples/sec Loss 4.3023 LearningRate 0.0020 Epoch: 34 Global Step: 195350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:10:38,560-Speed 10609.37 samples/sec Loss 4.1985 LearningRate 0.0020 Epoch: 34 Global Step: 195360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:10:39,533-Speed 10530.88 samples/sec Loss 4.2306 LearningRate 0.0020 Epoch: 34 Global Step: 195370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:10:40,510-Speed 10494.30 samples/sec Loss 4.3064 LearningRate 0.0020 Epoch: 34 Global Step: 195380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:10:41,440-Speed 11012.62 samples/sec Loss 4.1021 LearningRate 0.0020 Epoch: 34 Global Step: 195390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:10:42,394-Speed 10743.02 samples/sec Loss 4.2234 LearningRate 0.0020 Epoch: 34 Global Step: 195400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:10:43,312-Speed 11167.48 samples/sec Loss 4.3352 LearningRate 0.0020 Epoch: 34 Global Step: 195410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:10:44,245-Speed 10973.64 samples/sec Loss 4.1685 LearningRate 0.0020 Epoch: 34 Global Step: 195420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:10:45,191-Speed 10835.43 samples/sec Loss 4.2619 LearningRate 0.0020 Epoch: 34 Global Step: 195430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:10:46,135-Speed 10849.58 samples/sec Loss 4.2703 LearningRate 0.0020 Epoch: 34 Global Step: 195440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:10:47,071-Speed 
10952.32 samples/sec Loss 4.2304 LearningRate 0.0020 Epoch: 34 Global Step: 195450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:10:48,027-Speed 10725.71 samples/sec Loss 4.2405 LearningRate 0.0020 Epoch: 34 Global Step: 195460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:10:49,004-Speed 10486.54 samples/sec Loss 4.3112 LearningRate 0.0020 Epoch: 34 Global Step: 195470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:10:49,949-Speed 10840.45 samples/sec Loss 4.3544 LearningRate 0.0020 Epoch: 34 Global Step: 195480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:10:50,916-Speed 10603.75 samples/sec Loss 4.1824 LearningRate 0.0020 Epoch: 34 Global Step: 195490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:10:51,867-Speed 10773.62 samples/sec Loss 4.2377 LearningRate 0.0020 Epoch: 34 Global Step: 195500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:10:52,864-Speed 10282.04 samples/sec Loss 4.2511 LearningRate 0.0020 Epoch: 34 Global Step: 195510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:10:53,777-Speed 11220.72 samples/sec Loss 4.2090 LearningRate 0.0020 Epoch: 34 Global Step: 195520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:10:54,711-Speed 10970.44 samples/sec Loss 4.2629 LearningRate 0.0020 Epoch: 34 Global Step: 195530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:10:55,636-Speed 11084.87 samples/sec Loss 4.3437 LearningRate 0.0020 Epoch: 34 Global Step: 195540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:10:56,568-Speed 10993.60 samples/sec Loss 4.3439 LearningRate 0.0020 Epoch: 34 Global Step: 195550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:10:57,533-Speed 10616.81 samples/sec Loss 4.2275 LearningRate 0.0020 Epoch: 34 Global Step: 195560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:10:58,496-Speed 10641.14 samples/sec Loss 
4.2114 LearningRate 0.0020 Epoch: 34 Global Step: 195570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:10:59,435-Speed 10914.15 samples/sec Loss 4.1947 LearningRate 0.0020 Epoch: 34 Global Step: 195580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:11:00,433-Speed 10266.16 samples/sec Loss 4.3126 LearningRate 0.0020 Epoch: 34 Global Step: 195590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:11:01,413-Speed 10457.85 samples/sec Loss 4.2863 LearningRate 0.0020 Epoch: 34 Global Step: 195600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:11:02,403-Speed 10344.36 samples/sec Loss 4.3097 LearningRate 0.0020 Epoch: 34 Global Step: 195610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:11:03,349-Speed 10836.68 samples/sec Loss 4.2389 LearningRate 0.0020 Epoch: 34 Global Step: 195620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:11:04,306-Speed 10708.87 samples/sec Loss 4.1889 LearningRate 0.0020 Epoch: 34 Global Step: 195630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:11:05,256-Speed 10779.91 samples/sec Loss 4.3926 LearningRate 0.0020 Epoch: 34 Global Step: 195640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:11:06,206-Speed 10792.71 samples/sec Loss 4.2591 LearningRate 0.0020 Epoch: 34 Global Step: 195650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:11:07,186-Speed 10452.72 samples/sec Loss 4.2517 LearningRate 0.0020 Epoch: 34 Global Step: 195660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:11:08,172-Speed 10395.36 samples/sec Loss 4.3127 LearningRate 0.0020 Epoch: 34 Global Step: 195670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:11:09,095-Speed 11102.74 samples/sec Loss 4.1585 LearningRate 0.0019 Epoch: 34 Global Step: 195680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:11:10,032-Speed 10928.48 samples/sec Loss 4.2259 LearningRate 0.0019 
Epoch: 34 Global Step: 195690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:11:11,004-Speed 10541.52 samples/sec Loss 4.2166 LearningRate 0.0019 Epoch: 34 Global Step: 195700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:11:11,951-Speed 10816.61 samples/sec Loss 4.3349 LearningRate 0.0019 Epoch: 34 Global Step: 195710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:11:12,904-Speed 10759.39 samples/sec Loss 4.2778 LearningRate 0.0019 Epoch: 34 Global Step: 195720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:11:13,855-Speed 10772.22 samples/sec Loss 4.3260 LearningRate 0.0019 Epoch: 34 Global Step: 195730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:11:14,824-Speed 10578.92 samples/sec Loss 4.2917 LearningRate 0.0019 Epoch: 34 Global Step: 195740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:11:15,818-Speed 10300.23 samples/sec Loss 4.2622 LearningRate 0.0019 Epoch: 34 Global Step: 195750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:11:16,774-Speed 10724.24 samples/sec Loss 4.3258 LearningRate 0.0019 Epoch: 34 Global Step: 195760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:11:17,729-Speed 10726.74 samples/sec Loss 4.1868 LearningRate 0.0019 Epoch: 34 Global Step: 195770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:11:18,682-Speed 10766.88 samples/sec Loss 4.1872 LearningRate 0.0019 Epoch: 34 Global Step: 195780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:11:19,672-Speed 10348.50 samples/sec Loss 4.2406 LearningRate 0.0019 Epoch: 34 Global Step: 195790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:11:20,663-Speed 10334.55 samples/sec Loss 4.2323 LearningRate 0.0019 Epoch: 34 Global Step: 195800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:11:21,583-Speed 11137.72 samples/sec Loss 4.3654 LearningRate 0.0019 Epoch: 34 Global Step: 195810 
Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:11:22,521-Speed 10931.51 samples/sec Loss 4.2446 LearningRate 0.0019 Epoch: 34 Global Step: 195820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:11:23,440-Speed 11150.62 samples/sec Loss 4.3718 LearningRate 0.0019 Epoch: 34 Global Step: 195830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:11:24,418-Speed 10473.63 samples/sec Loss 4.3059 LearningRate 0.0019 Epoch: 34 Global Step: 195840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:11:25,340-Speed 11113.90 samples/sec Loss 4.2545 LearningRate 0.0019 Epoch: 34 Global Step: 195850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:11:26,310-Speed 10566.44 samples/sec Loss 4.2906 LearningRate 0.0019 Epoch: 34 Global Step: 195860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:11:27,258-Speed 10805.72 samples/sec Loss 4.2925 LearningRate 0.0019 Epoch: 34 Global Step: 195870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:11:28,202-Speed 10853.87 samples/sec Loss 4.2029 LearningRate 0.0019 Epoch: 34 Global Step: 195880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:11:29,178-Speed 10503.18 samples/sec Loss 4.3172 LearningRate 0.0019 Epoch: 34 Global Step: 195890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:11:30,170-Speed 10327.06 samples/sec Loss 4.1983 LearningRate 0.0019 Epoch: 34 Global Step: 195900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:11:31,100-Speed 11021.49 samples/sec Loss 4.3640 LearningRate 0.0019 Epoch: 34 Global Step: 195910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:11:32,069-Speed 10566.63 samples/sec Loss 4.2881 LearningRate 0.0019 Epoch: 34 Global Step: 195920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:11:33,002-Speed 10982.78 samples/sec Loss 4.2851 LearningRate 0.0019 Epoch: 34 Global Step: 195930 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-04-11 06:11:33,918-Speed 11194.01 samples/sec Loss 4.2457 LearningRate 0.0019 Epoch: 34 Global Step: 195940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:11:34,869-Speed 10776.17 samples/sec Loss 4.2590 LearningRate 0.0019 Epoch: 34 Global Step: 195950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:11:35,820-Speed 10774.88 samples/sec Loss 4.2924 LearningRate 0.0019 Epoch: 34 Global Step: 195960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:11:36,787-Speed 10591.99 samples/sec Loss 4.3989 LearningRate 0.0019 Epoch: 34 Global Step: 195970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:11:37,777-Speed 10357.90 samples/sec Loss 4.2885 LearningRate 0.0019 Epoch: 34 Global Step: 195980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:11:38,745-Speed 10589.89 samples/sec Loss 4.2300 LearningRate 0.0019 Epoch: 34 Global Step: 195990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:11:39,660-Speed 11193.51 samples/sec Loss 4.2162 LearningRate 0.0019 Epoch: 34 Global Step: 196000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:12:01,123-[lfw][196000]XNorm: 7.714078 Training: 2022-04-11 06:12:01,124-[lfw][196000]Accuracy-Flip: 0.99683+-0.00302 Training: 2022-04-11 06:12:01,124-[lfw][196000]Accuracy-Highest: 0.99733 Training: 2022-04-11 06:12:25,766-[cfp_fp][196000]XNorm: 6.592993 Training: 2022-04-11 06:12:25,767-[cfp_fp][196000]Accuracy-Flip: 0.95943+-0.00994 Training: 2022-04-11 06:12:25,767-[cfp_fp][196000]Accuracy-Highest: 0.95943 Training: 2022-04-11 06:12:47,098-[agedb_30][196000]XNorm: 7.557557 Training: 2022-04-11 06:12:47,098-[agedb_30][196000]Accuracy-Flip: 0.96917+-0.00880 Training: 2022-04-11 06:12:47,098-[agedb_30][196000]Accuracy-Highest: 0.97067 Training: 2022-04-11 06:12:48,060-Speed 149.71 samples/sec Loss 4.2288 LearningRate 0.0019 Epoch: 34 Global Step: 196010 Fp16 Grad Scale: 32768 Required: 1 hours 
Training: 2022-04-11 06:12:49,002-Speed 10877.19 samples/sec Loss 4.1841 LearningRate 0.0019 Epoch: 34 Global Step: 196020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:12:49,965-Speed 10640.35 samples/sec Loss 4.3646 LearningRate 0.0019 Epoch: 34 Global Step: 196030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:12:50,928-Speed 10649.53 samples/sec Loss 4.1736 LearningRate 0.0019 Epoch: 34 Global Step: 196040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:12:51,870-Speed 10870.17 samples/sec Loss 4.2230 LearningRate 0.0019 Epoch: 34 Global Step: 196050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:12:52,823-Speed 10758.17 samples/sec Loss 4.3310 LearningRate 0.0019 Epoch: 34 Global Step: 196060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:12:53,772-Speed 10796.26 samples/sec Loss 4.2595 LearningRate 0.0019 Epoch: 34 Global Step: 196070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:12:54,757-Speed 10405.04 samples/sec Loss 4.2684 LearningRate 0.0019 Epoch: 34 Global Step: 196080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:12:55,709-Speed 10758.24 samples/sec Loss 4.1194 LearningRate 0.0019 Epoch: 34 Global Step: 196090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:12:56,642-Speed 10984.10 samples/sec Loss 4.2370 LearningRate 0.0019 Epoch: 34 Global Step: 196100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:12:57,590-Speed 10814.10 samples/sec Loss 4.2937 LearningRate 0.0019 Epoch: 34 Global Step: 196110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:12:58,581-Speed 10334.43 samples/sec Loss 4.3114 LearningRate 0.0019 Epoch: 34 Global Step: 196120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:12:59,507-Speed 11074.76 samples/sec Loss 4.3322 LearningRate 0.0019 Epoch: 34 Global Step: 196130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 
06:13:00,429-Speed 11108.02 samples/sec Loss 4.2341 LearningRate 0.0019 Epoch: 34 Global Step: 196140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:13:01,402-Speed 10533.03 samples/sec Loss 4.2142 LearningRate 0.0019 Epoch: 34 Global Step: 196150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:13:02,325-Speed 11100.56 samples/sec Loss 4.3153 LearningRate 0.0019 Epoch: 34 Global Step: 196160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:13:03,273-Speed 10811.04 samples/sec Loss 4.3647 LearningRate 0.0019 Epoch: 34 Global Step: 196170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:13:04,229-Speed 10722.70 samples/sec Loss 4.3521 LearningRate 0.0019 Epoch: 34 Global Step: 196180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:13:05,217-Speed 10368.28 samples/sec Loss 4.2771 LearningRate 0.0019 Epoch: 34 Global Step: 196190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:13:06,167-Speed 10788.44 samples/sec Loss 4.2949 LearningRate 0.0019 Epoch: 34 Global Step: 196200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:13:07,126-Speed 10686.56 samples/sec Loss 4.3864 LearningRate 0.0019 Epoch: 34 Global Step: 196210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:13:08,066-Speed 10902.91 samples/sec Loss 4.4662 LearningRate 0.0019 Epoch: 34 Global Step: 196220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:13:08,988-Speed 11113.01 samples/sec Loss 4.3771 LearningRate 0.0019 Epoch: 34 Global Step: 196230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:13:09,912-Speed 11089.62 samples/sec Loss 4.3043 LearningRate 0.0019 Epoch: 34 Global Step: 196240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:13:10,834-Speed 11114.58 samples/sec Loss 4.3337 LearningRate 0.0019 Epoch: 34 Global Step: 196250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:13:11,788-Speed 10738.97 
samples/sec Loss 4.0926 LearningRate 0.0019 Epoch: 34 Global Step: 196260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:13:12,757-Speed 10570.02 samples/sec Loss 4.2513 LearningRate 0.0019 Epoch: 34 Global Step: 196270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:13:13,749-Speed 10327.80 samples/sec Loss 4.3261 LearningRate 0.0019 Epoch: 34 Global Step: 196280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:13:14,703-Speed 10738.61 samples/sec Loss 4.3816 LearningRate 0.0019 Epoch: 34 Global Step: 196290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:13:15,708-Speed 10200.23 samples/sec Loss 4.2699 LearningRate 0.0019 Epoch: 34 Global Step: 196300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:13:16,671-Speed 10643.42 samples/sec Loss 4.1666 LearningRate 0.0019 Epoch: 34 Global Step: 196310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:13:17,614-Speed 10863.43 samples/sec Loss 4.2342 LearningRate 0.0019 Epoch: 34 Global Step: 196320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:13:18,530-Speed 11190.02 samples/sec Loss 4.2753 LearningRate 0.0019 Epoch: 34 Global Step: 196330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:13:19,512-Speed 10426.89 samples/sec Loss 4.3711 LearningRate 0.0019 Epoch: 34 Global Step: 196340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:13:20,467-Speed 10738.37 samples/sec Loss 4.2755 LearningRate 0.0019 Epoch: 34 Global Step: 196350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:13:21,373-Speed 11310.97 samples/sec Loss 4.2390 LearningRate 0.0019 Epoch: 34 Global Step: 196360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:13:22,302-Speed 11022.25 samples/sec Loss 4.3282 LearningRate 0.0019 Epoch: 34 Global Step: 196370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:13:23,230-Speed 11044.12 samples/sec Loss 4.2939 
LearningRate 0.0019 Epoch: 34 Global Step: 196380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:13:24,171-Speed 10892.07 samples/sec Loss 4.2762 LearningRate 0.0019 Epoch: 34 Global Step: 196390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:13:25,088-Speed 11173.85 samples/sec Loss 4.4166 LearningRate 0.0019 Epoch: 34 Global Step: 196400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:13:26,054-Speed 10614.40 samples/sec Loss 4.4052 LearningRate 0.0019 Epoch: 34 Global Step: 196410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:13:27,024-Speed 10568.06 samples/sec Loss 4.1515 LearningRate 0.0019 Epoch: 34 Global Step: 196420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:13:27,982-Speed 10693.54 samples/sec Loss 4.2977 LearningRate 0.0019 Epoch: 34 Global Step: 196430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:13:28,950-Speed 10581.83 samples/sec Loss 4.3648 LearningRate 0.0019 Epoch: 34 Global Step: 196440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:13:29,887-Speed 10932.65 samples/sec Loss 4.4324 LearningRate 0.0019 Epoch: 34 Global Step: 196450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:13:30,870-Speed 10422.23 samples/sec Loss 4.3208 LearningRate 0.0019 Epoch: 34 Global Step: 196460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:13:31,842-Speed 10549.51 samples/sec Loss 4.3113 LearningRate 0.0019 Epoch: 34 Global Step: 196470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:13:32,791-Speed 10801.77 samples/sec Loss 4.2177 LearningRate 0.0019 Epoch: 34 Global Step: 196480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:13:33,708-Speed 11187.19 samples/sec Loss 4.2008 LearningRate 0.0019 Epoch: 34 Global Step: 196490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:13:34,632-Speed 11090.30 samples/sec Loss 4.2910 LearningRate 0.0019 Epoch: 34 
Global Step: 196500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:13:35,584-Speed 10759.01 samples/sec Loss 4.3825 LearningRate 0.0018 Epoch: 34 Global Step: 196510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:13:36,547-Speed 10645.40 samples/sec Loss 4.2690 LearningRate 0.0018 Epoch: 34 Global Step: 196520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:13:37,511-Speed 10626.51 samples/sec Loss 4.2080 LearningRate 0.0018 Epoch: 34 Global Step: 196530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:13:38,464-Speed 10752.58 samples/sec Loss 4.3386 LearningRate 0.0018 Epoch: 34 Global Step: 196540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:13:39,411-Speed 10811.61 samples/sec Loss 4.3400 LearningRate 0.0018 Epoch: 34 Global Step: 196550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:13:40,382-Speed 10560.32 samples/sec Loss 4.3637 LearningRate 0.0018 Epoch: 34 Global Step: 196560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:13:41,349-Speed 10592.60 samples/sec Loss 4.2509 LearningRate 0.0018 Epoch: 34 Global Step: 196570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:13:42,308-Speed 10688.44 samples/sec Loss 4.2969 LearningRate 0.0018 Epoch: 34 Global Step: 196580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:13:43,278-Speed 10565.66 samples/sec Loss 4.2815 LearningRate 0.0018 Epoch: 34 Global Step: 196590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:13:44,219-Speed 10890.69 samples/sec Loss 4.3499 LearningRate 0.0018 Epoch: 34 Global Step: 196600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:13:45,127-Speed 11280.94 samples/sec Loss 4.2604 LearningRate 0.0018 Epoch: 34 Global Step: 196610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:13:46,048-Speed 11124.87 samples/sec Loss 4.3124 LearningRate 0.0018 Epoch: 34 Global Step: 196620 Fp16 Grad 
Scale: 32768 Required: 1 hours Training: 2022-04-11 06:13:46,969-Speed 11135.35 samples/sec Loss 4.3026 LearningRate 0.0018 Epoch: 34 Global Step: 196630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:13:47,924-Speed 10724.88 samples/sec Loss 4.3532 LearningRate 0.0018 Epoch: 34 Global Step: 196640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:13:48,899-Speed 10512.19 samples/sec Loss 4.2932 LearningRate 0.0018 Epoch: 34 Global Step: 196650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:13:49,868-Speed 10567.73 samples/sec Loss 4.2079 LearningRate 0.0018 Epoch: 34 Global Step: 196660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:13:50,831-Speed 10641.35 samples/sec Loss 4.2695 LearningRate 0.0018 Epoch: 34 Global Step: 196670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:13:51,783-Speed 10761.77 samples/sec Loss 4.3285 LearningRate 0.0018 Epoch: 34 Global Step: 196680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:13:52,786-Speed 10223.31 samples/sec Loss 4.4364 LearningRate 0.0018 Epoch: 34 Global Step: 196690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:13:53,750-Speed 10626.46 samples/sec Loss 4.1840 LearningRate 0.0018 Epoch: 34 Global Step: 196700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:13:54,704-Speed 10739.54 samples/sec Loss 4.2135 LearningRate 0.0018 Epoch: 34 Global Step: 196710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:13:55,690-Speed 10397.07 samples/sec Loss 4.2238 LearningRate 0.0018 Epoch: 34 Global Step: 196720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:13:56,634-Speed 10846.27 samples/sec Loss 4.2761 LearningRate 0.0018 Epoch: 34 Global Step: 196730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:13:57,554-Speed 11149.33 samples/sec Loss 4.1485 LearningRate 0.0018 Epoch: 34 Global Step: 196740 Fp16 Grad Scale: 32768 Required: 1 hours 
Training: 2022-04-11 06:13:58,479-Speed 11068.93 samples/sec Loss 4.2730 LearningRate 0.0018 Epoch: 34 Global Step: 196750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:13:59,431-Speed 10768.48 samples/sec Loss 4.3682 LearningRate 0.0018 Epoch: 34 Global Step: 196760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:14:00,416-Speed 10400.40 samples/sec Loss 4.1953 LearningRate 0.0018 Epoch: 34 Global Step: 196770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:14:01,376-Speed 10673.95 samples/sec Loss 4.1255 LearningRate 0.0018 Epoch: 34 Global Step: 196780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:14:02,359-Speed 10424.60 samples/sec Loss 4.1834 LearningRate 0.0018 Epoch: 34 Global Step: 196790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:14:03,323-Speed 10631.14 samples/sec Loss 4.2886 LearningRate 0.0018 Epoch: 34 Global Step: 196800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:14:04,269-Speed 10829.00 samples/sec Loss 4.2780 LearningRate 0.0018 Epoch: 34 Global Step: 196810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:14:05,190-Speed 11121.10 samples/sec Loss 4.3195 LearningRate 0.0018 Epoch: 34 Global Step: 196820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:14:06,179-Speed 10364.93 samples/sec Loss 4.3607 LearningRate 0.0018 Epoch: 34 Global Step: 196830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:14:07,164-Speed 10403.38 samples/sec Loss 4.3847 LearningRate 0.0018 Epoch: 34 Global Step: 196840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:14:08,143-Speed 10466.38 samples/sec Loss 4.4068 LearningRate 0.0018 Epoch: 34 Global Step: 196850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:14:09,096-Speed 10750.38 samples/sec Loss 4.4140 LearningRate 0.0018 Epoch: 34 Global Step: 196860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 
06:14:10,013-Speed 11175.86 samples/sec Loss 4.1482 LearningRate 0.0018 Epoch: 34 Global Step: 196870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:14:10,941-Speed 11041.53 samples/sec Loss 4.3781 LearningRate 0.0018 Epoch: 34 Global Step: 196880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:14:11,874-Speed 10991.85 samples/sec Loss 4.2540 LearningRate 0.0018 Epoch: 34 Global Step: 196890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:14:12,802-Speed 11037.47 samples/sec Loss 4.2974 LearningRate 0.0018 Epoch: 34 Global Step: 196900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:14:13,779-Speed 10490.47 samples/sec Loss 4.3893 LearningRate 0.0018 Epoch: 34 Global Step: 196910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:14:14,746-Speed 10589.55 samples/sec Loss 4.2276 LearningRate 0.0018 Epoch: 34 Global Step: 196920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:14:15,712-Speed 10612.44 samples/sec Loss 4.3966 LearningRate 0.0018 Epoch: 34 Global Step: 196930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:14:16,638-Speed 11067.54 samples/sec Loss 4.2909 LearningRate 0.0018 Epoch: 34 Global Step: 196940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:14:17,579-Speed 10887.51 samples/sec Loss 4.3154 LearningRate 0.0018 Epoch: 34 Global Step: 196950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:14:18,556-Speed 10489.29 samples/sec Loss 4.1759 LearningRate 0.0018 Epoch: 34 Global Step: 196960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:14:19,470-Speed 11211.17 samples/sec Loss 4.2749 LearningRate 0.0018 Epoch: 34 Global Step: 196970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:14:20,427-Speed 10700.99 samples/sec Loss 4.2722 LearningRate 0.0018 Epoch: 34 Global Step: 196980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:14:21,339-Speed 11246.99 
samples/sec Loss 4.3277 LearningRate 0.0018 Epoch: 34 Global Step: 196990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:14:22,273-Speed 10967.91 samples/sec Loss 4.2798 LearningRate 0.0018 Epoch: 34 Global Step: 197000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:14:23,188-Speed 11197.26 samples/sec Loss 4.4070 LearningRate 0.0018 Epoch: 34 Global Step: 197010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:14:24,143-Speed 10729.28 samples/sec Loss 4.3893 LearningRate 0.0018 Epoch: 34 Global Step: 197020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:14:25,068-Speed 11075.90 samples/sec Loss 4.2929 LearningRate 0.0018 Epoch: 34 Global Step: 197030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:14:26,053-Speed 10407.21 samples/sec Loss 4.2395 LearningRate 0.0018 Epoch: 34 Global Step: 197040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:14:27,051-Speed 10261.76 samples/sec Loss 4.3081 LearningRate 0.0018 Epoch: 34 Global Step: 197050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:14:28,045-Speed 10307.12 samples/sec Loss 4.3344 LearningRate 0.0018 Epoch: 34 Global Step: 197060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:14:29,023-Speed 10484.58 samples/sec Loss 4.2676 LearningRate 0.0018 Epoch: 34 Global Step: 197070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:14:29,992-Speed 10571.30 samples/sec Loss 4.2896 LearningRate 0.0018 Epoch: 34 Global Step: 197080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:14:30,970-Speed 10477.02 samples/sec Loss 4.2265 LearningRate 0.0018 Epoch: 34 Global Step: 197090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:14:31,917-Speed 10828.31 samples/sec Loss 4.3846 LearningRate 0.0018 Epoch: 34 Global Step: 197100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:14:32,855-Speed 10923.79 samples/sec Loss 4.2518 
LearningRate 0.0018 Epoch: 34 Global Step: 197110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:14:33,783-Speed 11037.39 samples/sec Loss 4.2256 LearningRate 0.0018 Epoch: 34 Global Step: 197120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:14:34,716-Speed 10982.08 samples/sec Loss 4.4613 LearningRate 0.0018 Epoch: 34 Global Step: 197130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:14:35,683-Speed 10591.77 samples/sec Loss 4.2694 LearningRate 0.0018 Epoch: 34 Global Step: 197140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:14:36,640-Speed 10713.52 samples/sec Loss 4.3964 LearningRate 0.0018 Epoch: 34 Global Step: 197150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:14:37,630-Speed 10345.31 samples/sec Loss 4.4309 LearningRate 0.0018 Epoch: 34 Global Step: 197160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:14:38,606-Speed 10506.10 samples/sec Loss 4.1931 LearningRate 0.0018 Epoch: 34 Global Step: 197170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:14:39,573-Speed 10598.11 samples/sec Loss 4.3190 LearningRate 0.0018 Epoch: 34 Global Step: 197180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:14:40,531-Speed 10689.17 samples/sec Loss 4.1614 LearningRate 0.0018 Epoch: 34 Global Step: 197190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:14:41,480-Speed 10805.44 samples/sec Loss 4.2388 LearningRate 0.0018 Epoch: 34 Global Step: 197200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:14:42,439-Speed 10675.23 samples/sec Loss 4.2000 LearningRate 0.0018 Epoch: 34 Global Step: 197210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:14:43,418-Speed 10470.67 samples/sec Loss 4.3408 LearningRate 0.0018 Epoch: 34 Global Step: 197220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:14:44,394-Speed 10501.80 samples/sec Loss 4.2841 LearningRate 0.0018 Epoch: 34 
Global Step: 197230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:14:45,318-Speed 11088.43 samples/sec Loss 4.3121 LearningRate 0.0018 Epoch: 34 Global Step: 197240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:14:46,235-Speed 11172.53 samples/sec Loss 4.2664 LearningRate 0.0018 Epoch: 34 Global Step: 197250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:14:47,160-Speed 11079.05 samples/sec Loss 4.3607 LearningRate 0.0018 Epoch: 34 Global Step: 197260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:14:48,131-Speed 10550.75 samples/sec Loss 4.1797 LearningRate 0.0018 Epoch: 34 Global Step: 197270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:14:49,078-Speed 10823.56 samples/sec Loss 4.2911 LearningRate 0.0018 Epoch: 34 Global Step: 197280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:14:50,055-Speed 10481.35 samples/sec Loss 4.2996 LearningRate 0.0018 Epoch: 34 Global Step: 197290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:14:51,045-Speed 10352.03 samples/sec Loss 4.3436 LearningRate 0.0018 Epoch: 34 Global Step: 197300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:14:52,045-Speed 10254.72 samples/sec Loss 4.2978 LearningRate 0.0018 Epoch: 34 Global Step: 197310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:14:53,028-Speed 10421.92 samples/sec Loss 4.3417 LearningRate 0.0018 Epoch: 34 Global Step: 197320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:14:53,969-Speed 10891.21 samples/sec Loss 4.2976 LearningRate 0.0018 Epoch: 34 Global Step: 197330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:14:54,932-Speed 10633.04 samples/sec Loss 4.3350 LearningRate 0.0018 Epoch: 34 Global Step: 197340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:14:55,864-Speed 11001.10 samples/sec Loss 4.3385 LearningRate 0.0018 Epoch: 34 Global Step: 197350 Fp16 Grad 
Scale: 32768 Required: 1 hours Training: 2022-04-11 06:14:56,816-Speed 10766.19 samples/sec Loss 4.2198 LearningRate 0.0017 Epoch: 34 Global Step: 197360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:14:57,788-Speed 10539.20 samples/sec Loss 4.2873 LearningRate 0.0017 Epoch: 34 Global Step: 197370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:14:58,748-Speed 10668.40 samples/sec Loss 4.2934 LearningRate 0.0017 Epoch: 34 Global Step: 197380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:14:59,685-Speed 10934.80 samples/sec Loss 4.2942 LearningRate 0.0017 Epoch: 34 Global Step: 197390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:00,676-Speed 10336.30 samples/sec Loss 4.2937 LearningRate 0.0017 Epoch: 34 Global Step: 197400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:01,646-Speed 10570.90 samples/sec Loss 4.3388 LearningRate 0.0017 Epoch: 34 Global Step: 197410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:15:02,591-Speed 10840.94 samples/sec Loss 4.2593 LearningRate 0.0017 Epoch: 34 Global Step: 197420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:15:03,527-Speed 10941.09 samples/sec Loss 4.4768 LearningRate 0.0017 Epoch: 34 Global Step: 197430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:04,517-Speed 10354.63 samples/sec Loss 4.2769 LearningRate 0.0017 Epoch: 34 Global Step: 197440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:05,475-Speed 10696.73 samples/sec Loss 4.3580 LearningRate 0.0017 Epoch: 34 Global Step: 197450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:06,449-Speed 10519.76 samples/sec Loss 4.3193 LearningRate 0.0017 Epoch: 34 Global Step: 197460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:07,428-Speed 10468.44 samples/sec Loss 4.2097 LearningRate 0.0017 Epoch: 34 Global Step: 197470 Fp16 Grad Scale: 32768 Required: 1 hours 
Training: 2022-04-11 06:15:08,386-Speed 10693.45 samples/sec Loss 4.4260 LearningRate 0.0017 Epoch: 34 Global Step: 197480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:09,320-Speed 10966.13 samples/sec Loss 4.2181 LearningRate 0.0017 Epoch: 34 Global Step: 197490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:10,231-Speed 11248.90 samples/sec Loss 4.3055 LearningRate 0.0017 Epoch: 34 Global Step: 197500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:11,165-Speed 10975.38 samples/sec Loss 4.3118 LearningRate 0.0017 Epoch: 34 Global Step: 197510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:12,135-Speed 10560.31 samples/sec Loss 4.3256 LearningRate 0.0017 Epoch: 34 Global Step: 197520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:13,121-Speed 10396.26 samples/sec Loss 4.1183 LearningRate 0.0017 Epoch: 34 Global Step: 197530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:15:14,101-Speed 10457.27 samples/sec Loss 4.4137 LearningRate 0.0017 Epoch: 34 Global Step: 197540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:15,063-Speed 10654.06 samples/sec Loss 4.3642 LearningRate 0.0017 Epoch: 34 Global Step: 197550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:16,049-Speed 10390.27 samples/sec Loss 4.4809 LearningRate 0.0017 Epoch: 34 Global Step: 197560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:17,000-Speed 10778.57 samples/sec Loss 4.4285 LearningRate 0.0017 Epoch: 34 Global Step: 197570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:17,996-Speed 10286.60 samples/sec Loss 4.3060 LearningRate 0.0017 Epoch: 34 Global Step: 197580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:18,969-Speed 10525.42 samples/sec Loss 4.1876 LearningRate 0.0017 Epoch: 34 Global Step: 197590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 
06:15:19,951-Speed 10430.79 samples/sec Loss 4.3414 LearningRate 0.0017 Epoch: 34 Global Step: 197600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:20,881-Speed 11015.96 samples/sec Loss 4.3273 LearningRate 0.0017 Epoch: 34 Global Step: 197610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:21,824-Speed 10868.45 samples/sec Loss 4.2773 LearningRate 0.0017 Epoch: 34 Global Step: 197620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:22,752-Speed 11045.62 samples/sec Loss 4.2858 LearningRate 0.0017 Epoch: 34 Global Step: 197630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:23,704-Speed 10764.40 samples/sec Loss 4.3284 LearningRate 0.0017 Epoch: 34 Global Step: 197640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:15:24,660-Speed 10716.65 samples/sec Loss 4.3418 LearningRate 0.0017 Epoch: 34 Global Step: 197650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:15:25,600-Speed 10901.49 samples/sec Loss 4.2814 LearningRate 0.0017 Epoch: 34 Global Step: 197660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:26,850-Speed 8197.11 samples/sec Loss 4.3073 LearningRate 0.0017 Epoch: 34 Global Step: 197670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:27,775-Speed 11079.24 samples/sec Loss 4.2070 LearningRate 0.0017 Epoch: 34 Global Step: 197680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:28,774-Speed 10252.78 samples/sec Loss 4.2454 LearningRate 0.0017 Epoch: 34 Global Step: 197690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:29,771-Speed 10281.11 samples/sec Loss 4.3088 LearningRate 0.0017 Epoch: 34 Global Step: 197700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:30,755-Speed 10408.99 samples/sec Loss 4.3051 LearningRate 0.0017 Epoch: 34 Global Step: 197710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:31,735-Speed 10457.14 
samples/sec Loss 4.3131 LearningRate 0.0017 Epoch: 34 Global Step: 197720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:32,687-Speed 10765.58 samples/sec Loss 4.2825 LearningRate 0.0017 Epoch: 34 Global Step: 197730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:33,614-Speed 11050.20 samples/sec Loss 4.2953 LearningRate 0.0017 Epoch: 34 Global Step: 197740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:34,560-Speed 10828.37 samples/sec Loss 4.2488 LearningRate 0.0017 Epoch: 34 Global Step: 197750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:35,508-Speed 10814.53 samples/sec Loss 4.2997 LearningRate 0.0017 Epoch: 34 Global Step: 197760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:15:36,453-Speed 10836.62 samples/sec Loss 4.2923 LearningRate 0.0017 Epoch: 34 Global Step: 197770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:37,454-Speed 10235.72 samples/sec Loss 4.3119 LearningRate 0.0017 Epoch: 34 Global Step: 197780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:38,413-Speed 10687.74 samples/sec Loss 4.3304 LearningRate 0.0017 Epoch: 34 Global Step: 197790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:39,369-Speed 10725.55 samples/sec Loss 4.3646 LearningRate 0.0017 Epoch: 34 Global Step: 197800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:40,353-Speed 10422.02 samples/sec Loss 4.2498 LearningRate 0.0017 Epoch: 34 Global Step: 197810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:41,340-Speed 10382.73 samples/sec Loss 4.3424 LearningRate 0.0017 Epoch: 34 Global Step: 197820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:42,259-Speed 11143.77 samples/sec Loss 4.3555 LearningRate 0.0017 Epoch: 34 Global Step: 197830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:43,257-Speed 10270.87 samples/sec Loss 4.3210 
LearningRate 0.0017 Epoch: 34 Global Step: 197840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:44,271-Speed 10104.83 samples/sec Loss 4.3456 LearningRate 0.0017 Epoch: 34 Global Step: 197850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:45,185-Speed 11207.66 samples/sec Loss 4.3785 LearningRate 0.0017 Epoch: 34 Global Step: 197860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:46,123-Speed 10928.92 samples/sec Loss 4.2866 LearningRate 0.0017 Epoch: 34 Global Step: 197870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:15:47,078-Speed 10729.36 samples/sec Loss 4.2698 LearningRate 0.0017 Epoch: 34 Global Step: 197880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:15:48,042-Speed 10623.11 samples/sec Loss 4.3450 LearningRate 0.0017 Epoch: 34 Global Step: 197890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:15:49,015-Speed 10538.12 samples/sec Loss 4.3815 LearningRate 0.0017 Epoch: 34 Global Step: 197900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:15:49,957-Speed 10874.13 samples/sec Loss 4.2839 LearningRate 0.0017 Epoch: 34 Global Step: 197910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:50,944-Speed 10376.83 samples/sec Loss 4.3961 LearningRate 0.0017 Epoch: 34 Global Step: 197920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:51,889-Speed 10841.60 samples/sec Loss 4.3155 LearningRate 0.0017 Epoch: 34 Global Step: 197930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:52,820-Speed 11007.83 samples/sec Loss 4.3154 LearningRate 0.0017 Epoch: 34 Global Step: 197940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:15:53,770-Speed 10790.85 samples/sec Loss 4.3528 LearningRate 0.0017 Epoch: 34 Global Step: 197950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:15:54,743-Speed 10526.64 samples/sec Loss 4.4605 LearningRate 0.0017 Epoch: 34 
Global Step: 197960 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:15:55,694-Speed 10774.91 samples/sec Loss 4.3463 LearningRate 0.0017 Epoch: 34 Global Step: 197970 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:15:56,631-Speed 10940.57 samples/sec Loss 4.4018 LearningRate 0.0017 Epoch: 34 Global Step: 197980 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:15:57,556-Speed 11083.48 samples/sec Loss 4.1413 LearningRate 0.0017 Epoch: 34 Global Step: 197990 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:15:58,517-Speed 10660.55 samples/sec Loss 4.3709 LearningRate 0.0017 Epoch: 34 Global Step: 198000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:16:19,849-[lfw][198000]XNorm: 7.689162 Training: 2022-04-11 06:16:19,850-[lfw][198000]Accuracy-Flip: 0.99667+-0.00269 Training: 2022-04-11 06:16:19,850-[lfw][198000]Accuracy-Highest: 0.99733 Training: 2022-04-11 06:16:44,347-[cfp_fp][198000]XNorm: 6.581197 Training: 2022-04-11 06:16:44,347-[cfp_fp][198000]Accuracy-Flip: 0.95629+-0.01002 Training: 2022-04-11 06:16:44,348-[cfp_fp][198000]Accuracy-Highest: 0.95943 Training: 2022-04-11 06:17:05,551-[agedb_30][198000]XNorm: 7.531434 Training: 2022-04-11 06:17:05,552-[agedb_30][198000]Accuracy-Flip: 0.97167+-0.00730 Training: 2022-04-11 06:17:05,552-[agedb_30][198000]Accuracy-Highest: 0.97167 Training: 2022-04-11 06:17:06,505-Speed 150.62 samples/sec Loss 4.3749 LearningRate 0.0017 Epoch: 34 Global Step: 198010 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:17:07,474-Speed 10574.57 samples/sec Loss 4.2784 LearningRate 0.0017 Epoch: 34 Global Step: 198020 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:17:08,379-Speed 11317.39 samples/sec Loss 4.4149 LearningRate 0.0017 Epoch: 34 Global Step: 198030 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:17:09,315-Speed 10953.63 samples/sec Loss 4.3119 LearningRate 0.0017 Epoch: 34 Global Step: 198040 
Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:17:10,241-Speed 11057.11 samples/sec Loss 4.2969 LearningRate 0.0017 Epoch: 34 Global Step: 198050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:11,176-Speed 10967.27 samples/sec Loss 4.4462 LearningRate 0.0017 Epoch: 34 Global Step: 198060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:12,154-Speed 10476.01 samples/sec Loss 4.3422 LearningRate 0.0017 Epoch: 34 Global Step: 198070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:13,114-Speed 10676.54 samples/sec Loss 4.2844 LearningRate 0.0017 Epoch: 34 Global Step: 198080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:14,073-Speed 10684.69 samples/sec Loss 4.3644 LearningRate 0.0017 Epoch: 34 Global Step: 198090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:15,045-Speed 10541.95 samples/sec Loss 4.3224 LearningRate 0.0017 Epoch: 34 Global Step: 198100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:16,007-Speed 10653.68 samples/sec Loss 4.4074 LearningRate 0.0017 Epoch: 34 Global Step: 198110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:16,989-Speed 10437.85 samples/sec Loss 4.2812 LearningRate 0.0017 Epoch: 34 Global Step: 198120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:17,927-Speed 10923.85 samples/sec Loss 4.2643 LearningRate 0.0017 Epoch: 34 Global Step: 198130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:18,894-Speed 10591.59 samples/sec Loss 4.1841 LearningRate 0.0017 Epoch: 34 Global Step: 198140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:19,840-Speed 10839.48 samples/sec Loss 4.2472 LearningRate 0.0017 Epoch: 34 Global Step: 198150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:17:20,768-Speed 11042.15 samples/sec Loss 4.3303 LearningRate 0.0017 Epoch: 34 Global Step: 198160 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-04-11 06:17:21,684-Speed 11175.10 samples/sec Loss 4.3971 LearningRate 0.0017 Epoch: 34 Global Step: 198170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:17:22,617-Speed 10983.62 samples/sec Loss 4.2765 LearningRate 0.0017 Epoch: 34 Global Step: 198180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:23,554-Speed 10940.65 samples/sec Loss 4.3722 LearningRate 0.0017 Epoch: 34 Global Step: 198190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:24,528-Speed 10512.57 samples/sec Loss 4.4522 LearningRate 0.0017 Epoch: 34 Global Step: 198200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:25,499-Speed 10557.01 samples/sec Loss 4.2473 LearningRate 0.0017 Epoch: 34 Global Step: 198210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:26,474-Speed 10507.38 samples/sec Loss 4.2008 LearningRate 0.0017 Epoch: 34 Global Step: 198220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:27,439-Speed 10619.25 samples/sec Loss 4.4270 LearningRate 0.0016 Epoch: 34 Global Step: 198230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:28,394-Speed 10730.76 samples/sec Loss 4.3273 LearningRate 0.0016 Epoch: 34 Global Step: 198240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:29,342-Speed 10811.92 samples/sec Loss 4.2892 LearningRate 0.0016 Epoch: 34 Global Step: 198250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:30,334-Speed 10333.13 samples/sec Loss 4.4388 LearningRate 0.0016 Epoch: 34 Global Step: 198260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:31,332-Speed 10260.40 samples/sec Loss 4.3468 LearningRate 0.0016 Epoch: 34 Global Step: 198270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:32,313-Speed 10445.65 samples/sec Loss 4.2721 LearningRate 0.0016 Epoch: 34 Global Step: 198280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-11 06:17:33,227-Speed 11211.61 samples/sec Loss 4.3300 LearningRate 0.0016 Epoch: 34 Global Step: 198290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:34,148-Speed 11130.15 samples/sec Loss 4.4036 LearningRate 0.0016 Epoch: 34 Global Step: 198300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:35,091-Speed 10861.20 samples/sec Loss 4.3987 LearningRate 0.0016 Epoch: 34 Global Step: 198310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:36,034-Speed 10872.70 samples/sec Loss 4.3808 LearningRate 0.0016 Epoch: 34 Global Step: 198320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:36,997-Speed 10636.29 samples/sec Loss 4.4936 LearningRate 0.0016 Epoch: 34 Global Step: 198330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:37,950-Speed 10753.45 samples/sec Loss 4.2982 LearningRate 0.0016 Epoch: 34 Global Step: 198340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:38,913-Speed 10639.90 samples/sec Loss 4.1685 LearningRate 0.0016 Epoch: 34 Global Step: 198350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:39,868-Speed 10722.78 samples/sec Loss 4.5074 LearningRate 0.0016 Epoch: 34 Global Step: 198360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:40,832-Speed 10638.24 samples/sec Loss 4.3166 LearningRate 0.0016 Epoch: 34 Global Step: 198370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:41,760-Speed 11048.09 samples/sec Loss 4.4654 LearningRate 0.0016 Epoch: 34 Global Step: 198380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:42,738-Speed 10476.87 samples/sec Loss 4.3161 LearningRate 0.0016 Epoch: 34 Global Step: 198390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:17:43,744-Speed 10181.95 samples/sec Loss 4.3225 LearningRate 0.0016 Epoch: 34 Global Step: 198400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:17:44,676-Speed 
10998.00 samples/sec Loss 4.2815 LearningRate 0.0016 Epoch: 34 Global Step: 198410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:17:45,573-Speed 11418.90 samples/sec Loss 4.3765 LearningRate 0.0016 Epoch: 34 Global Step: 198420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:46,485-Speed 11242.33 samples/sec Loss 4.3253 LearningRate 0.0016 Epoch: 34 Global Step: 198430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:47,413-Speed 11032.58 samples/sec Loss 4.3293 LearningRate 0.0016 Epoch: 34 Global Step: 198440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:48,378-Speed 10623.51 samples/sec Loss 4.1974 LearningRate 0.0016 Epoch: 34 Global Step: 198450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:49,331-Speed 10750.03 samples/sec Loss 4.2168 LearningRate 0.0016 Epoch: 34 Global Step: 198460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:50,266-Speed 10956.72 samples/sec Loss 4.4283 LearningRate 0.0016 Epoch: 34 Global Step: 198470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:51,215-Speed 10805.42 samples/sec Loss 4.2267 LearningRate 0.0016 Epoch: 34 Global Step: 198480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:52,180-Speed 10608.51 samples/sec Loss 4.3269 LearningRate 0.0016 Epoch: 34 Global Step: 198490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:53,141-Speed 10666.18 samples/sec Loss 4.3749 LearningRate 0.0016 Epoch: 34 Global Step: 198500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:54,109-Speed 10587.08 samples/sec Loss 4.3366 LearningRate 0.0016 Epoch: 34 Global Step: 198510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:55,040-Speed 10999.37 samples/sec Loss 4.1547 LearningRate 0.0016 Epoch: 34 Global Step: 198520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:55,998-Speed 10703.57 samples/sec Loss 
4.2511 LearningRate 0.0016 Epoch: 34 Global Step: 198530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:56,927-Speed 11028.33 samples/sec Loss 4.2456 LearningRate 0.0016 Epoch: 34 Global Step: 198540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:57,838-Speed 11248.89 samples/sec Loss 4.2791 LearningRate 0.0016 Epoch: 34 Global Step: 198550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:58,774-Speed 10950.89 samples/sec Loss 4.3425 LearningRate 0.0016 Epoch: 34 Global Step: 198560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:17:59,754-Speed 10454.41 samples/sec Loss 4.3741 LearningRate 0.0016 Epoch: 34 Global Step: 198570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:18:00,726-Speed 10539.16 samples/sec Loss 4.3375 LearningRate 0.0016 Epoch: 34 Global Step: 198580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:18:01,704-Speed 10476.98 samples/sec Loss 4.3900 LearningRate 0.0016 Epoch: 34 Global Step: 198590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:18:02,670-Speed 10605.97 samples/sec Loss 4.4068 LearningRate 0.0016 Epoch: 34 Global Step: 198600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:18:03,650-Speed 10460.24 samples/sec Loss 4.3444 LearningRate 0.0016 Epoch: 34 Global Step: 198610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:18:04,581-Speed 11001.01 samples/sec Loss 4.1614 LearningRate 0.0016 Epoch: 34 Global Step: 198620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:18:05,530-Speed 10803.84 samples/sec Loss 4.4499 LearningRate 0.0016 Epoch: 34 Global Step: 198630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:18:06,490-Speed 10670.94 samples/sec Loss 4.2341 LearningRate 0.0016 Epoch: 34 Global Step: 198640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:18:07,428-Speed 10924.64 samples/sec Loss 4.2168 LearningRate 0.0016 
Epoch: 34 Global Step: 198650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:18:08,415-Speed 10387.71 samples/sec Loss 4.3882 LearningRate 0.0016 Epoch: 34 Global Step: 198660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:18:09,343-Speed 11031.27 samples/sec Loss 4.3180 LearningRate 0.0016 Epoch: 34 Global Step: 198670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:18:10,292-Speed 10806.79 samples/sec Loss 4.4305 LearningRate 0.0016 Epoch: 34 Global Step: 198680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:18:11,231-Speed 10913.28 samples/sec Loss 4.1737 LearningRate 0.0016 Epoch: 34 Global Step: 198690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:18:12,181-Speed 10775.91 samples/sec Loss 4.2269 LearningRate 0.0016 Epoch: 34 Global Step: 198700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:18:13,192-Speed 10141.55 samples/sec Loss 4.4527 LearningRate 0.0016 Epoch: 34 Global Step: 198710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:18:14,125-Speed 10984.31 samples/sec Loss 4.3302 LearningRate 0.0016 Epoch: 34 Global Step: 198720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:18:15,071-Speed 10833.59 samples/sec Loss 4.3206 LearningRate 0.0016 Epoch: 34 Global Step: 198730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:18:16,006-Speed 10950.56 samples/sec Loss 4.2603 LearningRate 0.0016 Epoch: 34 Global Step: 198740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:18:16,969-Speed 10640.67 samples/sec Loss 4.2102 LearningRate 0.0016 Epoch: 34 Global Step: 198750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:18:17,918-Speed 10801.21 samples/sec Loss 4.3279 LearningRate 0.0016 Epoch: 34 Global Step: 198760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:18:18,887-Speed 10572.30 samples/sec Loss 4.4769 LearningRate 0.0016 Epoch: 34 Global Step: 198770 
Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:18:19,820-Speed 10981.25 samples/sec Loss 4.4785 LearningRate 0.0016 Epoch: 34 Global Step: 198780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:18:20,747-Speed 11050.78 samples/sec Loss 4.2580 LearningRate 0.0016 Epoch: 34 Global Step: 198790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:18:21,668-Speed 11127.74 samples/sec Loss 4.3150 LearningRate 0.0016 Epoch: 34 Global Step: 198800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:18:22,631-Speed 10640.36 samples/sec Loss 4.3721 LearningRate 0.0016 Epoch: 34 Global Step: 198810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:18:23,573-Speed 10884.00 samples/sec Loss 4.2619 LearningRate 0.0016 Epoch: 34 Global Step: 198820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:18:24,539-Speed 10602.27 samples/sec Loss 4.2586 LearningRate 0.0016 Epoch: 34 Global Step: 198830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:18:25,482-Speed 10870.69 samples/sec Loss 4.2166 LearningRate 0.0016 Epoch: 34 Global Step: 198840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:18:26,446-Speed 10634.12 samples/sec Loss 4.1975 LearningRate 0.0016 Epoch: 34 Global Step: 198850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:18:27,431-Speed 10400.19 samples/sec Loss 4.4947 LearningRate 0.0016 Epoch: 34 Global Step: 198860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:18:28,425-Speed 10307.90 samples/sec Loss 4.1400 LearningRate 0.0016 Epoch: 34 Global Step: 198870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:18:29,391-Speed 10612.15 samples/sec Loss 4.2472 LearningRate 0.0016 Epoch: 34 Global Step: 198880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:18:30,348-Speed 10704.81 samples/sec Loss 4.2630 LearningRate 0.0016 Epoch: 34 Global Step: 198890 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-04-11 06:18:31,324-Speed 10496.03 samples/sec Loss 4.3766 LearningRate 0.0016 Epoch: 34 Global Step: 198900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:18:32,272-Speed 10806.06 samples/sec Loss 4.2036 LearningRate 0.0016 Epoch: 34 Global Step: 198910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:18:33,197-Speed 11075.52 samples/sec Loss 4.4041 LearningRate 0.0016 Epoch: 34 Global Step: 198920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:18:34,142-Speed 10847.47 samples/sec Loss 4.2602 LearningRate 0.0016 Epoch: 34 Global Step: 198930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:18:35,059-Speed 11168.94 samples/sec Loss 4.3702 LearningRate 0.0016 Epoch: 34 Global Step: 198940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:18:36,038-Speed 10467.40 samples/sec Loss 4.4108 LearningRate 0.0016 Epoch: 34 Global Step: 198950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:18:36,977-Speed 10912.87 samples/sec Loss 4.3779 LearningRate 0.0016 Epoch: 34 Global Step: 198960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:18:37,933-Speed 10722.14 samples/sec Loss 4.2958 LearningRate 0.0016 Epoch: 34 Global Step: 198970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:18:38,888-Speed 10730.72 samples/sec Loss 4.3211 LearningRate 0.0016 Epoch: 34 Global Step: 198980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:18:39,830-Speed 10877.72 samples/sec Loss 4.3740 LearningRate 0.0016 Epoch: 34 Global Step: 198990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:18:40,897-Speed 9598.89 samples/sec Loss 4.4592 LearningRate 0.0016 Epoch: 34 Global Step: 199000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:18:41,775-Speed 11680.43 samples/sec Loss 4.3894 LearningRate 0.0016 Epoch: 34 Global Step: 199010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-04-11 06:18:52,323-Speed 970.90 samples/sec Loss 4.0231 LearningRate 0.0016 Epoch: 35 Global Step: 199020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:18:53,317-Speed 10306.60 samples/sec Loss 4.2589 LearningRate 0.0016 Epoch: 35 Global Step: 199030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:18:54,337-Speed 10046.16 samples/sec Loss 4.1457 LearningRate 0.0016 Epoch: 35 Global Step: 199040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:18:55,315-Speed 10482.30 samples/sec Loss 4.0853 LearningRate 0.0016 Epoch: 35 Global Step: 199050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:18:56,411-Speed 9346.75 samples/sec Loss 3.9870 LearningRate 0.0016 Epoch: 35 Global Step: 199060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:18:57,466-Speed 9717.46 samples/sec Loss 4.0930 LearningRate 0.0016 Epoch: 35 Global Step: 199070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:18:58,429-Speed 10642.07 samples/sec Loss 4.0980 LearningRate 0.0016 Epoch: 35 Global Step: 199080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:18:59,405-Speed 10501.35 samples/sec Loss 4.0381 LearningRate 0.0016 Epoch: 35 Global Step: 199090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:19:00,362-Speed 10704.25 samples/sec Loss 4.1162 LearningRate 0.0016 Epoch: 35 Global Step: 199100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:19:01,368-Speed 10188.85 samples/sec Loss 3.9984 LearningRate 0.0016 Epoch: 35 Global Step: 199110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:19:02,326-Speed 10694.18 samples/sec Loss 4.1090 LearningRate 0.0016 Epoch: 35 Global Step: 199120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:19:03,328-Speed 10235.09 samples/sec Loss 4.0878 LearningRate 0.0015 Epoch: 35 Global Step: 199130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:19:04,351-Speed 
10015.08 samples/sec Loss 4.1859 LearningRate 0.0015 Epoch: 35 Global Step: 199140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:19:05,341-Speed 10343.65 samples/sec Loss 4.0695 LearningRate 0.0015 Epoch: 35 Global Step: 199150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:19:06,309-Speed 10590.06 samples/sec Loss 4.1823 LearningRate 0.0015 Epoch: 35 Global Step: 199160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:19:07,289-Speed 10459.88 samples/sec Loss 3.9599 LearningRate 0.0015 Epoch: 35 Global Step: 199170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:19:08,295-Speed 10184.86 samples/sec Loss 4.0352 LearningRate 0.0015 Epoch: 35 Global Step: 199180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:19:09,251-Speed 10716.03 samples/sec Loss 4.1682 LearningRate 0.0015 Epoch: 35 Global Step: 199190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:19:10,246-Speed 10297.34 samples/sec Loss 4.1647 LearningRate 0.0015 Epoch: 35 Global Step: 199200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:19:11,216-Speed 10559.89 samples/sec Loss 4.0128 LearningRate 0.0015 Epoch: 35 Global Step: 199210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:19:12,201-Speed 10406.90 samples/sec Loss 4.0923 LearningRate 0.0015 Epoch: 35 Global Step: 199220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:19:13,159-Speed 10692.52 samples/sec Loss 4.1131 LearningRate 0.0015 Epoch: 35 Global Step: 199230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:19:14,120-Speed 10670.41 samples/sec Loss 4.0385 LearningRate 0.0015 Epoch: 35 Global Step: 199240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:19:15,059-Speed 10907.57 samples/sec Loss 4.1863 LearningRate 0.0015 Epoch: 35 Global Step: 199250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:19:16,009-Speed 10788.50 samples/sec Loss 
4.0147 LearningRate 0.0015 Epoch: 35 Global Step: 199260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:19:16,994-Speed 10408.38 samples/sec Loss 4.1288 LearningRate 0.0015 Epoch: 35 Global Step: 199270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:19:17,936-Speed 10873.56 samples/sec Loss 4.0043 LearningRate 0.0015 Epoch: 35 Global Step: 199280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:19:18,894-Speed 10699.16 samples/sec Loss 4.0946 LearningRate 0.0015 Epoch: 35 Global Step: 199290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:19:19,835-Speed 10890.37 samples/sec Loss 4.0807 LearningRate 0.0015 Epoch: 35 Global Step: 199300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:19:20,774-Speed 10907.64 samples/sec Loss 4.1017 LearningRate 0.0015 Epoch: 35 Global Step: 199310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:19:21,687-Speed 11233.64 samples/sec Loss 4.1362 LearningRate 0.0015 Epoch: 35 Global Step: 199320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:19:22,628-Speed 10885.78 samples/sec Loss 4.2264 LearningRate 0.0015 Epoch: 35 Global Step: 199330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:19:23,570-Speed 10880.62 samples/sec Loss 3.9788 LearningRate 0.0015 Epoch: 35 Global Step: 199340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:19:24,500-Speed 11016.49 samples/sec Loss 4.0953 LearningRate 0.0015 Epoch: 35 Global Step: 199350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:19:25,406-Speed 11300.57 samples/sec Loss 3.9364 LearningRate 0.0015 Epoch: 35 Global Step: 199360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:19:26,363-Speed 10712.37 samples/sec Loss 4.1389 LearningRate 0.0015 Epoch: 35 Global Step: 199370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:19:27,352-Speed 10364.02 samples/sec Loss 4.1074 LearningRate 0.0015 
Epoch: 35 Global Step: 199380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:19:28,351-Speed 10255.08 samples/sec Loss 4.1423 LearningRate 0.0015 Epoch: 35 Global Step: 199390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:19:29,296-Speed 10838.83 samples/sec Loss 4.0322 LearningRate 0.0015 Epoch: 35 Global Step: 199400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:19:30,206-Speed 11260.38 samples/sec Loss 4.0446 LearningRate 0.0015 Epoch: 35 Global Step: 199410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:19:31,133-Speed 11058.93 samples/sec Loss 4.0332 LearningRate 0.0015 Epoch: 35 Global Step: 199420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:19:32,110-Speed 10484.40 samples/sec Loss 4.0418 LearningRate 0.0015 Epoch: 35 Global Step: 199430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:19:33,032-Speed 11112.67 samples/sec Loss 4.2039 LearningRate 0.0015 Epoch: 35 Global Step: 199440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:19:33,948-Speed 11191.99 samples/sec Loss 4.0249 LearningRate 0.0015 Epoch: 35 Global Step: 199450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:19:34,895-Speed 10820.32 samples/sec Loss 4.1158 LearningRate 0.0015 Epoch: 35 Global Step: 199460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:19:35,879-Speed 10417.87 samples/sec Loss 3.9943 LearningRate 0.0015 Epoch: 35 Global Step: 199470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:19:36,861-Speed 10430.23 samples/sec Loss 4.1563 LearningRate 0.0015 Epoch: 35 Global Step: 199480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:19:37,805-Speed 10854.70 samples/sec Loss 4.1436 LearningRate 0.0015 Epoch: 35 Global Step: 199490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:19:38,756-Speed 10779.69 samples/sec Loss 4.0246 LearningRate 0.0015 Epoch: 35 Global Step: 199500 
Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:19:39,713-Speed 10700.32 samples/sec Loss 4.0628 LearningRate 0.0015 Epoch: 35 Global Step: 199510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:19:40,680-Speed 10604.00 samples/sec Loss 4.1863 LearningRate 0.0015 Epoch: 35 Global Step: 199520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:19:41,643-Speed 10641.08 samples/sec Loss 4.0626 LearningRate 0.0015 Epoch: 35 Global Step: 199530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:19:42,627-Speed 10403.20 samples/sec Loss 4.1762 LearningRate 0.0015 Epoch: 35 Global Step: 199540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:19:43,622-Speed 10326.93 samples/sec Loss 4.0776 LearningRate 0.0015 Epoch: 35 Global Step: 199550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:19:44,564-Speed 10874.20 samples/sec Loss 4.1611 LearningRate 0.0015 Epoch: 35 Global Step: 199560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:19:45,494-Speed 11017.08 samples/sec Loss 4.1679 LearningRate 0.0015 Epoch: 35 Global Step: 199570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:19:46,425-Speed 11001.67 samples/sec Loss 4.1020 LearningRate 0.0015 Epoch: 35 Global Step: 199580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:19:47,358-Speed 10992.99 samples/sec Loss 4.1089 LearningRate 0.0015 Epoch: 35 Global Step: 199590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:19:48,325-Speed 10591.60 samples/sec Loss 4.1474 LearningRate 0.0015 Epoch: 35 Global Step: 199600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:19:49,279-Speed 10739.57 samples/sec Loss 4.1214 LearningRate 0.0015 Epoch: 35 Global Step: 199610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:19:50,220-Speed 10889.06 samples/sec Loss 4.2496 LearningRate 0.0015 Epoch: 35 Global Step: 199620 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-04-11 06:19:51,206-Speed 10395.55 samples/sec Loss 4.1574 LearningRate 0.0015 Epoch: 35 Global Step: 199630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:19:52,182-Speed 10493.69 samples/sec Loss 4.1544 LearningRate 0.0015 Epoch: 35 Global Step: 199640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:19:53,151-Speed 10581.33 samples/sec Loss 4.1592 LearningRate 0.0015 Epoch: 35 Global Step: 199650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:19:54,124-Speed 10520.66 samples/sec Loss 4.1400 LearningRate 0.0015 Epoch: 35 Global Step: 199660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:19:55,063-Speed 10917.26 samples/sec Loss 4.0321 LearningRate 0.0015 Epoch: 35 Global Step: 199670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:19:56,040-Speed 10488.62 samples/sec Loss 4.1883 LearningRate 0.0015 Epoch: 35 Global Step: 199680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:19:56,967-Speed 11061.75 samples/sec Loss 4.1228 LearningRate 0.0015 Epoch: 35 Global Step: 199690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:19:57,889-Speed 11108.48 samples/sec Loss 4.1042 LearningRate 0.0015 Epoch: 35 Global Step: 199700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:19:58,811-Speed 11117.47 samples/sec Loss 4.0862 LearningRate 0.0015 Epoch: 35 Global Step: 199710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:19:59,778-Speed 10601.02 samples/sec Loss 4.1202 LearningRate 0.0015 Epoch: 35 Global Step: 199720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:20:00,746-Speed 10580.16 samples/sec Loss 4.0898 LearningRate 0.0015 Epoch: 35 Global Step: 199730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:20:01,715-Speed 10578.08 samples/sec Loss 4.0955 LearningRate 0.0015 Epoch: 35 Global Step: 199740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-04-11 06:20:02,683-Speed 10589.54 samples/sec Loss 4.1850 LearningRate 0.0015 Epoch: 35 Global Step: 199750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:20:03,588-Speed 11323.19 samples/sec Loss 4.0419 LearningRate 0.0015 Epoch: 35 Global Step: 199760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:20:04,558-Speed 10559.77 samples/sec Loss 4.1108 LearningRate 0.0015 Epoch: 35 Global Step: 199770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:20:05,533-Speed 10513.98 samples/sec Loss 3.9605 LearningRate 0.0015 Epoch: 35 Global Step: 199780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:20:06,523-Speed 10350.70 samples/sec Loss 4.1236 LearningRate 0.0015 Epoch: 35 Global Step: 199790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:20:07,504-Speed 10440.27 samples/sec Loss 4.0441 LearningRate 0.0015 Epoch: 35 Global Step: 199800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:20:08,474-Speed 10564.26 samples/sec Loss 4.1644 LearningRate 0.0015 Epoch: 35 Global Step: 199810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:20:09,395-Speed 11130.19 samples/sec Loss 4.0397 LearningRate 0.0015 Epoch: 35 Global Step: 199820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:20:10,317-Speed 11106.05 samples/sec Loss 4.0357 LearningRate 0.0015 Epoch: 35 Global Step: 199830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:20:11,234-Speed 11182.28 samples/sec Loss 4.1042 LearningRate 0.0015 Epoch: 35 Global Step: 199840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:20:12,184-Speed 10778.50 samples/sec Loss 4.1613 LearningRate 0.0015 Epoch: 35 Global Step: 199850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:20:13,140-Speed 10722.49 samples/sec Loss 4.3015 LearningRate 0.0015 Epoch: 35 Global Step: 199860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:20:14,107-Speed 
10599.63 samples/sec Loss 4.1323 LearningRate 0.0015 Epoch: 35 Global Step: 199870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:20:15,075-Speed 10576.75 samples/sec Loss 4.1480 LearningRate 0.0015 Epoch: 35 Global Step: 199880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:20:16,021-Speed 10835.75 samples/sec Loss 4.0251 LearningRate 0.0015 Epoch: 35 Global Step: 199890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:20:16,994-Speed 10535.09 samples/sec Loss 4.0375 LearningRate 0.0015 Epoch: 35 Global Step: 199900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:20:17,954-Speed 10669.40 samples/sec Loss 4.0559 LearningRate 0.0015 Epoch: 35 Global Step: 199910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:20:18,902-Speed 10816.63 samples/sec Loss 4.1401 LearningRate 0.0015 Epoch: 35 Global Step: 199920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:20:19,891-Speed 10367.52 samples/sec Loss 4.1868 LearningRate 0.0015 Epoch: 35 Global Step: 199930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:20:20,798-Speed 11293.44 samples/sec Loss 4.0946 LearningRate 0.0015 Epoch: 35 Global Step: 199940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:20:21,719-Speed 11119.06 samples/sec Loss 4.1096 LearningRate 0.0015 Epoch: 35 Global Step: 199950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:20:22,658-Speed 10917.94 samples/sec Loss 4.1199 LearningRate 0.0015 Epoch: 35 Global Step: 199960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:20:23,584-Speed 11070.73 samples/sec Loss 4.0940 LearningRate 0.0015 Epoch: 35 Global Step: 199970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:20:24,546-Speed 10647.53 samples/sec Loss 4.2225 LearningRate 0.0015 Epoch: 35 Global Step: 199980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:20:25,493-Speed 10818.42 samples/sec Loss 
4.1904 LearningRate 0.0015 Epoch: 35 Global Step: 199990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:20:26,486-Speed 10319.78 samples/sec Loss 4.0462 LearningRate 0.0015 Epoch: 35 Global Step: 200000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:20:47,967-[lfw][200000]XNorm: 7.706643 Training: 2022-04-11 06:20:47,967-[lfw][200000]Accuracy-Flip: 0.99700+-0.00245 Training: 2022-04-11 06:20:47,968-[lfw][200000]Accuracy-Highest: 0.99733 Training: 2022-04-11 06:21:12,566-[cfp_fp][200000]XNorm: 6.585034 Training: 2022-04-11 06:21:12,566-[cfp_fp][200000]Accuracy-Flip: 0.95757+-0.01088 Training: 2022-04-11 06:21:12,567-[cfp_fp][200000]Accuracy-Highest: 0.95943 Training: 2022-04-11 06:21:33,806-[agedb_30][200000]XNorm: 7.541606 Training: 2022-04-11 06:21:33,807-[agedb_30][200000]Accuracy-Flip: 0.96850+-0.00673 Training: 2022-04-11 06:21:33,807-[agedb_30][200000]Accuracy-Highest: 0.97167 Training: 2022-04-11 06:21:34,760-Speed 149.99 samples/sec Loss 4.1650 LearningRate 0.0015 Epoch: 35 Global Step: 200010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:21:35,680-Speed 11132.02 samples/sec Loss 4.2932 LearningRate 0.0015 Epoch: 35 Global Step: 200020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:21:36,677-Speed 10284.69 samples/sec Loss 4.2696 LearningRate 0.0015 Epoch: 35 Global Step: 200030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:21:37,629-Speed 10762.94 samples/sec Loss 4.1152 LearningRate 0.0015 Epoch: 35 Global Step: 200040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:21:38,638-Speed 10147.49 samples/sec Loss 4.2608 LearningRate 0.0015 Epoch: 35 Global Step: 200050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:21:39,613-Speed 10517.48 samples/sec Loss 4.0446 LearningRate 0.0014 Epoch: 35 Global Step: 200060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:21:40,562-Speed 10798.72 samples/sec Loss 4.0335 LearningRate 
0.0014 Epoch: 35 Global Step: 200070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:21:41,530-Speed 10578.33 samples/sec Loss 4.1043 LearningRate 0.0014 Epoch: 35 Global Step: 200080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:21:42,500-Speed 10567.20 samples/sec Loss 4.2348 LearningRate 0.0014 Epoch: 35 Global Step: 200090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:21:43,468-Speed 10581.58 samples/sec Loss 4.0994 LearningRate 0.0014 Epoch: 35 Global Step: 200100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:21:44,407-Speed 10914.17 samples/sec Loss 4.1042 LearningRate 0.0014 Epoch: 35 Global Step: 200110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:21:45,318-Speed 11253.12 samples/sec Loss 4.1722 LearningRate 0.0014 Epoch: 35 Global Step: 200120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:21:46,231-Speed 11224.14 samples/sec Loss 4.0697 LearningRate 0.0014 Epoch: 35 Global Step: 200130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:21:47,154-Speed 11096.31 samples/sec Loss 4.1387 LearningRate 0.0014 Epoch: 35 Global Step: 200140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:21:48,145-Speed 10343.12 samples/sec Loss 4.0921 LearningRate 0.0014 Epoch: 35 Global Step: 200150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:21:49,087-Speed 10876.03 samples/sec Loss 4.2244 LearningRate 0.0014 Epoch: 35 Global Step: 200160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:21:50,065-Speed 10480.24 samples/sec Loss 4.0660 LearningRate 0.0014 Epoch: 35 Global Step: 200170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:21:50,991-Speed 11057.48 samples/sec Loss 4.0948 LearningRate 0.0014 Epoch: 35 Global Step: 200180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:21:51,933-Speed 10886.00 samples/sec Loss 4.2241 LearningRate 0.0014 Epoch: 35 Global Step: 
200190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:21:52,888-Speed 10730.30 samples/sec Loss 4.2023 LearningRate 0.0014 Epoch: 35 Global Step: 200200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:21:53,832-Speed 10850.99 samples/sec Loss 4.2002 LearningRate 0.0014 Epoch: 35 Global Step: 200210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:21:54,778-Speed 10826.06 samples/sec Loss 4.2003 LearningRate 0.0014 Epoch: 35 Global Step: 200220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:21:55,732-Speed 10747.95 samples/sec Loss 4.1376 LearningRate 0.0014 Epoch: 35 Global Step: 200230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:21:56,644-Speed 11238.31 samples/sec Loss 4.1625 LearningRate 0.0014 Epoch: 35 Global Step: 200240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:21:57,555-Speed 11240.10 samples/sec Loss 4.1311 LearningRate 0.0014 Epoch: 35 Global Step: 200250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:21:58,502-Speed 10823.58 samples/sec Loss 4.0586 LearningRate 0.0014 Epoch: 35 Global Step: 200260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:21:59,446-Speed 10857.89 samples/sec Loss 4.0355 LearningRate 0.0014 Epoch: 35 Global Step: 200270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:22:00,380-Speed 10963.90 samples/sec Loss 4.0790 LearningRate 0.0014 Epoch: 35 Global Step: 200280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:01,308-Speed 11050.74 samples/sec Loss 4.2391 LearningRate 0.0014 Epoch: 35 Global Step: 200290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:02,257-Speed 10790.75 samples/sec Loss 4.0993 LearningRate 0.0014 Epoch: 35 Global Step: 200300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:03,227-Speed 10571.86 samples/sec Loss 4.0474 LearningRate 0.0014 Epoch: 35 Global Step: 200310 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-04-11 06:22:04,177-Speed 10779.81 samples/sec Loss 4.2255 LearningRate 0.0014 Epoch: 35 Global Step: 200320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:05,139-Speed 10657.97 samples/sec Loss 4.1086 LearningRate 0.0014 Epoch: 35 Global Step: 200330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:06,065-Speed 11065.48 samples/sec Loss 4.1244 LearningRate 0.0014 Epoch: 35 Global Step: 200340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:07,028-Speed 10638.76 samples/sec Loss 4.0936 LearningRate 0.0014 Epoch: 35 Global Step: 200350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:07,978-Speed 10791.10 samples/sec Loss 4.1050 LearningRate 0.0014 Epoch: 35 Global Step: 200360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:08,891-Speed 11226.49 samples/sec Loss 4.0584 LearningRate 0.0014 Epoch: 35 Global Step: 200370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:09,820-Speed 11030.07 samples/sec Loss 4.3254 LearningRate 0.0014 Epoch: 35 Global Step: 200380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:10,763-Speed 10866.45 samples/sec Loss 4.0200 LearningRate 0.0014 Epoch: 35 Global Step: 200390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:11,716-Speed 10743.73 samples/sec Loss 4.0680 LearningRate 0.0014 Epoch: 35 Global Step: 200400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:12,677-Speed 10661.46 samples/sec Loss 4.1738 LearningRate 0.0014 Epoch: 35 Global Step: 200410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:13,648-Speed 10563.24 samples/sec Loss 4.2005 LearningRate 0.0014 Epoch: 35 Global Step: 200420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:14,588-Speed 10892.17 samples/sec Loss 4.1558 LearningRate 0.0014 Epoch: 35 Global Step: 200430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-04-11 06:22:15,501-Speed 11225.21 samples/sec Loss 4.2241 LearningRate 0.0014 Epoch: 35 Global Step: 200440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:16,459-Speed 10696.79 samples/sec Loss 4.2847 LearningRate 0.0014 Epoch: 35 Global Step: 200450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:17,467-Speed 10162.97 samples/sec Loss 3.9922 LearningRate 0.0014 Epoch: 35 Global Step: 200460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:18,461-Speed 10314.20 samples/sec Loss 4.1496 LearningRate 0.0014 Epoch: 35 Global Step: 200470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:19,397-Speed 10939.57 samples/sec Loss 4.1008 LearningRate 0.0014 Epoch: 35 Global Step: 200480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:22:20,373-Speed 10501.46 samples/sec Loss 4.1401 LearningRate 0.0014 Epoch: 35 Global Step: 200490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:21,293-Speed 11140.70 samples/sec Loss 4.2440 LearningRate 0.0014 Epoch: 35 Global Step: 200500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:22,224-Speed 11006.35 samples/sec Loss 4.1445 LearningRate 0.0014 Epoch: 35 Global Step: 200510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:23,136-Speed 11233.77 samples/sec Loss 4.1026 LearningRate 0.0014 Epoch: 35 Global Step: 200520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:24,082-Speed 10834.21 samples/sec Loss 4.1384 LearningRate 0.0014 Epoch: 35 Global Step: 200530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:25,009-Speed 11045.12 samples/sec Loss 4.1547 LearningRate 0.0014 Epoch: 35 Global Step: 200540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:25,923-Speed 11221.22 samples/sec Loss 4.0961 LearningRate 0.0014 Epoch: 35 Global Step: 200550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:26,876-Speed 
10748.64 samples/sec Loss 4.0829 LearningRate 0.0014 Epoch: 35 Global Step: 200560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:27,850-Speed 10524.96 samples/sec Loss 4.1467 LearningRate 0.0014 Epoch: 35 Global Step: 200570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:28,825-Speed 10511.56 samples/sec Loss 4.0124 LearningRate 0.0014 Epoch: 35 Global Step: 200580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:29,775-Speed 10781.71 samples/sec Loss 4.0966 LearningRate 0.0014 Epoch: 35 Global Step: 200590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:22:30,724-Speed 10800.29 samples/sec Loss 4.2395 LearningRate 0.0014 Epoch: 35 Global Step: 200600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:22:31,705-Speed 10445.85 samples/sec Loss 4.2009 LearningRate 0.0014 Epoch: 35 Global Step: 200610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:22:32,634-Speed 11032.21 samples/sec Loss 4.2126 LearningRate 0.0014 Epoch: 35 Global Step: 200620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:33,555-Speed 11120.90 samples/sec Loss 4.2446 LearningRate 0.0014 Epoch: 35 Global Step: 200630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:34,481-Speed 11066.15 samples/sec Loss 4.1600 LearningRate 0.0014 Epoch: 35 Global Step: 200640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:35,415-Speed 10974.52 samples/sec Loss 4.1384 LearningRate 0.0014 Epoch: 35 Global Step: 200650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:36,376-Speed 10663.20 samples/sec Loss 4.0990 LearningRate 0.0014 Epoch: 35 Global Step: 200660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:37,331-Speed 10723.88 samples/sec Loss 4.1550 LearningRate 0.0014 Epoch: 35 Global Step: 200670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:38,294-Speed 10646.50 samples/sec Loss 
4.1635 LearningRate 0.0014 Epoch: 35 Global Step: 200680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:39,287-Speed 10320.00 samples/sec Loss 4.1710 LearningRate 0.0014 Epoch: 35 Global Step: 200690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:40,264-Speed 10616.51 samples/sec Loss 4.2077 LearningRate 0.0014 Epoch: 35 Global Step: 200700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:41,205-Speed 10890.42 samples/sec Loss 4.0176 LearningRate 0.0014 Epoch: 35 Global Step: 200710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:42,180-Speed 10512.62 samples/sec Loss 4.1528 LearningRate 0.0014 Epoch: 35 Global Step: 200720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:22:43,111-Speed 11005.88 samples/sec Loss 4.0571 LearningRate 0.0014 Epoch: 35 Global Step: 200730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:22:44,058-Speed 10824.84 samples/sec Loss 4.1356 LearningRate 0.0014 Epoch: 35 Global Step: 200740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:44,992-Speed 10971.47 samples/sec Loss 4.0646 LearningRate 0.0014 Epoch: 35 Global Step: 200750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:45,933-Speed 10889.65 samples/sec Loss 4.1109 LearningRate 0.0014 Epoch: 35 Global Step: 200760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:46,840-Speed 11290.76 samples/sec Loss 4.2390 LearningRate 0.0014 Epoch: 35 Global Step: 200770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:47,780-Speed 10902.63 samples/sec Loss 4.1099 LearningRate 0.0014 Epoch: 35 Global Step: 200780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:48,732-Speed 10763.01 samples/sec Loss 4.1759 LearningRate 0.0014 Epoch: 35 Global Step: 200790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:49,663-Speed 11008.03 samples/sec Loss 4.1147 LearningRate 0.0014 
Epoch: 35 Global Step: 200800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:50,642-Speed 10468.25 samples/sec Loss 4.0863 LearningRate 0.0014 Epoch: 35 Global Step: 200810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:51,589-Speed 10829.95 samples/sec Loss 4.1029 LearningRate 0.0014 Epoch: 35 Global Step: 200820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:52,528-Speed 10906.94 samples/sec Loss 4.0535 LearningRate 0.0014 Epoch: 35 Global Step: 200830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:53,456-Speed 11048.72 samples/sec Loss 4.2101 LearningRate 0.0014 Epoch: 35 Global Step: 200840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:22:54,367-Speed 11249.30 samples/sec Loss 4.2107 LearningRate 0.0014 Epoch: 35 Global Step: 200850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:22:55,331-Speed 10623.86 samples/sec Loss 4.1644 LearningRate 0.0014 Epoch: 35 Global Step: 200860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:22:56,289-Speed 10693.79 samples/sec Loss 4.2094 LearningRate 0.0014 Epoch: 35 Global Step: 200870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:22:57,226-Speed 10942.98 samples/sec Loss 4.0348 LearningRate 0.0014 Epoch: 35 Global Step: 200880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:58,147-Speed 11117.25 samples/sec Loss 4.1895 LearningRate 0.0014 Epoch: 35 Global Step: 200890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:59,070-Speed 11102.29 samples/sec Loss 4.0953 LearningRate 0.0014 Epoch: 35 Global Step: 200900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:22:59,989-Speed 11151.37 samples/sec Loss 4.2282 LearningRate 0.0014 Epoch: 35 Global Step: 200910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:23:00,957-Speed 10588.19 samples/sec Loss 4.1467 LearningRate 0.0014 Epoch: 35 Global Step: 200920 
Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:23:01,919-Speed 10645.47 samples/sec Loss 4.1152 LearningRate 0.0014 Epoch: 35 Global Step: 200930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:23:02,846-Speed 11059.10 samples/sec Loss 4.1090 LearningRate 0.0014 Epoch: 35 Global Step: 200940 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:23:03,801-Speed 10733.96 samples/sec Loss 4.0671 LearningRate 0.0014 Epoch: 35 Global Step: 200950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:23:04,769-Speed 10578.88 samples/sec Loss 4.2232 LearningRate 0.0014 Epoch: 35 Global Step: 200960 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:23:05,687-Speed 11170.78 samples/sec Loss 4.2327 LearningRate 0.0014 Epoch: 35 Global Step: 200970 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:23:06,663-Speed 10493.47 samples/sec Loss 4.2297 LearningRate 0.0014 Epoch: 35 Global Step: 200980 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:23:07,600-Speed 10931.20 samples/sec Loss 4.0878 LearningRate 0.0014 Epoch: 35 Global Step: 200990 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:23:08,564-Speed 10632.26 samples/sec Loss 4.2214 LearningRate 0.0014 Epoch: 35 Global Step: 201000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:23:09,477-Speed 11229.74 samples/sec Loss 4.2072 LearningRate 0.0014 Epoch: 35 Global Step: 201010 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:23:10,401-Speed 11088.77 samples/sec Loss 4.1853 LearningRate 0.0013 Epoch: 35 Global Step: 201020 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:23:11,331-Speed 11012.23 samples/sec Loss 4.1150 LearningRate 0.0013 Epoch: 35 Global Step: 201030 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:23:12,272-Speed 10899.41 samples/sec Loss 4.0955 LearningRate 0.0013 Epoch: 35 Global Step: 201040 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-04-11 06:23:13,236-Speed 10630.75 samples/sec Loss 4.0905 LearningRate 0.0013 Epoch: 35 Global Step: 201050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:23:14,183-Speed 10832.26 samples/sec Loss 4.1209 LearningRate 0.0013 Epoch: 35 Global Step: 201060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:23:15,161-Speed 10470.89 samples/sec Loss 4.1270 LearningRate 0.0013 Epoch: 35 Global Step: 201070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:23:16,139-Speed 10478.76 samples/sec Loss 4.2186 LearningRate 0.0013 Epoch: 35 Global Step: 201080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:23:17,090-Speed 10770.48 samples/sec Loss 4.1839 LearningRate 0.0013 Epoch: 35 Global Step: 201090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:23:18,013-Speed 11106.13 samples/sec Loss 4.1858 LearningRate 0.0013 Epoch: 35 Global Step: 201100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:23:18,948-Speed 10959.04 samples/sec Loss 4.1979 LearningRate 0.0013 Epoch: 35 Global Step: 201110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:23:19,910-Speed 10651.91 samples/sec Loss 4.2038 LearningRate 0.0013 Epoch: 35 Global Step: 201120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:23:20,860-Speed 10788.59 samples/sec Loss 4.1967 LearningRate 0.0013 Epoch: 35 Global Step: 201130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:23:21,771-Speed 11241.46 samples/sec Loss 4.1831 LearningRate 0.0013 Epoch: 35 Global Step: 201140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:23:22,701-Speed 11024.05 samples/sec Loss 4.1075 LearningRate 0.0013 Epoch: 35 Global Step: 201150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:23:23,670-Speed 10575.67 samples/sec Loss 4.0855 LearningRate 0.0013 Epoch: 35 Global Step: 201160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-04-11 06:23:24,604-Speed 10965.45 samples/sec Loss 4.1802 LearningRate 0.0013 Epoch: 35 Global Step: 201170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:23:25,609-Speed 10199.78 samples/sec Loss 4.1612 LearningRate 0.0013 Epoch: 35 Global Step: 201180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:23:26,574-Speed 10616.37 samples/sec Loss 4.1241 LearningRate 0.0013 Epoch: 35 Global Step: 201190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:23:27,519-Speed 10844.26 samples/sec Loss 4.1925 LearningRate 0.0013 Epoch: 35 Global Step: 201200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:23:28,449-Speed 11014.83 samples/sec Loss 4.1522 LearningRate 0.0013 Epoch: 35 Global Step: 201210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:23:29,368-Speed 11153.16 samples/sec Loss 4.1299 LearningRate 0.0013 Epoch: 35 Global Step: 201220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:23:30,373-Speed 10193.97 samples/sec Loss 4.2221 LearningRate 0.0013 Epoch: 35 Global Step: 201230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:23:31,340-Speed 10594.35 samples/sec Loss 4.0881 LearningRate 0.0013 Epoch: 35 Global Step: 201240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:23:32,319-Speed 10473.61 samples/sec Loss 4.3043 LearningRate 0.0013 Epoch: 35 Global Step: 201250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:23:33,252-Speed 10978.88 samples/sec Loss 4.0658 LearningRate 0.0013 Epoch: 35 Global Step: 201260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:23:34,165-Speed 11227.75 samples/sec Loss 4.2203 LearningRate 0.0013 Epoch: 35 Global Step: 201270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:23:35,114-Speed 10793.32 samples/sec Loss 4.1603 LearningRate 0.0013 Epoch: 35 Global Step: 201280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:23:36,104-Speed 
10356.97 samples/sec Loss 4.1219 LearningRate 0.0013 Epoch: 35 Global Step: 201290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:23:37,110-Speed 10179.85 samples/sec Loss 4.1944 LearningRate 0.0013 Epoch: 35 Global Step: 201300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:23:38,075-Speed 10628.18 samples/sec Loss 4.2097 LearningRate 0.0013 Epoch: 35 Global Step: 201310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:23:39,058-Speed 10416.10 samples/sec Loss 4.1904 LearningRate 0.0013 Epoch: 35 Global Step: 201320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:23:40,030-Speed 10547.68 samples/sec Loss 4.1798 LearningRate 0.0013 Epoch: 35 Global Step: 201330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:23:40,970-Speed 10893.45 samples/sec Loss 4.0195 LearningRate 0.0013 Epoch: 35 Global Step: 201340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:23:41,884-Speed 11211.74 samples/sec Loss 4.2684 LearningRate 0.0013 Epoch: 35 Global Step: 201350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:23:42,812-Speed 11042.52 samples/sec Loss 4.1517 LearningRate 0.0013 Epoch: 35 Global Step: 201360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:23:43,784-Speed 10546.04 samples/sec Loss 4.1197 LearningRate 0.0013 Epoch: 35 Global Step: 201370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:23:44,724-Speed 10892.53 samples/sec Loss 4.2456 LearningRate 0.0013 Epoch: 35 Global Step: 201380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:23:45,639-Speed 11202.87 samples/sec Loss 4.3129 LearningRate 0.0013 Epoch: 35 Global Step: 201390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:23:46,555-Speed 11183.51 samples/sec Loss 4.1805 LearningRate 0.0013 Epoch: 35 Global Step: 201400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:23:47,479-Speed 11098.92 samples/sec Loss 
4.2652 LearningRate 0.0013 Epoch: 35 Global Step: 201410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:23:48,415-Speed 10938.57 samples/sec Loss 4.1621 LearningRate 0.0013 Epoch: 35 Global Step: 201420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:23:49,372-Speed 10714.16 samples/sec Loss 4.2268 LearningRate 0.0013 Epoch: 35 Global Step: 201430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:23:50,324-Speed 10759.45 samples/sec Loss 4.1272 LearningRate 0.0013 Epoch: 35 Global Step: 201440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:23:51,244-Speed 11142.34 samples/sec Loss 4.1136 LearningRate 0.0013 Epoch: 35 Global Step: 201450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:23:52,225-Speed 10436.63 samples/sec Loss 4.2568 LearningRate 0.0013 Epoch: 35 Global Step: 201460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:23:53,184-Speed 10687.17 samples/sec Loss 4.1885 LearningRate 0.0013 Epoch: 35 Global Step: 201470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:23:54,153-Speed 10576.30 samples/sec Loss 4.1597 LearningRate 0.0013 Epoch: 35 Global Step: 201480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:23:55,131-Speed 10479.49 samples/sec Loss 4.2908 LearningRate 0.0013 Epoch: 35 Global Step: 201490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:23:56,078-Speed 10821.20 samples/sec Loss 4.1989 LearningRate 0.0013 Epoch: 35 Global Step: 201500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:23:56,998-Speed 11144.71 samples/sec Loss 4.2273 LearningRate 0.0013 Epoch: 35 Global Step: 201510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:23:57,946-Speed 10806.49 samples/sec Loss 4.1576 LearningRate 0.0013 Epoch: 35 Global Step: 201520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:23:58,851-Speed 11319.94 samples/sec Loss 4.1516 LearningRate 0.0013 
Epoch: 35 Global Step: 201530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:23:59,803-Speed 10770.48 samples/sec Loss 4.2821 LearningRate 0.0013 Epoch: 35 Global Step: 201540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:24:00,780-Speed 10490.16 samples/sec Loss 4.2128 LearningRate 0.0013 Epoch: 35 Global Step: 201550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:24:01,732-Speed 10754.31 samples/sec Loss 4.0720 LearningRate 0.0013 Epoch: 35 Global Step: 201560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:24:02,708-Speed 10505.16 samples/sec Loss 4.2821 LearningRate 0.0013 Epoch: 35 Global Step: 201570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:24:03,668-Speed 10668.56 samples/sec Loss 4.2556 LearningRate 0.0013 Epoch: 35 Global Step: 201580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:24:04,650-Speed 10434.46 samples/sec Loss 4.2039 LearningRate 0.0013 Epoch: 35 Global Step: 201590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:24:05,613-Speed 10648.42 samples/sec Loss 4.0752 LearningRate 0.0013 Epoch: 35 Global Step: 201600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:24:06,582-Speed 10569.55 samples/sec Loss 4.1135 LearningRate 0.0013 Epoch: 35 Global Step: 201610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:24:07,555-Speed 10535.06 samples/sec Loss 4.3813 LearningRate 0.0013 Epoch: 35 Global Step: 201620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:24:08,471-Speed 11181.19 samples/sec Loss 4.1779 LearningRate 0.0013 Epoch: 35 Global Step: 201630 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:24:09,405-Speed 10980.71 samples/sec Loss 4.2257 LearningRate 0.0013 Epoch: 35 Global Step: 201640 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:24:10,319-Speed 11202.98 samples/sec Loss 4.1746 LearningRate 0.0013 Epoch: 35 Global Step: 201650 
Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:24:11,225-Speed 11317.82 samples/sec Loss 4.1288 LearningRate 0.0013 Epoch: 35 Global Step: 201660 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:24:12,151-Speed 11057.72 samples/sec Loss 4.2784 LearningRate 0.0013 Epoch: 35 Global Step: 201670 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:24:13,111-Speed 10675.26 samples/sec Loss 4.1669 LearningRate 0.0013 Epoch: 35 Global Step: 201680 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:24:14,042-Speed 11013.09 samples/sec Loss 4.2415 LearningRate 0.0013 Epoch: 35 Global Step: 201690 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:24:14,981-Speed 10911.91 samples/sec Loss 4.1005 LearningRate 0.0013 Epoch: 35 Global Step: 201700 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:24:15,955-Speed 10517.76 samples/sec Loss 4.1804 LearningRate 0.0013 Epoch: 35 Global Step: 201710 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:24:16,913-Speed 10697.51 samples/sec Loss 4.1320 LearningRate 0.0013 Epoch: 35 Global Step: 201720 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:24:17,873-Speed 10674.30 samples/sec Loss 4.1482 LearningRate 0.0013 Epoch: 35 Global Step: 201730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:24:18,827-Speed 10745.54 samples/sec Loss 4.3222 LearningRate 0.0013 Epoch: 35 Global Step: 201740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:24:19,799-Speed 10534.91 samples/sec Loss 4.0747 LearningRate 0.0013 Epoch: 35 Global Step: 201750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:24:20,710-Speed 11257.30 samples/sec Loss 4.1179 LearningRate 0.0013 Epoch: 35 Global Step: 201760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:24:21,622-Speed 11232.32 samples/sec Loss 4.0941 LearningRate 0.0013 Epoch: 35 Global Step: 201770 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-04-11 06:24:22,551-Speed 11032.84 samples/sec Loss 4.1207 LearningRate 0.0013 Epoch: 35 Global Step: 201780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:24:23,518-Speed 10595.63 samples/sec Loss 4.2410 LearningRate 0.0013 Epoch: 35 Global Step: 201790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:24:24,440-Speed 11106.15 samples/sec Loss 4.2496 LearningRate 0.0013 Epoch: 35 Global Step: 201800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:24:25,364-Speed 11098.53 samples/sec Loss 4.1413 LearningRate 0.0013 Epoch: 35 Global Step: 201810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:24:26,301-Speed 10937.20 samples/sec Loss 4.1835 LearningRate 0.0013 Epoch: 35 Global Step: 201820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:24:27,245-Speed 10854.83 samples/sec Loss 3.9410 LearningRate 0.0013 Epoch: 35 Global Step: 201830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:24:28,175-Speed 11013.87 samples/sec Loss 4.1689 LearningRate 0.0013 Epoch: 35 Global Step: 201840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:24:29,137-Speed 10649.97 samples/sec Loss 4.1016 LearningRate 0.0013 Epoch: 35 Global Step: 201850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:24:30,077-Speed 10909.31 samples/sec Loss 4.3083 LearningRate 0.0013 Epoch: 35 Global Step: 201860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:24:31,066-Speed 10357.09 samples/sec Loss 4.2309 LearningRate 0.0013 Epoch: 35 Global Step: 201870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:24:32,014-Speed 10809.80 samples/sec Loss 4.0807 LearningRate 0.0013 Epoch: 35 Global Step: 201880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:24:32,937-Speed 11102.84 samples/sec Loss 4.1072 LearningRate 0.0013 Epoch: 35 Global Step: 201890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-04-11 06:24:33,856-Speed 11145.53 samples/sec Loss 4.1041 LearningRate 0.0013 Epoch: 35 Global Step: 201900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:24:34,804-Speed 10815.52 samples/sec Loss 4.1285 LearningRate 0.0013 Epoch: 35 Global Step: 201910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:24:35,775-Speed 10554.32 samples/sec Loss 4.1895 LearningRate 0.0013 Epoch: 35 Global Step: 201920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:24:36,691-Speed 11180.37 samples/sec Loss 4.2909 LearningRate 0.0013 Epoch: 35 Global Step: 201930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:24:37,640-Speed 10792.17 samples/sec Loss 4.1394 LearningRate 0.0013 Epoch: 35 Global Step: 201940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:24:38,570-Speed 11025.46 samples/sec Loss 4.1946 LearningRate 0.0013 Epoch: 35 Global Step: 201950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:24:39,486-Speed 11195.10 samples/sec Loss 4.1359 LearningRate 0.0013 Epoch: 35 Global Step: 201960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:24:40,466-Speed 10457.89 samples/sec Loss 4.1874 LearningRate 0.0013 Epoch: 35 Global Step: 201970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:24:41,426-Speed 10670.88 samples/sec Loss 4.1774 LearningRate 0.0013 Epoch: 35 Global Step: 201980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:24:42,402-Speed 10501.74 samples/sec Loss 4.1417 LearningRate 0.0013 Epoch: 35 Global Step: 201990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:24:43,346-Speed 10853.73 samples/sec Loss 4.2267 LearningRate 0.0013 Epoch: 35 Global Step: 202000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:25:04,870-[lfw][202000]XNorm: 7.679908 Training: 2022-04-11 06:25:04,871-[lfw][202000]Accuracy-Flip: 0.99733+-0.00271 Training: 2022-04-11 
06:25:04,871-[lfw][202000]Accuracy-Highest: 0.99733 Training: 2022-04-11 06:25:29,668-[cfp_fp][202000]XNorm: 6.570194 Training: 2022-04-11 06:25:29,669-[cfp_fp][202000]Accuracy-Flip: 0.95657+-0.01163 Training: 2022-04-11 06:25:29,669-[cfp_fp][202000]Accuracy-Highest: 0.95943 Training: 2022-04-11 06:25:50,928-[agedb_30][202000]XNorm: 7.516700 Training: 2022-04-11 06:25:50,928-[agedb_30][202000]Accuracy-Flip: 0.96833+-0.00726 Training: 2022-04-11 06:25:50,929-[agedb_30][202000]Accuracy-Highest: 0.97167 Training: 2022-04-11 06:25:51,884-Speed 149.41 samples/sec Loss 4.1524 LearningRate 0.0013 Epoch: 35 Global Step: 202010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:25:52,812-Speed 11039.68 samples/sec Loss 4.1726 LearningRate 0.0012 Epoch: 35 Global Step: 202020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:25:53,754-Speed 10875.15 samples/sec Loss 4.2179 LearningRate 0.0012 Epoch: 35 Global Step: 202030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:25:54,704-Speed 10782.69 samples/sec Loss 4.2172 LearningRate 0.0012 Epoch: 35 Global Step: 202040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:25:55,638-Speed 10975.33 samples/sec Loss 4.3152 LearningRate 0.0012 Epoch: 35 Global Step: 202050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:25:56,564-Speed 11064.99 samples/sec Loss 4.2090 LearningRate 0.0012 Epoch: 35 Global Step: 202060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:25:57,477-Speed 11221.36 samples/sec Loss 4.2184 LearningRate 0.0012 Epoch: 35 Global Step: 202070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:25:58,389-Speed 11234.61 samples/sec Loss 4.1826 LearningRate 0.0012 Epoch: 35 Global Step: 202080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:25:59,358-Speed 10572.19 samples/sec Loss 4.1179 LearningRate 0.0012 Epoch: 35 Global Step: 202090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-04-11 06:26:00,338-Speed 10456.92 samples/sec Loss 4.3104 LearningRate 0.0012 Epoch: 35 Global Step: 202100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:26:01,296-Speed 10698.92 samples/sec Loss 4.1646 LearningRate 0.0012 Epoch: 35 Global Step: 202110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:26:02,248-Speed 10762.18 samples/sec Loss 4.1793 LearningRate 0.0012 Epoch: 35 Global Step: 202120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:26:03,197-Speed 10801.48 samples/sec Loss 4.2185 LearningRate 0.0012 Epoch: 35 Global Step: 202130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:26:04,140-Speed 10856.02 samples/sec Loss 4.2237 LearningRate 0.0012 Epoch: 35 Global Step: 202140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:26:05,061-Speed 11147.68 samples/sec Loss 4.1883 LearningRate 0.0012 Epoch: 35 Global Step: 202150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:26:05,985-Speed 11084.39 samples/sec Loss 4.2315 LearningRate 0.0012 Epoch: 35 Global Step: 202160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:26:06,947-Speed 10651.98 samples/sec Loss 4.2632 LearningRate 0.0012 Epoch: 35 Global Step: 202170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:26:07,938-Speed 10342.46 samples/sec Loss 4.2780 LearningRate 0.0012 Epoch: 35 Global Step: 202180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:26:08,871-Speed 10977.66 samples/sec Loss 4.1303 LearningRate 0.0012 Epoch: 35 Global Step: 202190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:26:09,805-Speed 10974.59 samples/sec Loss 4.2801 LearningRate 0.0012 Epoch: 35 Global Step: 202200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:26:10,729-Speed 11083.15 samples/sec Loss 4.1182 LearningRate 0.0012 Epoch: 35 Global Step: 202210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:26:11,706-Speed 
10494.13 samples/sec Loss 4.1444 LearningRate 0.0012 Epoch: 35 Global Step: 202220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:26:12,702-Speed 10281.46 samples/sec Loss 4.1805 LearningRate 0.0012 Epoch: 35 Global Step: 202230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:26:13,646-Speed 10861.76 samples/sec Loss 4.1109 LearningRate 0.0012 Epoch: 35 Global Step: 202240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:26:14,574-Speed 11032.65 samples/sec Loss 4.1642 LearningRate 0.0012 Epoch: 35 Global Step: 202250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:26:15,530-Speed 10729.40 samples/sec Loss 4.2351 LearningRate 0.0012 Epoch: 35 Global Step: 202260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:26:16,480-Speed 10778.18 samples/sec Loss 4.2327 LearningRate 0.0012 Epoch: 35 Global Step: 202270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:26:17,440-Speed 10679.51 samples/sec Loss 4.1874 LearningRate 0.0012 Epoch: 35 Global Step: 202280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:26:18,424-Speed 10415.29 samples/sec Loss 4.2692 LearningRate 0.0012 Epoch: 35 Global Step: 202290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:26:19,352-Speed 11036.89 samples/sec Loss 4.1715 LearningRate 0.0012 Epoch: 35 Global Step: 202300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:26:20,303-Speed 10776.73 samples/sec Loss 4.2003 LearningRate 0.0012 Epoch: 35 Global Step: 202310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:26:21,227-Speed 11096.64 samples/sec Loss 4.2170 LearningRate 0.0012 Epoch: 35 Global Step: 202320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:26:22,153-Speed 11059.77 samples/sec Loss 4.1511 LearningRate 0.0012 Epoch: 35 Global Step: 202330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:26:23,066-Speed 11223.70 samples/sec Loss 
4.2466 LearningRate 0.0012 Epoch: 35 Global Step: 202340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:26:24,017-Speed 10772.67 samples/sec Loss 4.1828 LearningRate 0.0012 Epoch: 35 Global Step: 202350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:26:25,010-Speed 10320.64 samples/sec Loss 4.1963 LearningRate 0.0012 Epoch: 35 Global Step: 202360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:26:26,011-Speed 10237.70 samples/sec Loss 4.2082 LearningRate 0.0012 Epoch: 35 Global Step: 202370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:26:26,984-Speed 10542.21 samples/sec Loss 4.1895 LearningRate 0.0012 Epoch: 35 Global Step: 202380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:26:27,904-Speed 11134.80 samples/sec Loss 4.1768 LearningRate 0.0012 Epoch: 35 Global Step: 202390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:26:28,845-Speed 10888.32 samples/sec Loss 4.1959 LearningRate 0.0012 Epoch: 35 Global Step: 202400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:26:29,815-Speed 10562.71 samples/sec Loss 4.1941 LearningRate 0.0012 Epoch: 35 Global Step: 202410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:26:30,748-Speed 10983.85 samples/sec Loss 4.1628 LearningRate 0.0012 Epoch: 35 Global Step: 202420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:26:31,704-Speed 10718.02 samples/sec Loss 4.2772 LearningRate 0.0012 Epoch: 35 Global Step: 202430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:26:32,689-Speed 10401.93 samples/sec Loss 4.2093 LearningRate 0.0012 Epoch: 35 Global Step: 202440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:26:33,589-Speed 11382.40 samples/sec Loss 4.1443 LearningRate 0.0012 Epoch: 35 Global Step: 202450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:26:34,525-Speed 10947.77 samples/sec Loss 4.2155 LearningRate 0.0012 
Epoch: 35 Global Step: 202460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:26:35,446-Speed 11126.37 samples/sec Loss 4.2072 LearningRate 0.0012 Epoch: 35 Global Step: 202470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:26:36,393-Speed 10829.97 samples/sec Loss 4.2518 LearningRate 0.0012 Epoch: 35 Global Step: 202480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:26:37,333-Speed 10901.55 samples/sec Loss 4.2776 LearningRate 0.0012 Epoch: 35 Global Step: 202490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:26:38,283-Speed 10780.23 samples/sec Loss 4.1027 LearningRate 0.0012 Epoch: 35 Global Step: 202500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:26:39,235-Speed 10769.27 samples/sec Loss 4.1769 LearningRate 0.0012 Epoch: 35 Global Step: 202510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:26:40,194-Speed 10681.13 samples/sec Loss 4.0904 LearningRate 0.0012 Epoch: 35 Global Step: 202520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:26:41,184-Speed 10353.53 samples/sec Loss 4.1570 LearningRate 0.0012 Epoch: 35 Global Step: 202530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:26:42,215-Speed 9937.52 samples/sec Loss 4.3000 LearningRate 0.0012 Epoch: 35 Global Step: 202540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:26:43,229-Speed 10106.18 samples/sec Loss 4.1562 LearningRate 0.0012 Epoch: 35 Global Step: 202550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:26:44,195-Speed 10609.06 samples/sec Loss 4.2388 LearningRate 0.0012 Epoch: 35 Global Step: 202560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:26:45,118-Speed 11106.08 samples/sec Loss 4.1896 LearningRate 0.0012 Epoch: 35 Global Step: 202570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:26:46,025-Speed 11300.09 samples/sec Loss 4.1498 LearningRate 0.0012 Epoch: 35 Global Step: 202580 
Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:26:46,941-Speed 11186.48 samples/sec Loss 4.1195 LearningRate 0.0012 Epoch: 35 Global Step: 202590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:26:47,870-Speed 11020.47 samples/sec Loss 4.1571 LearningRate 0.0012 Epoch: 35 Global Step: 202600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:26:48,846-Speed 10506.25 samples/sec Loss 4.1692 LearningRate 0.0012 Epoch: 35 Global Step: 202610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:26:49,796-Speed 10779.62 samples/sec Loss 4.2050 LearningRate 0.0012 Epoch: 35 Global Step: 202620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:26:50,803-Speed 10176.13 samples/sec Loss 4.1781 LearningRate 0.0012 Epoch: 35 Global Step: 202630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:26:51,807-Speed 10210.26 samples/sec Loss 4.1965 LearningRate 0.0012 Epoch: 35 Global Step: 202640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:26:52,817-Speed 10139.55 samples/sec Loss 4.2257 LearningRate 0.0012 Epoch: 35 Global Step: 202650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:26:53,787-Speed 10571.99 samples/sec Loss 4.1912 LearningRate 0.0012 Epoch: 35 Global Step: 202660 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:26:54,737-Speed 10786.26 samples/sec Loss 4.1662 LearningRate 0.0012 Epoch: 35 Global Step: 202670 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:26:55,719-Speed 10437.17 samples/sec Loss 4.0894 LearningRate 0.0012 Epoch: 35 Global Step: 202680 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:26:56,639-Speed 11135.22 samples/sec Loss 4.1906 LearningRate 0.0012 Epoch: 35 Global Step: 202690 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:26:57,584-Speed 10841.67 samples/sec Loss 4.2543 LearningRate 0.0012 Epoch: 35 Global Step: 202700 Fp16 Grad Scale: 16384 
Required: 1 hours Training: 2022-04-11 06:26:58,558-Speed 10517.99 samples/sec Loss 4.1861 LearningRate 0.0012 Epoch: 35 Global Step: 202710 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:26:59,565-Speed 10179.74 samples/sec Loss 4.2374 LearningRate 0.0012 Epoch: 35 Global Step: 202720 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:27:00,511-Speed 10830.12 samples/sec Loss 4.0601 LearningRate 0.0012 Epoch: 35 Global Step: 202730 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:27:01,508-Speed 10281.02 samples/sec Loss 4.1336 LearningRate 0.0012 Epoch: 35 Global Step: 202740 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:27:02,483-Speed 10505.71 samples/sec Loss 4.1781 LearningRate 0.0012 Epoch: 35 Global Step: 202750 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:27:03,452-Speed 10581.29 samples/sec Loss 4.1921 LearningRate 0.0012 Epoch: 35 Global Step: 202760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:04,395-Speed 10857.82 samples/sec Loss 4.2458 LearningRate 0.0012 Epoch: 35 Global Step: 202770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:05,364-Speed 10583.60 samples/sec Loss 4.2311 LearningRate 0.0012 Epoch: 35 Global Step: 202780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:06,310-Speed 10834.45 samples/sec Loss 4.1002 LearningRate 0.0012 Epoch: 35 Global Step: 202790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:07,256-Speed 10832.24 samples/sec Loss 4.2166 LearningRate 0.0012 Epoch: 35 Global Step: 202800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:08,284-Speed 9959.18 samples/sec Loss 4.2744 LearningRate 0.0012 Epoch: 35 Global Step: 202810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:09,241-Speed 10712.27 samples/sec Loss 4.2027 LearningRate 0.0012 Epoch: 35 Global Step: 202820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-04-11 06:27:10,153-Speed 11230.04 samples/sec Loss 4.1795 LearningRate 0.0012 Epoch: 35 Global Step: 202830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:11,077-Speed 11101.74 samples/sec Loss 4.2093 LearningRate 0.0012 Epoch: 35 Global Step: 202840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:12,031-Speed 10733.06 samples/sec Loss 4.1879 LearningRate 0.0012 Epoch: 35 Global Step: 202850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:12,968-Speed 10940.59 samples/sec Loss 4.2476 LearningRate 0.0012 Epoch: 35 Global Step: 202860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:27:13,957-Speed 10363.38 samples/sec Loss 4.1617 LearningRate 0.0012 Epoch: 35 Global Step: 202870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:27:14,989-Speed 9927.84 samples/sec Loss 4.1887 LearningRate 0.0012 Epoch: 35 Global Step: 202880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:27:15,954-Speed 10613.83 samples/sec Loss 4.1317 LearningRate 0.0012 Epoch: 35 Global Step: 202890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:16,893-Speed 10913.40 samples/sec Loss 4.0990 LearningRate 0.0012 Epoch: 35 Global Step: 202900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:17,827-Speed 10974.92 samples/sec Loss 4.1469 LearningRate 0.0012 Epoch: 35 Global Step: 202910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:18,762-Speed 10958.70 samples/sec Loss 4.2036 LearningRate 0.0012 Epoch: 35 Global Step: 202920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:19,708-Speed 10823.23 samples/sec Loss 4.2597 LearningRate 0.0012 Epoch: 35 Global Step: 202930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:20,685-Speed 10488.95 samples/sec Loss 4.2225 LearningRate 0.0012 Epoch: 35 Global Step: 202940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:21,620-Speed 
10967.18 samples/sec Loss 4.1485 LearningRate 0.0012 Epoch: 35 Global Step: 202950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:22,530-Speed 11258.95 samples/sec Loss 4.0598 LearningRate 0.0012 Epoch: 35 Global Step: 202960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:23,443-Speed 11230.89 samples/sec Loss 4.1618 LearningRate 0.0012 Epoch: 35 Global Step: 202970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:24,405-Speed 10644.20 samples/sec Loss 4.1672 LearningRate 0.0012 Epoch: 35 Global Step: 202980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:25,366-Speed 10664.51 samples/sec Loss 4.1263 LearningRate 0.0012 Epoch: 35 Global Step: 202990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:27:26,311-Speed 10848.84 samples/sec Loss 4.1767 LearningRate 0.0012 Epoch: 35 Global Step: 203000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:27,237-Speed 11064.49 samples/sec Loss 4.2269 LearningRate 0.0012 Epoch: 35 Global Step: 203010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:28,208-Speed 10546.57 samples/sec Loss 4.0259 LearningRate 0.0012 Epoch: 35 Global Step: 203020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:29,173-Speed 10619.69 samples/sec Loss 4.2101 LearningRate 0.0012 Epoch: 35 Global Step: 203030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:30,157-Speed 10416.73 samples/sec Loss 4.1452 LearningRate 0.0012 Epoch: 35 Global Step: 203040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:31,116-Speed 10687.36 samples/sec Loss 4.2207 LearningRate 0.0011 Epoch: 35 Global Step: 203050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:32,068-Speed 10765.24 samples/sec Loss 4.1633 LearningRate 0.0011 Epoch: 35 Global Step: 203060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:32,987-Speed 11146.81 samples/sec Loss 
4.0993 LearningRate 0.0011 Epoch: 35 Global Step: 203070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:33,924-Speed 10934.72 samples/sec Loss 4.2841 LearningRate 0.0011 Epoch: 35 Global Step: 203080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:34,856-Speed 10996.35 samples/sec Loss 4.3027 LearningRate 0.0011 Epoch: 35 Global Step: 203090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:35,771-Speed 11214.46 samples/sec Loss 4.1676 LearningRate 0.0011 Epoch: 35 Global Step: 203100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:27:36,681-Speed 11258.89 samples/sec Loss 4.2785 LearningRate 0.0011 Epoch: 35 Global Step: 203110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:27:37,629-Speed 10804.10 samples/sec Loss 4.1684 LearningRate 0.0011 Epoch: 35 Global Step: 203120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:27:38,565-Speed 10952.10 samples/sec Loss 4.1503 LearningRate 0.0011 Epoch: 35 Global Step: 203130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:27:39,483-Speed 11158.83 samples/sec Loss 4.2488 LearningRate 0.0011 Epoch: 35 Global Step: 203140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:27:40,456-Speed 10534.94 samples/sec Loss 4.1426 LearningRate 0.0011 Epoch: 35 Global Step: 203150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:41,405-Speed 10799.55 samples/sec Loss 4.1286 LearningRate 0.0011 Epoch: 35 Global Step: 203160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:42,383-Speed 10466.45 samples/sec Loss 4.1869 LearningRate 0.0011 Epoch: 35 Global Step: 203170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:43,356-Speed 10536.26 samples/sec Loss 4.2156 LearningRate 0.0011 Epoch: 35 Global Step: 203180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:44,314-Speed 10695.67 samples/sec Loss 4.2241 LearningRate 0.0011 
Epoch: 35 Global Step: 203190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:45,223-Speed 11276.34 samples/sec Loss 4.2140 LearningRate 0.0011 Epoch: 35 Global Step: 203200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:46,131-Speed 11292.46 samples/sec Loss 4.3386 LearningRate 0.0011 Epoch: 35 Global Step: 203210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:47,045-Speed 11203.20 samples/sec Loss 4.1117 LearningRate 0.0011 Epoch: 35 Global Step: 203220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:47,995-Speed 10793.74 samples/sec Loss 4.2070 LearningRate 0.0011 Epoch: 35 Global Step: 203230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:48,965-Speed 10561.41 samples/sec Loss 4.3511 LearningRate 0.0011 Epoch: 35 Global Step: 203240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:49,919-Speed 10742.18 samples/sec Loss 4.2402 LearningRate 0.0011 Epoch: 35 Global Step: 203250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:50,911-Speed 10330.35 samples/sec Loss 4.0944 LearningRate 0.0011 Epoch: 35 Global Step: 203260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:51,892-Speed 10442.67 samples/sec Loss 4.1585 LearningRate 0.0011 Epoch: 35 Global Step: 203270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:52,832-Speed 10904.21 samples/sec Loss 4.1556 LearningRate 0.0011 Epoch: 35 Global Step: 203280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:53,825-Speed 10318.07 samples/sec Loss 4.3371 LearningRate 0.0011 Epoch: 35 Global Step: 203290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:54,829-Speed 10201.52 samples/sec Loss 4.2074 LearningRate 0.0011 Epoch: 35 Global Step: 203300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:55,793-Speed 10637.19 samples/sec Loss 4.3517 LearningRate 0.0011 Epoch: 35 Global Step: 203310 
Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:56,720-Speed 11049.47 samples/sec Loss 4.3094 LearningRate 0.0011 Epoch: 35 Global Step: 203320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:57,655-Speed 10959.12 samples/sec Loss 4.2890 LearningRate 0.0011 Epoch: 35 Global Step: 203330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:58,577-Speed 11117.89 samples/sec Loss 4.1393 LearningRate 0.0011 Epoch: 35 Global Step: 203340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:27:59,549-Speed 10541.15 samples/sec Loss 4.2144 LearningRate 0.0011 Epoch: 35 Global Step: 203350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:28:00,471-Speed 11146.12 samples/sec Loss 4.1813 LearningRate 0.0011 Epoch: 35 Global Step: 203360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:28:01,424-Speed 10748.92 samples/sec Loss 4.2542 LearningRate 0.0011 Epoch: 35 Global Step: 203370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:28:02,390-Speed 10603.39 samples/sec Loss 4.1314 LearningRate 0.0011 Epoch: 35 Global Step: 203380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:28:03,325-Speed 10964.23 samples/sec Loss 4.2237 LearningRate 0.0011 Epoch: 35 Global Step: 203390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:28:04,268-Speed 10863.37 samples/sec Loss 4.2163 LearningRate 0.0011 Epoch: 35 Global Step: 203400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:28:05,218-Speed 10793.73 samples/sec Loss 4.3056 LearningRate 0.0011 Epoch: 35 Global Step: 203410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:28:06,132-Speed 11210.21 samples/sec Loss 4.1489 LearningRate 0.0011 Epoch: 35 Global Step: 203420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:28:07,114-Speed 10431.65 samples/sec Loss 4.1673 LearningRate 0.0011 Epoch: 35 Global Step: 203430 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-04-11 06:28:08,030-Speed 11194.72 samples/sec Loss 4.3296 LearningRate 0.0011 Epoch: 35 Global Step: 203440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:28:08,955-Speed 11080.29 samples/sec Loss 4.2347 LearningRate 0.0011 Epoch: 35 Global Step: 203450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:28:09,871-Speed 11181.03 samples/sec Loss 4.2070 LearningRate 0.0011 Epoch: 35 Global Step: 203460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:28:10,794-Speed 11106.54 samples/sec Loss 4.3033 LearningRate 0.0011 Epoch: 35 Global Step: 203470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:28:11,701-Speed 11290.54 samples/sec Loss 4.1702 LearningRate 0.0011 Epoch: 35 Global Step: 203480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:28:12,645-Speed 10859.29 samples/sec Loss 4.2111 LearningRate 0.0011 Epoch: 35 Global Step: 203490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:28:13,620-Speed 10508.74 samples/sec Loss 4.2281 LearningRate 0.0011 Epoch: 35 Global Step: 203500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:28:14,553-Speed 10978.92 samples/sec Loss 4.2452 LearningRate 0.0011 Epoch: 35 Global Step: 203510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:28:15,553-Speed 10253.46 samples/sec Loss 4.1892 LearningRate 0.0011 Epoch: 35 Global Step: 203520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:28:16,484-Speed 11000.31 samples/sec Loss 4.2832 LearningRate 0.0011 Epoch: 35 Global Step: 203530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:28:17,394-Speed 11268.52 samples/sec Loss 4.1909 LearningRate 0.0011 Epoch: 35 Global Step: 203540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:28:18,343-Speed 10788.90 samples/sec Loss 4.1339 LearningRate 0.0011 Epoch: 35 Global Step: 203550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-04-11 06:28:19,287-Speed 10861.79 samples/sec Loss 4.2870 LearningRate 0.0011 Epoch: 35 Global Step: 203560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:28:20,246-Speed 10688.43 samples/sec Loss 4.3203 LearningRate 0.0011 Epoch: 35 Global Step: 203570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:28:21,179-Speed 10990.59 samples/sec Loss 4.1871 LearningRate 0.0011 Epoch: 35 Global Step: 203580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:28:22,099-Speed 11135.53 samples/sec Loss 4.2301 LearningRate 0.0011 Epoch: 35 Global Step: 203590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:28:23,037-Speed 10921.00 samples/sec Loss 4.2576 LearningRate 0.0011 Epoch: 35 Global Step: 203600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:28:23,993-Speed 10718.99 samples/sec Loss 4.1396 LearningRate 0.0011 Epoch: 35 Global Step: 203610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:28:24,944-Speed 10778.61 samples/sec Loss 4.2266 LearningRate 0.0011 Epoch: 35 Global Step: 203620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:28:25,922-Speed 10480.58 samples/sec Loss 4.2605 LearningRate 0.0011 Epoch: 35 Global Step: 203630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:28:26,886-Speed 10627.53 samples/sec Loss 4.3643 LearningRate 0.0011 Epoch: 35 Global Step: 203640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:28:27,859-Speed 10536.19 samples/sec Loss 4.1893 LearningRate 0.0011 Epoch: 35 Global Step: 203650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:28:28,826-Speed 10596.99 samples/sec Loss 4.0860 LearningRate 0.0011 Epoch: 35 Global Step: 203660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:28:29,804-Speed 10475.02 samples/sec Loss 4.3797 LearningRate 0.0011 Epoch: 35 Global Step: 203670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:28:30,751-Speed 
10826.51 samples/sec Loss 4.2003 LearningRate 0.0011 Epoch: 35 Global Step: 203680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:28:31,711-Speed 10672.74 samples/sec Loss 4.2807 LearningRate 0.0011 Epoch: 35 Global Step: 203690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:28:32,647-Speed 10946.28 samples/sec Loss 4.3205 LearningRate 0.0011 Epoch: 35 Global Step: 203700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:28:33,574-Speed 11049.30 samples/sec Loss 4.2765 LearningRate 0.0011 Epoch: 35 Global Step: 203710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:28:34,494-Speed 11142.69 samples/sec Loss 4.0628 LearningRate 0.0011 Epoch: 35 Global Step: 203720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:28:35,426-Speed 10995.84 samples/sec Loss 4.2016 LearningRate 0.0011 Epoch: 35 Global Step: 203730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:28:36,333-Speed 11291.20 samples/sec Loss 4.2738 LearningRate 0.0011 Epoch: 35 Global Step: 203740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:28:37,325-Speed 10331.08 samples/sec Loss 4.1965 LearningRate 0.0011 Epoch: 35 Global Step: 203750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:28:38,346-Speed 10037.22 samples/sec Loss 4.1252 LearningRate 0.0011 Epoch: 35 Global Step: 203760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:28:39,295-Speed 10802.29 samples/sec Loss 4.1325 LearningRate 0.0011 Epoch: 35 Global Step: 203770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:28:40,283-Speed 10370.40 samples/sec Loss 4.2897 LearningRate 0.0011 Epoch: 35 Global Step: 203780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:28:41,241-Speed 10694.58 samples/sec Loss 4.2701 LearningRate 0.0011 Epoch: 35 Global Step: 203790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:28:42,203-Speed 10650.12 samples/sec Loss 
4.1791 LearningRate 0.0011 Epoch: 35 Global Step: 203800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:28:43,188-Speed 10400.95 samples/sec Loss 4.2388 LearningRate 0.0011 Epoch: 35 Global Step: 203810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:28:44,178-Speed 10348.01 samples/sec Loss 4.1523 LearningRate 0.0011 Epoch: 35 Global Step: 203820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:28:45,111-Speed 10980.24 samples/sec Loss 4.1813 LearningRate 0.0011 Epoch: 35 Global Step: 203830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:28:46,003-Speed 11500.08 samples/sec Loss 4.2498 LearningRate 0.0011 Epoch: 35 Global Step: 203840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:28:46,935-Speed 10988.74 samples/sec Loss 4.2352 LearningRate 0.0011 Epoch: 35 Global Step: 203850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:28:47,863-Speed 11037.56 samples/sec Loss 4.1926 LearningRate 0.0011 Epoch: 35 Global Step: 203860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:28:48,825-Speed 10657.06 samples/sec Loss 4.2314 LearningRate 0.0011 Epoch: 35 Global Step: 203870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:28:49,785-Speed 10677.74 samples/sec Loss 4.1975 LearningRate 0.0011 Epoch: 35 Global Step: 203880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:28:50,724-Speed 10918.62 samples/sec Loss 4.2165 LearningRate 0.0011 Epoch: 35 Global Step: 203890 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:28:51,669-Speed 10841.29 samples/sec Loss 4.1095 LearningRate 0.0011 Epoch: 35 Global Step: 203900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:28:52,643-Speed 10523.12 samples/sec Loss 4.2868 LearningRate 0.0011 Epoch: 35 Global Step: 203910 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:28:53,632-Speed 10359.41 samples/sec Loss 4.2092 LearningRate 0.0011 
Epoch: 35 Global Step: 203920 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:28:54,567-Speed 10958.54 samples/sec Loss 4.2495 LearningRate 0.0011 Epoch: 35 Global Step: 203930 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:28:55,527-Speed 10675.25 samples/sec Loss 4.1152 LearningRate 0.0011 Epoch: 35 Global Step: 203940 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:28:56,460-Speed 10977.77 samples/sec Loss 4.2181 LearningRate 0.0011 Epoch: 35 Global Step: 203950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:28:57,409-Speed 10802.24 samples/sec Loss 4.1818 LearningRate 0.0011 Epoch: 35 Global Step: 203960 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:28:58,343-Speed 10963.60 samples/sec Loss 4.2090 LearningRate 0.0011 Epoch: 35 Global Step: 203970 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:28:59,300-Speed 10707.34 samples/sec Loss 4.2877 LearningRate 0.0011 Epoch: 35 Global Step: 203980 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:29:00,295-Speed 10306.92 samples/sec Loss 4.2207 LearningRate 0.0011 Epoch: 35 Global Step: 203990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:29:01,259-Speed 10620.48 samples/sec Loss 4.1910 LearningRate 0.0011 Epoch: 35 Global Step: 204000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:29:22,526-[lfw][204000]XNorm: 7.676033 Training: 2022-04-11 06:29:22,527-[lfw][204000]Accuracy-Flip: 0.99717+-0.00248 Training: 2022-04-11 06:29:22,527-[lfw][204000]Accuracy-Highest: 0.99733 Training: 2022-04-11 06:29:47,212-[cfp_fp][204000]XNorm: 6.579453 Training: 2022-04-11 06:29:47,212-[cfp_fp][204000]Accuracy-Flip: 0.95700+-0.01055 Training: 2022-04-11 06:29:47,213-[cfp_fp][204000]Accuracy-Highest: 0.95943 Training: 2022-04-11 06:30:08,473-[agedb_30][204000]XNorm: 7.527732 Training: 2022-04-11 06:30:08,473-[agedb_30][204000]Accuracy-Flip: 0.97033+-0.00686 Training: 2022-04-11 
06:30:08,474-[agedb_30][204000]Accuracy-Highest: 0.97167 Training: 2022-04-11 06:30:09,397-Speed 150.28 samples/sec Loss 4.0876 LearningRate 0.0011 Epoch: 35 Global Step: 204010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:30:10,332-Speed 10961.21 samples/sec Loss 4.2560 LearningRate 0.0011 Epoch: 35 Global Step: 204020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:30:11,277-Speed 10845.11 samples/sec Loss 4.2112 LearningRate 0.0011 Epoch: 35 Global Step: 204030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:30:12,219-Speed 10877.73 samples/sec Loss 4.1691 LearningRate 0.0011 Epoch: 35 Global Step: 204040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:30:13,152-Speed 10986.37 samples/sec Loss 4.3029 LearningRate 0.0011 Epoch: 35 Global Step: 204050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:30:14,097-Speed 10841.88 samples/sec Loss 4.1795 LearningRate 0.0011 Epoch: 35 Global Step: 204060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:30:15,052-Speed 10719.13 samples/sec Loss 4.3101 LearningRate 0.0011 Epoch: 35 Global Step: 204070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:30:16,020-Speed 10586.24 samples/sec Loss 4.3388 LearningRate 0.0011 Epoch: 35 Global Step: 204080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:30:16,943-Speed 11105.72 samples/sec Loss 4.2397 LearningRate 0.0011 Epoch: 35 Global Step: 204090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:30:17,919-Speed 10503.90 samples/sec Loss 4.2800 LearningRate 0.0011 Epoch: 35 Global Step: 204100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:30:18,860-Speed 10884.47 samples/sec Loss 4.1714 LearningRate 0.0011 Epoch: 35 Global Step: 204110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:30:19,869-Speed 10158.91 samples/sec Loss 4.3221 LearningRate 0.0011 Epoch: 35 Global Step: 204120 Fp16 Grad Scale: 
32768 Required: 1 hours Training: 2022-04-11 06:30:20,809-Speed 10899.86 samples/sec Loss 4.3181 LearningRate 0.0011 Epoch: 35 Global Step: 204130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:30:21,733-Speed 11084.76 samples/sec Loss 4.2918 LearningRate 0.0010 Epoch: 35 Global Step: 204140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:30:22,636-Speed 11345.88 samples/sec Loss 4.1511 LearningRate 0.0010 Epoch: 35 Global Step: 204150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:30:23,566-Speed 11022.24 samples/sec Loss 4.2272 LearningRate 0.0010 Epoch: 35 Global Step: 204160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:30:24,501-Speed 10964.88 samples/sec Loss 4.2256 LearningRate 0.0010 Epoch: 35 Global Step: 204170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:30:25,436-Speed 10958.43 samples/sec Loss 4.2538 LearningRate 0.0010 Epoch: 35 Global Step: 204180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:30:26,364-Speed 11036.73 samples/sec Loss 4.2037 LearningRate 0.0010 Epoch: 35 Global Step: 204190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:30:27,302-Speed 10929.06 samples/sec Loss 4.1783 LearningRate 0.0010 Epoch: 35 Global Step: 204200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:30:28,265-Speed 10639.65 samples/sec Loss 4.4083 LearningRate 0.0010 Epoch: 35 Global Step: 204210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:30:29,219-Speed 10735.10 samples/sec Loss 4.2351 LearningRate 0.0010 Epoch: 35 Global Step: 204220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:30:30,242-Speed 10022.37 samples/sec Loss 4.2994 LearningRate 0.0010 Epoch: 35 Global Step: 204230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:30:31,193-Speed 10774.26 samples/sec Loss 4.2597 LearningRate 0.0010 Epoch: 35 Global Step: 204240 Fp16 Grad Scale: 32768 Required: 1 hours 
Training: 2022-04-11 06:30:32,127-Speed 10960.40 samples/sec Loss 4.1454 LearningRate 0.0010 Epoch: 35 Global Step: 204250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:30:33,067-Speed 10917.30 samples/sec Loss 4.2092 LearningRate 0.0010 Epoch: 35 Global Step: 204260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:30:33,988-Speed 11125.64 samples/sec Loss 4.1955 LearningRate 0.0010 Epoch: 35 Global Step: 204270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:30:34,890-Speed 11360.06 samples/sec Loss 4.3034 LearningRate 0.0010 Epoch: 35 Global Step: 204280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:30:35,872-Speed 10432.87 samples/sec Loss 4.2110 LearningRate 0.0010 Epoch: 35 Global Step: 204290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:30:36,826-Speed 10734.12 samples/sec Loss 4.3440 LearningRate 0.0010 Epoch: 35 Global Step: 204300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:30:37,810-Speed 10420.93 samples/sec Loss 4.3562 LearningRate 0.0010 Epoch: 35 Global Step: 204310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:30:38,781-Speed 10552.17 samples/sec Loss 4.0715 LearningRate 0.0010 Epoch: 35 Global Step: 204320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:30:39,772-Speed 10334.82 samples/sec Loss 4.1154 LearningRate 0.0010 Epoch: 35 Global Step: 204330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:30:40,750-Speed 10484.61 samples/sec Loss 4.2369 LearningRate 0.0010 Epoch: 35 Global Step: 204340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:30:41,691-Speed 10879.45 samples/sec Loss 4.1930 LearningRate 0.0010 Epoch: 35 Global Step: 204350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:30:42,647-Speed 10724.96 samples/sec Loss 4.2014 LearningRate 0.0010 Epoch: 35 Global Step: 204360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 
06:30:43,638-Speed 10342.74 samples/sec Loss 4.1734 LearningRate 0.0010 Epoch: 35 Global Step: 204370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:30:44,578-Speed 10897.29 samples/sec Loss 4.2355 LearningRate 0.0010 Epoch: 35 Global Step: 204380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:30:45,516-Speed 10917.99 samples/sec Loss 4.1465 LearningRate 0.0010 Epoch: 35 Global Step: 204390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:30:46,451-Speed 10967.01 samples/sec Loss 4.1773 LearningRate 0.0010 Epoch: 35 Global Step: 204400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:30:47,418-Speed 10596.50 samples/sec Loss 4.1744 LearningRate 0.0010 Epoch: 35 Global Step: 204410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:30:48,360-Speed 10876.76 samples/sec Loss 4.3351 LearningRate 0.0010 Epoch: 35 Global Step: 204420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:30:49,316-Speed 10714.47 samples/sec Loss 4.2031 LearningRate 0.0010 Epoch: 35 Global Step: 204430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:30:50,279-Speed 10642.31 samples/sec Loss 4.1569 LearningRate 0.0010 Epoch: 35 Global Step: 204440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:30:51,238-Speed 10689.65 samples/sec Loss 4.1880 LearningRate 0.0010 Epoch: 35 Global Step: 204450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:30:52,207-Speed 10575.13 samples/sec Loss 4.1386 LearningRate 0.0010 Epoch: 35 Global Step: 204460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:30:53,157-Speed 10782.80 samples/sec Loss 4.1359 LearningRate 0.0010 Epoch: 35 Global Step: 204470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:30:54,117-Speed 10676.77 samples/sec Loss 4.2280 LearningRate 0.0010 Epoch: 35 Global Step: 204480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:30:55,050-Speed 10987.70 
samples/sec Loss 4.1593 LearningRate 0.0010 Epoch: 35 Global Step: 204490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:30:56,026-Speed 10496.31 samples/sec Loss 4.2022 LearningRate 0.0010 Epoch: 35 Global Step: 204500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:30:56,987-Speed 10654.18 samples/sec Loss 4.2286 LearningRate 0.0010 Epoch: 35 Global Step: 204510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:30:57,935-Speed 10821.12 samples/sec Loss 4.2476 LearningRate 0.0010 Epoch: 35 Global Step: 204520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:30:58,914-Speed 10464.03 samples/sec Loss 4.1857 LearningRate 0.0010 Epoch: 35 Global Step: 204530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:30:59,884-Speed 10558.64 samples/sec Loss 4.2013 LearningRate 0.0010 Epoch: 35 Global Step: 204540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:31:00,842-Speed 10697.85 samples/sec Loss 4.2918 LearningRate 0.0010 Epoch: 35 Global Step: 204550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:31:01,793-Speed 10779.87 samples/sec Loss 4.2154 LearningRate 0.0010 Epoch: 35 Global Step: 204560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:31:02,734-Speed 10891.85 samples/sec Loss 4.2918 LearningRate 0.0010 Epoch: 35 Global Step: 204570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:31:03,700-Speed 10608.38 samples/sec Loss 4.2073 LearningRate 0.0010 Epoch: 35 Global Step: 204580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:31:04,687-Speed 10380.43 samples/sec Loss 4.1886 LearningRate 0.0010 Epoch: 35 Global Step: 204590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:31:05,627-Speed 10892.12 samples/sec Loss 4.1307 LearningRate 0.0010 Epoch: 35 Global Step: 204600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:31:06,577-Speed 10791.44 samples/sec Loss 4.2011 
LearningRate 0.0010 Epoch: 35 Global Step: 204610 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:31:07,525-Speed 10813.96 samples/sec Loss 4.3452 LearningRate 0.0010 Epoch: 35 Global Step: 204620 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:31:08,533-Speed 10158.16 samples/sec Loss 4.2549 LearningRate 0.0010 Epoch: 35 Global Step: 204630 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:31:09,476-Speed 10872.43 samples/sec Loss 4.2362 LearningRate 0.0010 Epoch: 35 Global Step: 204640 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:31:10,400-Speed 11079.25 samples/sec Loss 4.2784 LearningRate 0.0010 Epoch: 35 Global Step: 204650 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:31:11,381-Speed 10448.92 samples/sec Loss 4.3149 LearningRate 0.0010 Epoch: 35 Global Step: 204660 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:31:12,338-Speed 10708.27 samples/sec Loss 4.3118 LearningRate 0.0010 Epoch: 35 Global Step: 204670 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:31:13,295-Speed 10708.64 samples/sec Loss 4.3470 LearningRate 0.0010 Epoch: 35 Global Step: 204680 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:31:14,306-Speed 10140.52 samples/sec Loss 4.3814 LearningRate 0.0010 Epoch: 35 Global Step: 204690 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:31:25,698-Speed 898.97 samples/sec Loss 4.0461 LearningRate 0.0010 Epoch: 36 Global Step: 204700 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:31:26,679-Speed 10447.55 samples/sec Loss 3.9459 LearningRate 0.0010 Epoch: 36 Global Step: 204710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:31:27,689-Speed 10145.24 samples/sec Loss 4.1374 LearningRate 0.0010 Epoch: 36 Global Step: 204720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:31:28,671-Speed 10427.34 samples/sec Loss 4.0963 LearningRate 0.0010 Epoch: 36 
Global Step: 204730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:31:29,691-Speed 10054.01 samples/sec Loss 4.0318 LearningRate 0.0010 Epoch: 36 Global Step: 204740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:31:30,924-Speed 8305.28 samples/sec Loss 3.9681 LearningRate 0.0010 Epoch: 36 Global Step: 204750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:31:31,935-Speed 10131.56 samples/sec Loss 4.0913 LearningRate 0.0010 Epoch: 36 Global Step: 204760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:31:32,859-Speed 11095.58 samples/sec Loss 3.9681 LearningRate 0.0010 Epoch: 36 Global Step: 204770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:31:33,825-Speed 10603.77 samples/sec Loss 4.0531 LearningRate 0.0010 Epoch: 36 Global Step: 204780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:31:34,747-Speed 11114.31 samples/sec Loss 4.1295 LearningRate 0.0010 Epoch: 36 Global Step: 204790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:31:35,700-Speed 10757.72 samples/sec Loss 3.9690 LearningRate 0.0010 Epoch: 36 Global Step: 204800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:31:36,645-Speed 10842.20 samples/sec Loss 4.0860 LearningRate 0.0010 Epoch: 36 Global Step: 204810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:31:37,626-Speed 10447.77 samples/sec Loss 4.0816 LearningRate 0.0010 Epoch: 36 Global Step: 204820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:31:38,626-Speed 10240.65 samples/sec Loss 4.0369 LearningRate 0.0010 Epoch: 36 Global Step: 204830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:31:39,627-Speed 10233.17 samples/sec Loss 4.1301 LearningRate 0.0010 Epoch: 36 Global Step: 204840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:31:40,623-Speed 10291.11 samples/sec Loss 3.9841 LearningRate 0.0010 Epoch: 36 Global Step: 204850 Fp16 Grad 
Scale: 32768 Required: 1 hours Training: 2022-04-11 06:31:41,648-Speed 10002.37 samples/sec Loss 4.0559 LearningRate 0.0010 Epoch: 36 Global Step: 204860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:31:42,594-Speed 10828.82 samples/sec Loss 4.1211 LearningRate 0.0010 Epoch: 36 Global Step: 204870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:31:43,544-Speed 10783.11 samples/sec Loss 4.0623 LearningRate 0.0010 Epoch: 36 Global Step: 204880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:31:44,524-Speed 10463.78 samples/sec Loss 3.9543 LearningRate 0.0010 Epoch: 36 Global Step: 204890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:31:45,454-Speed 11017.24 samples/sec Loss 4.0904 LearningRate 0.0010 Epoch: 36 Global Step: 204900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:31:46,368-Speed 11214.05 samples/sec Loss 4.0243 LearningRate 0.0010 Epoch: 36 Global Step: 204910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:31:47,287-Speed 11145.53 samples/sec Loss 4.0068 LearningRate 0.0010 Epoch: 36 Global Step: 204920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:31:48,270-Speed 10428.10 samples/sec Loss 4.0848 LearningRate 0.0010 Epoch: 36 Global Step: 204930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:31:49,214-Speed 10852.94 samples/sec Loss 3.8998 LearningRate 0.0010 Epoch: 36 Global Step: 204940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:31:50,182-Speed 10588.96 samples/sec Loss 3.9857 LearningRate 0.0010 Epoch: 36 Global Step: 204950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:31:51,135-Speed 10748.99 samples/sec Loss 4.0932 LearningRate 0.0010 Epoch: 36 Global Step: 204960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:31:52,099-Speed 10629.08 samples/sec Loss 4.1135 LearningRate 0.0010 Epoch: 36 Global Step: 204970 Fp16 Grad Scale: 32768 Required: 1 hours 
Training: 2022-04-11 06:31:53,029-Speed 11015.89 samples/sec Loss 3.9885 LearningRate 0.0010 Epoch: 36 Global Step: 204980 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:31:53,944-Speed 11197.12 samples/sec Loss 4.0048 LearningRate 0.0010 Epoch: 36 Global Step: 204990 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:31:54,904-Speed 10673.31 samples/sec Loss 4.0971 LearningRate 0.0010 Epoch: 36 Global Step: 205000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:31:55,858-Speed 10746.11 samples/sec Loss 4.1192 LearningRate 0.0010 Epoch: 36 Global Step: 205010 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:31:56,778-Speed 11133.27 samples/sec Loss 4.0690 LearningRate 0.0010 Epoch: 36 Global Step: 205020 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:31:57,679-Speed 11377.62 samples/sec Loss 4.0788 LearningRate 0.0010 Epoch: 36 Global Step: 205030 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:31:58,590-Speed 11254.70 samples/sec Loss 4.0778 LearningRate 0.0010 Epoch: 36 Global Step: 205040 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:31:59,535-Speed 10842.36 samples/sec Loss 3.9484 LearningRate 0.0010 Epoch: 36 Global Step: 205050 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:32:00,464-Speed 11022.23 samples/sec Loss 4.0277 LearningRate 0.0010 Epoch: 36 Global Step: 205060 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:32:01,399-Speed 10961.90 samples/sec Loss 4.0613 LearningRate 0.0010 Epoch: 36 Global Step: 205070 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:32:02,372-Speed 10527.65 samples/sec Loss 4.0065 LearningRate 0.0010 Epoch: 36 Global Step: 205080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:32:03,311-Speed 10915.72 samples/sec Loss 4.0222 LearningRate 0.0010 Epoch: 36 Global Step: 205090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 
06:32:04,245-Speed 10979.41 samples/sec Loss 4.0492 LearningRate 0.0010 Epoch: 36 Global Step: 205100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:32:05,256-Speed 10135.28 samples/sec Loss 4.0605 LearningRate 0.0010 Epoch: 36 Global Step: 205110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:32:06,206-Speed 10783.21 samples/sec Loss 3.9899 LearningRate 0.0010 Epoch: 36 Global Step: 205120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:32:07,160-Speed 10745.58 samples/sec Loss 3.9884 LearningRate 0.0010 Epoch: 36 Global Step: 205130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:32:08,123-Speed 10639.47 samples/sec Loss 4.0179 LearningRate 0.0010 Epoch: 36 Global Step: 205140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:32:09,109-Speed 10390.45 samples/sec Loss 4.1006 LearningRate 0.0010 Epoch: 36 Global Step: 205150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:32:10,033-Speed 11095.01 samples/sec Loss 4.0276 LearningRate 0.0010 Epoch: 36 Global Step: 205160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:32:10,954-Speed 11117.83 samples/sec Loss 4.0636 LearningRate 0.0010 Epoch: 36 Global Step: 205170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:32:11,889-Speed 10957.80 samples/sec Loss 4.1202 LearningRate 0.0010 Epoch: 36 Global Step: 205180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:32:12,877-Speed 10376.04 samples/sec Loss 4.0962 LearningRate 0.0010 Epoch: 36 Global Step: 205190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:32:13,851-Speed 10520.30 samples/sec Loss 4.1001 LearningRate 0.0010 Epoch: 36 Global Step: 205200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:32:14,848-Speed 10278.04 samples/sec Loss 3.9735 LearningRate 0.0010 Epoch: 36 Global Step: 205210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:32:15,802-Speed 10742.06 
samples/sec Loss 4.0308 LearningRate 0.0010 Epoch: 36 Global Step: 205220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:32:16,732-Speed 11015.61 samples/sec Loss 4.1190 LearningRate 0.0010 Epoch: 36 Global Step: 205230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:32:17,634-Speed 11363.34 samples/sec Loss 3.9379 LearningRate 0.0010 Epoch: 36 Global Step: 205240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:32:18,662-Speed 9968.23 samples/sec Loss 4.0181 LearningRate 0.0010 Epoch: 36 Global Step: 205250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:32:19,784-Speed 9129.48 samples/sec Loss 4.1390 LearningRate 0.0010 Epoch: 36 Global Step: 205260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:32:20,716-Speed 10992.17 samples/sec Loss 3.9559 LearningRate 0.0010 Epoch: 36 Global Step: 205270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:32:21,733-Speed 10071.43 samples/sec Loss 4.0519 LearningRate 0.0009 Epoch: 36 Global Step: 205280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:32:22,686-Speed 10755.39 samples/sec Loss 4.0570 LearningRate 0.0009 Epoch: 36 Global Step: 205290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:32:23,594-Speed 11296.63 samples/sec Loss 4.0682 LearningRate 0.0009 Epoch: 36 Global Step: 205300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:32:24,572-Speed 10477.00 samples/sec Loss 4.0515 LearningRate 0.0009 Epoch: 36 Global Step: 205310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:32:25,487-Speed 11200.38 samples/sec Loss 3.9954 LearningRate 0.0009 Epoch: 36 Global Step: 205320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:32:26,514-Speed 9971.29 samples/sec Loss 4.0892 LearningRate 0.0009 Epoch: 36 Global Step: 205330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:32:27,415-Speed 11374.94 samples/sec Loss 4.0858 
LearningRate 0.0009 Epoch: 36 Global Step: 205340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:32:28,375-Speed 10678.28 samples/sec Loss 4.1034 LearningRate 0.0009 Epoch: 36 Global Step: 205350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:32:29,354-Speed 10463.11 samples/sec Loss 3.9871 LearningRate 0.0009 Epoch: 36 Global Step: 205360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:32:30,280-Speed 11071.73 samples/sec Loss 4.1138 LearningRate 0.0009 Epoch: 36 Global Step: 205370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:32:31,265-Speed 10403.21 samples/sec Loss 4.0729 LearningRate 0.0009 Epoch: 36 Global Step: 205380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:32:32,204-Speed 10913.68 samples/sec Loss 4.0662 LearningRate 0.0009 Epoch: 36 Global Step: 205390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:32:33,121-Speed 11170.25 samples/sec Loss 3.9563 LearningRate 0.0009 Epoch: 36 Global Step: 205400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:32:34,027-Speed 11311.20 samples/sec Loss 4.0693 LearningRate 0.0009 Epoch: 36 Global Step: 205410 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:32:34,981-Speed 10744.71 samples/sec Loss 4.0188 LearningRate 0.0009 Epoch: 36 Global Step: 205420 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:32:35,958-Speed 10478.60 samples/sec Loss 4.1153 LearningRate 0.0009 Epoch: 36 Global Step: 205430 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:32:36,915-Speed 10709.67 samples/sec Loss 3.9502 LearningRate 0.0009 Epoch: 36 Global Step: 205440 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:32:37,838-Speed 11108.19 samples/sec Loss 4.1334 LearningRate 0.0009 Epoch: 36 Global Step: 205450 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:32:38,818-Speed 10454.48 samples/sec Loss 3.9008 LearningRate 0.0009 Epoch: 36 
Global Step: 205460 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:32:39,814-Speed 10280.56 samples/sec Loss 4.0513 LearningRate 0.0009 Epoch: 36 Global Step: 205470 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:32:40,790-Speed 10503.83 samples/sec Loss 4.0215 LearningRate 0.0009 Epoch: 36 Global Step: 205480 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:32:41,768-Speed 10477.59 samples/sec Loss 4.0562 LearningRate 0.0009 Epoch: 36 Global Step: 205490 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:32:42,697-Speed 11027.62 samples/sec Loss 4.0441 LearningRate 0.0009 Epoch: 36 Global Step: 205500 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:32:43,624-Speed 11058.09 samples/sec Loss 3.9628 LearningRate 0.0009 Epoch: 36 Global Step: 205510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:32:44,579-Speed 10727.03 samples/sec Loss 4.0570 LearningRate 0.0009 Epoch: 36 Global Step: 205520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:32:45,497-Speed 11160.79 samples/sec Loss 4.1268 LearningRate 0.0009 Epoch: 36 Global Step: 205530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:32:46,414-Speed 11176.18 samples/sec Loss 4.1219 LearningRate 0.0009 Epoch: 36 Global Step: 205540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:32:47,356-Speed 10883.24 samples/sec Loss 4.0843 LearningRate 0.0009 Epoch: 36 Global Step: 205550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:32:48,315-Speed 10683.52 samples/sec Loss 4.0198 LearningRate 0.0009 Epoch: 36 Global Step: 205560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:32:49,313-Speed 10263.84 samples/sec Loss 4.1334 LearningRate 0.0009 Epoch: 36 Global Step: 205570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:32:50,232-Speed 11156.71 samples/sec Loss 4.0905 LearningRate 0.0009 Epoch: 36 Global Step: 205580 Fp16 Grad 
Scale: 32768 Required: 1 hours Training: 2022-04-11 06:32:51,184-Speed 10762.80 samples/sec Loss 4.0338 LearningRate 0.0009 Epoch: 36 Global Step: 205590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:32:52,136-Speed 10761.03 samples/sec Loss 3.9721 LearningRate 0.0009 Epoch: 36 Global Step: 205600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:32:53,105-Speed 10581.37 samples/sec Loss 4.0579 LearningRate 0.0009 Epoch: 36 Global Step: 205610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:32:54,045-Speed 10901.36 samples/sec Loss 4.0132 LearningRate 0.0009 Epoch: 36 Global Step: 205620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:32:54,991-Speed 10822.77 samples/sec Loss 4.1142 LearningRate 0.0009 Epoch: 36 Global Step: 205630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:32:55,955-Speed 10637.88 samples/sec Loss 4.1142 LearningRate 0.0009 Epoch: 36 Global Step: 205640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:32:56,888-Speed 10980.97 samples/sec Loss 4.0619 LearningRate 0.0009 Epoch: 36 Global Step: 205650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:32:57,814-Speed 11062.85 samples/sec Loss 4.1084 LearningRate 0.0009 Epoch: 36 Global Step: 205660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:32:58,745-Speed 11003.32 samples/sec Loss 4.1274 LearningRate 0.0009 Epoch: 36 Global Step: 205670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:32:59,701-Speed 10724.33 samples/sec Loss 4.0398 LearningRate 0.0009 Epoch: 36 Global Step: 205680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:33:00,646-Speed 10839.19 samples/sec Loss 4.0681 LearningRate 0.0009 Epoch: 36 Global Step: 205690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:33:01,618-Speed 10543.02 samples/sec Loss 3.8598 LearningRate 0.0009 Epoch: 36 Global Step: 205700 Fp16 Grad Scale: 32768 Required: 1 hours 
Training: 2022-04-11 06:33:02,611-Speed 10319.43 samples/sec Loss 4.1946 LearningRate 0.0009 Epoch: 36 Global Step: 205710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:33:03,554-Speed 10871.76 samples/sec Loss 4.1256 LearningRate 0.0009 Epoch: 36 Global Step: 205720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:33:04,539-Speed 10397.11 samples/sec Loss 4.1541 LearningRate 0.0009 Epoch: 36 Global Step: 205730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:33:05,469-Speed 11016.38 samples/sec Loss 4.1755 LearningRate 0.0009 Epoch: 36 Global Step: 205740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:33:06,387-Speed 11160.19 samples/sec Loss 3.9902 LearningRate 0.0009 Epoch: 36 Global Step: 205750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:33:07,329-Speed 10876.55 samples/sec Loss 3.9929 LearningRate 0.0009 Epoch: 36 Global Step: 205760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:33:08,269-Speed 10909.36 samples/sec Loss 4.0912 LearningRate 0.0009 Epoch: 36 Global Step: 205770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:33:09,182-Speed 11221.82 samples/sec Loss 4.1334 LearningRate 0.0009 Epoch: 36 Global Step: 205780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:33:10,122-Speed 10903.36 samples/sec Loss 4.1447 LearningRate 0.0009 Epoch: 36 Global Step: 205790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:33:11,074-Speed 10763.90 samples/sec Loss 4.0399 LearningRate 0.0009 Epoch: 36 Global Step: 205800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:33:12,014-Speed 10900.97 samples/sec Loss 4.0876 LearningRate 0.0009 Epoch: 36 Global Step: 205810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:33:12,963-Speed 10798.25 samples/sec Loss 4.0679 LearningRate 0.0009 Epoch: 36 Global Step: 205820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 
06:33:13,937-Speed 10516.05 samples/sec Loss 4.0571 LearningRate 0.0009 Epoch: 36 Global Step: 205830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:33:14,869-Speed 10998.96 samples/sec Loss 3.9183 LearningRate 0.0009 Epoch: 36 Global Step: 205840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:33:15,827-Speed 10695.88 samples/sec Loss 4.0468 LearningRate 0.0009 Epoch: 36 Global Step: 205850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:33:16,769-Speed 10871.23 samples/sec Loss 4.1131 LearningRate 0.0009 Epoch: 36 Global Step: 205860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:33:17,725-Speed 10723.31 samples/sec Loss 4.0559 LearningRate 0.0009 Epoch: 36 Global Step: 205870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:33:18,642-Speed 11173.71 samples/sec Loss 3.9994 LearningRate 0.0009 Epoch: 36 Global Step: 205880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:33:19,626-Speed 10419.87 samples/sec Loss 4.0824 LearningRate 0.0009 Epoch: 36 Global Step: 205890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:33:20,555-Speed 11027.63 samples/sec Loss 4.0846 LearningRate 0.0009 Epoch: 36 Global Step: 205900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:33:21,467-Speed 11233.62 samples/sec Loss 4.0358 LearningRate 0.0009 Epoch: 36 Global Step: 205910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:33:22,377-Speed 11254.90 samples/sec Loss 4.1303 LearningRate 0.0009 Epoch: 36 Global Step: 205920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:33:23,328-Speed 10781.68 samples/sec Loss 4.1020 LearningRate 0.0009 Epoch: 36 Global Step: 205930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:33:24,263-Speed 10957.91 samples/sec Loss 4.1408 LearningRate 0.0009 Epoch: 36 Global Step: 205940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:33:25,260-Speed 10276.90 
samples/sec Loss 4.0785 LearningRate 0.0009 Epoch: 36 Global Step: 205950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:33:26,249-Speed 10360.13 samples/sec Loss 4.1190 LearningRate 0.0009 Epoch: 36 Global Step: 205960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:33:27,252-Speed 10212.97 samples/sec Loss 3.9921 LearningRate 0.0009 Epoch: 36 Global Step: 205970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:33:28,215-Speed 10647.11 samples/sec Loss 4.1298 LearningRate 0.0009 Epoch: 36 Global Step: 205980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:33:29,191-Speed 10497.49 samples/sec Loss 4.0176 LearningRate 0.0009 Epoch: 36 Global Step: 205990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:33:30,197-Speed 10181.49 samples/sec Loss 4.1527 LearningRate 0.0009 Epoch: 36 Global Step: 206000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:33:51,365-[lfw][206000]XNorm: 7.660767 Training: 2022-04-11 06:33:51,366-[lfw][206000]Accuracy-Flip: 0.99650+-0.00283 Training: 2022-04-11 06:33:51,366-[lfw][206000]Accuracy-Highest: 0.99733 Training: 2022-04-11 06:34:16,045-[cfp_fp][206000]XNorm: 6.551075 Training: 2022-04-11 06:34:16,045-[cfp_fp][206000]Accuracy-Flip: 0.95643+-0.01116 Training: 2022-04-11 06:34:16,046-[cfp_fp][206000]Accuracy-Highest: 0.95943 Training: 2022-04-11 06:34:37,228-[agedb_30][206000]XNorm: 7.509536 Training: 2022-04-11 06:34:37,229-[agedb_30][206000]Accuracy-Flip: 0.96917+-0.00712 Training: 2022-04-11 06:34:37,229-[agedb_30][206000]Accuracy-Highest: 0.97167 Training: 2022-04-11 06:34:38,194-Speed 150.60 samples/sec Loss 4.0151 LearningRate 0.0009 Epoch: 36 Global Step: 206010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:34:39,161-Speed 10595.96 samples/sec Loss 4.2106 LearningRate 0.0009 Epoch: 36 Global Step: 206020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:34:40,102-Speed 10890.56 samples/sec Loss 
4.1577 LearningRate 0.0009 Epoch: 36 Global Step: 206030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:34:41,094-Speed 10324.24 samples/sec Loss 4.0380 LearningRate 0.0009 Epoch: 36 Global Step: 206040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:34:42,054-Speed 10678.67 samples/sec Loss 4.0779 LearningRate 0.0009 Epoch: 36 Global Step: 206050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:34:42,978-Speed 11088.69 samples/sec Loss 4.0906 LearningRate 0.0009 Epoch: 36 Global Step: 206060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:34:44,000-Speed 10027.05 samples/sec Loss 4.1619 LearningRate 0.0009 Epoch: 36 Global Step: 206070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:34:44,916-Speed 11183.56 samples/sec Loss 4.2075 LearningRate 0.0009 Epoch: 36 Global Step: 206080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:34:45,823-Speed 11299.42 samples/sec Loss 4.1146 LearningRate 0.0009 Epoch: 36 Global Step: 206090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:34:46,726-Speed 11347.72 samples/sec Loss 4.0504 LearningRate 0.0009 Epoch: 36 Global Step: 206100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:34:47,639-Speed 11223.42 samples/sec Loss 4.1222 LearningRate 0.0009 Epoch: 36 Global Step: 206110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:34:48,572-Speed 10981.92 samples/sec Loss 4.1622 LearningRate 0.0009 Epoch: 36 Global Step: 206120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:34:49,504-Speed 10997.48 samples/sec Loss 4.1965 LearningRate 0.0009 Epoch: 36 Global Step: 206130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:34:50,435-Speed 10997.67 samples/sec Loss 3.9891 LearningRate 0.0009 Epoch: 36 Global Step: 206140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:34:51,392-Speed 10716.12 samples/sec Loss 4.1173 LearningRate 0.0009 
Epoch: 36 Global Step: 206150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:34:52,349-Speed 10705.48 samples/sec Loss 4.1764 LearningRate 0.0009 Epoch: 36 Global Step: 206160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:34:53,301-Speed 10757.65 samples/sec Loss 4.1543 LearningRate 0.0009 Epoch: 36 Global Step: 206170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:34:54,270-Speed 10576.54 samples/sec Loss 4.1395 LearningRate 0.0009 Epoch: 36 Global Step: 206180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:34:55,231-Speed 10667.98 samples/sec Loss 4.1404 LearningRate 0.0009 Epoch: 36 Global Step: 206190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:34:56,184-Speed 10748.65 samples/sec Loss 4.0773 LearningRate 0.0009 Epoch: 36 Global Step: 206200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:34:57,144-Speed 10669.45 samples/sec Loss 4.0871 LearningRate 0.0009 Epoch: 36 Global Step: 206210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:34:58,077-Speed 10982.90 samples/sec Loss 4.1046 LearningRate 0.0009 Epoch: 36 Global Step: 206220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:34:59,001-Speed 11097.12 samples/sec Loss 4.0880 LearningRate 0.0009 Epoch: 36 Global Step: 206230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:35:00,000-Speed 10257.06 samples/sec Loss 4.2237 LearningRate 0.0009 Epoch: 36 Global Step: 206240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:35:00,963-Speed 10637.62 samples/sec Loss 4.0581 LearningRate 0.0009 Epoch: 36 Global Step: 206250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:35:01,945-Speed 10434.55 samples/sec Loss 4.1137 LearningRate 0.0009 Epoch: 36 Global Step: 206260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:35:02,896-Speed 10778.32 samples/sec Loss 3.9152 LearningRate 0.0009 Epoch: 36 Global Step: 206270 
Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:35:03,835-Speed 10912.85 samples/sec Loss 4.2047 LearningRate 0.0009 Epoch: 36 Global Step: 206280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:35:04,738-Speed 11346.85 samples/sec Loss 3.9901 LearningRate 0.0009 Epoch: 36 Global Step: 206290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:35:05,683-Speed 10846.78 samples/sec Loss 4.0802 LearningRate 0.0009 Epoch: 36 Global Step: 206300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:35:06,634-Speed 10778.56 samples/sec Loss 4.0158 LearningRate 0.0009 Epoch: 36 Global Step: 206310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:35:07,615-Speed 10444.26 samples/sec Loss 4.0049 LearningRate 0.0009 Epoch: 36 Global Step: 206320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:35:08,553-Speed 10928.76 samples/sec Loss 4.1320 LearningRate 0.0009 Epoch: 36 Global Step: 206330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:35:09,467-Speed 11204.86 samples/sec Loss 4.1951 LearningRate 0.0009 Epoch: 36 Global Step: 206340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:35:10,420-Speed 10748.37 samples/sec Loss 3.9999 LearningRate 0.0009 Epoch: 36 Global Step: 206350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:35:11,372-Speed 10770.01 samples/sec Loss 4.1230 LearningRate 0.0009 Epoch: 36 Global Step: 206360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:35:12,300-Speed 11044.22 samples/sec Loss 3.9994 LearningRate 0.0009 Epoch: 36 Global Step: 206370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:35:13,243-Speed 10862.18 samples/sec Loss 4.1041 LearningRate 0.0009 Epoch: 36 Global Step: 206380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:35:14,214-Speed 10547.47 samples/sec Loss 4.0463 LearningRate 0.0009 Epoch: 36 Global Step: 206390 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-04-11 06:35:15,145-Speed 11008.47 samples/sec Loss 4.0447 LearningRate 0.0009 Epoch: 36 Global Step: 206400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:35:16,104-Speed 10684.30 samples/sec Loss 4.0830 LearningRate 0.0009 Epoch: 36 Global Step: 206410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:35:17,051-Speed 10825.52 samples/sec Loss 4.0479 LearningRate 0.0009 Epoch: 36 Global Step: 206420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:35:17,980-Speed 11031.90 samples/sec Loss 4.0487 LearningRate 0.0009 Epoch: 36 Global Step: 206430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:35:18,910-Speed 11017.78 samples/sec Loss 4.0650 LearningRate 0.0009 Epoch: 36 Global Step: 206440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:35:19,815-Speed 11322.42 samples/sec Loss 4.0197 LearningRate 0.0009 Epoch: 36 Global Step: 206450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:35:20,752-Speed 10933.65 samples/sec Loss 4.1933 LearningRate 0.0009 Epoch: 36 Global Step: 206460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:35:21,650-Speed 11421.03 samples/sec Loss 4.2149 LearningRate 0.0009 Epoch: 36 Global Step: 206470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:35:22,573-Speed 11100.00 samples/sec Loss 4.1108 LearningRate 0.0008 Epoch: 36 Global Step: 206480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:35:23,517-Speed 10845.62 samples/sec Loss 4.0001 LearningRate 0.0008 Epoch: 36 Global Step: 206490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:35:24,453-Speed 10951.28 samples/sec Loss 4.0806 LearningRate 0.0008 Epoch: 36 Global Step: 206500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:35:25,428-Speed 10505.58 samples/sec Loss 4.2001 LearningRate 0.0008 Epoch: 36 Global Step: 206510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-11 06:35:26,364-Speed 10959.57 samples/sec Loss 4.1232 LearningRate 0.0008 Epoch: 36 Global Step: 206520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:35:27,263-Speed 11392.10 samples/sec Loss 4.0691 LearningRate 0.0008 Epoch: 36 Global Step: 206530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:35:28,211-Speed 10807.53 samples/sec Loss 4.0938 LearningRate 0.0008 Epoch: 36 Global Step: 206540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:35:29,159-Speed 10819.74 samples/sec Loss 4.1500 LearningRate 0.0008 Epoch: 36 Global Step: 206550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:35:30,177-Speed 10063.56 samples/sec Loss 4.0057 LearningRate 0.0008 Epoch: 36 Global Step: 206560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:35:31,150-Speed 10527.53 samples/sec Loss 4.0070 LearningRate 0.0008 Epoch: 36 Global Step: 206570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:35:32,110-Speed 10674.95 samples/sec Loss 4.0235 LearningRate 0.0008 Epoch: 36 Global Step: 206580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:35:33,014-Speed 11332.75 samples/sec Loss 4.0004 LearningRate 0.0008 Epoch: 36 Global Step: 206590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:35:33,938-Speed 11089.66 samples/sec Loss 4.1861 LearningRate 0.0008 Epoch: 36 Global Step: 206600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:35:34,879-Speed 10888.45 samples/sec Loss 4.1980 LearningRate 0.0008 Epoch: 36 Global Step: 206610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:35:35,814-Speed 10963.61 samples/sec Loss 4.1387 LearningRate 0.0008 Epoch: 36 Global Step: 206620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:35:36,747-Speed 10981.81 samples/sec Loss 3.9847 LearningRate 0.0008 Epoch: 36 Global Step: 206630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:35:37,718-Speed 
10555.23 samples/sec Loss 4.1098 LearningRate 0.0008 Epoch: 36 Global Step: 206640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:35:38,653-Speed 10958.16 samples/sec Loss 4.0286 LearningRate 0.0008 Epoch: 36 Global Step: 206650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:35:39,588-Speed 10956.72 samples/sec Loss 4.0992 LearningRate 0.0008 Epoch: 36 Global Step: 206660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:35:40,523-Speed 10966.36 samples/sec Loss 4.0248 LearningRate 0.0008 Epoch: 36 Global Step: 206670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:35:41,485-Speed 10652.75 samples/sec Loss 4.1052 LearningRate 0.0008 Epoch: 36 Global Step: 206680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:35:42,440-Speed 10722.80 samples/sec Loss 4.0298 LearningRate 0.0008 Epoch: 36 Global Step: 206690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:35:43,372-Speed 11001.73 samples/sec Loss 4.1013 LearningRate 0.0008 Epoch: 36 Global Step: 206700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:35:44,336-Speed 10628.73 samples/sec Loss 4.0364 LearningRate 0.0008 Epoch: 36 Global Step: 206710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:35:45,247-Speed 11243.29 samples/sec Loss 4.2312 LearningRate 0.0008 Epoch: 36 Global Step: 206720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:35:46,187-Speed 10904.98 samples/sec Loss 4.0739 LearningRate 0.0008 Epoch: 36 Global Step: 206730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:35:47,108-Speed 11129.86 samples/sec Loss 4.1532 LearningRate 0.0008 Epoch: 36 Global Step: 206740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:35:48,040-Speed 10989.33 samples/sec Loss 4.0493 LearningRate 0.0008 Epoch: 36 Global Step: 206750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:35:49,025-Speed 10401.71 samples/sec Loss 
4.0927 LearningRate 0.0008 Epoch: 36 Global Step: 206760 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:35:50,004-Speed 10466.47 samples/sec Loss 4.1598 LearningRate 0.0008 Epoch: 36 Global Step: 206770 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:35:51,000-Speed 10288.82 samples/sec Loss 4.0305 LearningRate 0.0008 Epoch: 36 Global Step: 206780 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:35:51,970-Speed 10575.63 samples/sec Loss 4.0943 LearningRate 0.0008 Epoch: 36 Global Step: 206790 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:35:52,915-Speed 10834.24 samples/sec Loss 4.0005 LearningRate 0.0008 Epoch: 36 Global Step: 206800 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:35:53,838-Speed 11106.99 samples/sec Loss 4.0857 LearningRate 0.0008 Epoch: 36 Global Step: 206810 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:35:54,751-Speed 11225.86 samples/sec Loss 4.1843 LearningRate 0.0008 Epoch: 36 Global Step: 206820 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:35:55,681-Speed 11013.21 samples/sec Loss 4.0876 LearningRate 0.0008 Epoch: 36 Global Step: 206830 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:35:56,623-Speed 10883.96 samples/sec Loss 4.0288 LearningRate 0.0008 Epoch: 36 Global Step: 206840 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:35:57,515-Speed 11486.96 samples/sec Loss 4.0728 LearningRate 0.0008 Epoch: 36 Global Step: 206850 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:35:58,482-Speed 10595.76 samples/sec Loss 4.2117 LearningRate 0.0008 Epoch: 36 Global Step: 206860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:35:59,419-Speed 10933.64 samples/sec Loss 4.1809 LearningRate 0.0008 Epoch: 36 Global Step: 206870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:00,437-Speed 10066.77 samples/sec Loss 3.9337 LearningRate 0.0008 
Epoch: 36 Global Step: 206880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:01,389-Speed 10766.01 samples/sec Loss 3.9808 LearningRate 0.0008 Epoch: 36 Global Step: 206890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:02,335-Speed 10823.57 samples/sec Loss 4.0328 LearningRate 0.0008 Epoch: 36 Global Step: 206900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:03,317-Speed 10440.31 samples/sec Loss 4.1221 LearningRate 0.0008 Epoch: 36 Global Step: 206910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:04,295-Speed 10481.08 samples/sec Loss 4.0480 LearningRate 0.0008 Epoch: 36 Global Step: 206920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:05,263-Speed 10587.13 samples/sec Loss 4.1983 LearningRate 0.0008 Epoch: 36 Global Step: 206930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:06,212-Speed 10793.18 samples/sec Loss 4.0549 LearningRate 0.0008 Epoch: 36 Global Step: 206940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:07,169-Speed 10715.56 samples/sec Loss 4.1286 LearningRate 0.0008 Epoch: 36 Global Step: 206950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:08,130-Speed 10659.18 samples/sec Loss 4.0931 LearningRate 0.0008 Epoch: 36 Global Step: 206960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:09,056-Speed 11069.45 samples/sec Loss 4.1635 LearningRate 0.0008 Epoch: 36 Global Step: 206970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:09,962-Speed 11298.28 samples/sec Loss 4.0456 LearningRate 0.0008 Epoch: 36 Global Step: 206980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:10,871-Speed 11281.11 samples/sec Loss 4.1110 LearningRate 0.0008 Epoch: 36 Global Step: 206990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:11,812-Speed 10883.62 samples/sec Loss 4.0790 LearningRate 0.0008 Epoch: 36 Global Step: 207000 
Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:12,744-Speed 11005.28 samples/sec Loss 4.1569 LearningRate 0.0008 Epoch: 36 Global Step: 207010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:13,709-Speed 10617.26 samples/sec Loss 4.1056 LearningRate 0.0008 Epoch: 36 Global Step: 207020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:14,643-Speed 10970.03 samples/sec Loss 4.0601 LearningRate 0.0008 Epoch: 36 Global Step: 207030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:15,613-Speed 10564.06 samples/sec Loss 4.0631 LearningRate 0.0008 Epoch: 36 Global Step: 207040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:16,555-Speed 10885.08 samples/sec Loss 4.1080 LearningRate 0.0008 Epoch: 36 Global Step: 207050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:17,509-Speed 10738.93 samples/sec Loss 4.1015 LearningRate 0.0008 Epoch: 36 Global Step: 207060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:36:18,500-Speed 10335.40 samples/sec Loss 4.1044 LearningRate 0.0008 Epoch: 36 Global Step: 207070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:36:19,429-Speed 11040.31 samples/sec Loss 3.9940 LearningRate 0.0008 Epoch: 36 Global Step: 207080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:36:20,374-Speed 10836.34 samples/sec Loss 4.1069 LearningRate 0.0008 Epoch: 36 Global Step: 207090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:36:21,287-Speed 11227.52 samples/sec Loss 4.0653 LearningRate 0.0008 Epoch: 36 Global Step: 207100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:22,193-Speed 11311.50 samples/sec Loss 4.0929 LearningRate 0.0008 Epoch: 36 Global Step: 207110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:23,116-Speed 11095.22 samples/sec Loss 4.0817 LearningRate 0.0008 Epoch: 36 Global Step: 207120 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-04-11 06:36:24,049-Speed 10984.68 samples/sec Loss 4.0947 LearningRate 0.0008 Epoch: 36 Global Step: 207130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:24,977-Speed 11044.32 samples/sec Loss 4.1336 LearningRate 0.0008 Epoch: 36 Global Step: 207140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:25,896-Speed 11155.71 samples/sec Loss 4.0384 LearningRate 0.0008 Epoch: 36 Global Step: 207150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:26,840-Speed 10847.78 samples/sec Loss 4.0918 LearningRate 0.0008 Epoch: 36 Global Step: 207160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:27,746-Speed 11318.79 samples/sec Loss 4.0997 LearningRate 0.0008 Epoch: 36 Global Step: 207170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:28,665-Speed 11146.24 samples/sec Loss 4.1161 LearningRate 0.0008 Epoch: 36 Global Step: 207180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:29,583-Speed 11158.86 samples/sec Loss 3.9799 LearningRate 0.0008 Epoch: 36 Global Step: 207190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:30,527-Speed 10850.73 samples/sec Loss 4.0898 LearningRate 0.0008 Epoch: 36 Global Step: 207200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:36:31,473-Speed 10842.19 samples/sec Loss 4.1827 LearningRate 0.0008 Epoch: 36 Global Step: 207210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:36:32,428-Speed 10722.38 samples/sec Loss 3.9894 LearningRate 0.0008 Epoch: 36 Global Step: 207220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:36:33,329-Speed 11378.37 samples/sec Loss 4.1102 LearningRate 0.0008 Epoch: 36 Global Step: 207230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:34,234-Speed 11317.62 samples/sec Loss 4.0239 LearningRate 0.0008 Epoch: 36 Global Step: 207240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-04-11 06:36:35,164-Speed 11025.81 samples/sec Loss 4.1322 LearningRate 0.0008 Epoch: 36 Global Step: 207250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:36,071-Speed 11295.98 samples/sec Loss 3.9756 LearningRate 0.0008 Epoch: 36 Global Step: 207260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:36,975-Speed 11338.98 samples/sec Loss 4.1068 LearningRate 0.0008 Epoch: 36 Global Step: 207270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:37,915-Speed 10901.44 samples/sec Loss 4.0382 LearningRate 0.0008 Epoch: 36 Global Step: 207280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:38,887-Speed 10548.72 samples/sec Loss 4.0842 LearningRate 0.0008 Epoch: 36 Global Step: 207290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:39,831-Speed 10848.42 samples/sec Loss 4.1650 LearningRate 0.0008 Epoch: 36 Global Step: 207300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:40,774-Speed 10874.66 samples/sec Loss 4.0531 LearningRate 0.0008 Epoch: 36 Global Step: 207310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:41,752-Speed 10474.65 samples/sec Loss 4.1231 LearningRate 0.0008 Epoch: 36 Global Step: 207320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:42,702-Speed 10783.72 samples/sec Loss 4.0580 LearningRate 0.0008 Epoch: 36 Global Step: 207330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:36:43,604-Speed 11357.08 samples/sec Loss 4.0922 LearningRate 0.0008 Epoch: 36 Global Step: 207340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:44,558-Speed 10743.75 samples/sec Loss 4.0717 LearningRate 0.0008 Epoch: 36 Global Step: 207350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:45,462-Speed 11332.22 samples/sec Loss 4.1775 LearningRate 0.0008 Epoch: 36 Global Step: 207360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:46,366-Speed 
11346.49 samples/sec Loss 3.9962 LearningRate 0.0008 Epoch: 36 Global Step: 207370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:47,270-Speed 11332.66 samples/sec Loss 3.9743 LearningRate 0.0008 Epoch: 36 Global Step: 207380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:48,226-Speed 10722.20 samples/sec Loss 3.9510 LearningRate 0.0008 Epoch: 36 Global Step: 207390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:49,197-Speed 10551.59 samples/sec Loss 4.1123 LearningRate 0.0008 Epoch: 36 Global Step: 207400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:50,155-Speed 10692.46 samples/sec Loss 4.1411 LearningRate 0.0008 Epoch: 36 Global Step: 207410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:51,158-Speed 10219.13 samples/sec Loss 4.0836 LearningRate 0.0008 Epoch: 36 Global Step: 207420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:52,100-Speed 10880.39 samples/sec Loss 4.2556 LearningRate 0.0008 Epoch: 36 Global Step: 207430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:53,010-Speed 11258.40 samples/sec Loss 4.0650 LearningRate 0.0008 Epoch: 36 Global Step: 207440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:36:53,977-Speed 10593.14 samples/sec Loss 4.1584 LearningRate 0.0008 Epoch: 36 Global Step: 207450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:36:54,934-Speed 10709.02 samples/sec Loss 4.1395 LearningRate 0.0008 Epoch: 36 Global Step: 207460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:36:55,878-Speed 10846.41 samples/sec Loss 4.1797 LearningRate 0.0008 Epoch: 36 Global Step: 207470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:56,848-Speed 10572.73 samples/sec Loss 4.0942 LearningRate 0.0008 Epoch: 36 Global Step: 207480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:57,793-Speed 10841.93 samples/sec Loss 
4.0764 LearningRate 0.0008 Epoch: 36 Global Step: 207490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:58,768-Speed 10512.84 samples/sec Loss 4.1568 LearningRate 0.0008 Epoch: 36 Global Step: 207500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:36:59,708-Speed 10901.17 samples/sec Loss 4.1765 LearningRate 0.0008 Epoch: 36 Global Step: 207510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:37:00,659-Speed 10776.99 samples/sec Loss 4.1063 LearningRate 0.0008 Epoch: 36 Global Step: 207520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:37:01,638-Speed 10466.64 samples/sec Loss 4.1604 LearningRate 0.0008 Epoch: 36 Global Step: 207530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:37:02,601-Speed 10643.20 samples/sec Loss 4.0807 LearningRate 0.0008 Epoch: 36 Global Step: 207540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:37:03,544-Speed 10863.28 samples/sec Loss 4.0045 LearningRate 0.0008 Epoch: 36 Global Step: 207550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:37:04,481-Speed 10937.91 samples/sec Loss 4.0130 LearningRate 0.0008 Epoch: 36 Global Step: 207560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:37:05,402-Speed 11119.13 samples/sec Loss 4.0333 LearningRate 0.0008 Epoch: 36 Global Step: 207570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:37:06,318-Speed 11195.41 samples/sec Loss 4.1864 LearningRate 0.0008 Epoch: 36 Global Step: 207580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:37:07,297-Speed 10468.26 samples/sec Loss 3.9505 LearningRate 0.0008 Epoch: 36 Global Step: 207590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:37:08,231-Speed 10963.63 samples/sec Loss 4.1335 LearningRate 0.0008 Epoch: 36 Global Step: 207600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:37:09,147-Speed 11194.75 samples/sec Loss 4.1053 LearningRate 0.0008 
Epoch: 36 Global Step: 207610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:37:10,056-Speed 11271.90 samples/sec Loss 4.2050 LearningRate 0.0008 Epoch: 36 Global Step: 207620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:37:10,988-Speed 10986.66 samples/sec Loss 4.0749 LearningRate 0.0008 Epoch: 36 Global Step: 207630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:37:11,943-Speed 10735.93 samples/sec Loss 4.1038 LearningRate 0.0008 Epoch: 36 Global Step: 207640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:37:12,915-Speed 10538.49 samples/sec Loss 4.2196 LearningRate 0.0008 Epoch: 36 Global Step: 207650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:37:13,842-Speed 11059.61 samples/sec Loss 4.2395 LearningRate 0.0008 Epoch: 36 Global Step: 207660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:37:14,756-Speed 11203.53 samples/sec Loss 3.9637 LearningRate 0.0008 Epoch: 36 Global Step: 207670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:37:15,705-Speed 10797.22 samples/sec Loss 4.1372 LearningRate 0.0008 Epoch: 36 Global Step: 207680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:37:16,638-Speed 10989.88 samples/sec Loss 3.9634 LearningRate 0.0008 Epoch: 36 Global Step: 207690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:37:17,628-Speed 10351.54 samples/sec Loss 4.0596 LearningRate 0.0008 Epoch: 36 Global Step: 207700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:37:18,533-Speed 11331.88 samples/sec Loss 4.0343 LearningRate 0.0008 Epoch: 36 Global Step: 207710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:37:19,482-Speed 10796.39 samples/sec Loss 4.2208 LearningRate 0.0008 Epoch: 36 Global Step: 207720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:37:20,387-Speed 11319.19 samples/sec Loss 4.0860 LearningRate 0.0008 Epoch: 36 Global Step: 207730 
Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:37:21,287-Speed 11384.47 samples/sec Loss 4.0800 LearningRate 0.0008 Epoch: 36 Global Step: 207740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:37:22,188-Speed 11378.95 samples/sec Loss 4.1388 LearningRate 0.0007 Epoch: 36 Global Step: 207750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:37:23,084-Speed 11437.41 samples/sec Loss 4.2417 LearningRate 0.0007 Epoch: 36 Global Step: 207760 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:37:24,014-Speed 11018.11 samples/sec Loss 4.2268 LearningRate 0.0007 Epoch: 36 Global Step: 207770 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:37:24,948-Speed 10963.64 samples/sec Loss 4.0490 LearningRate 0.0007 Epoch: 36 Global Step: 207780 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:37:25,910-Speed 10650.43 samples/sec Loss 4.1010 LearningRate 0.0007 Epoch: 36 Global Step: 207790 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:37:26,935-Speed 9999.42 samples/sec Loss 4.0211 LearningRate 0.0007 Epoch: 36 Global Step: 207800 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:37:27,899-Speed 10627.14 samples/sec Loss 4.1590 LearningRate 0.0007 Epoch: 36 Global Step: 207810 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:37:28,854-Speed 10738.30 samples/sec Loss 4.1184 LearningRate 0.0007 Epoch: 36 Global Step: 207820 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:37:29,788-Speed 10962.11 samples/sec Loss 4.0616 LearningRate 0.0007 Epoch: 36 Global Step: 207830 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:37:30,749-Speed 10665.80 samples/sec Loss 3.9804 LearningRate 0.0007 Epoch: 36 Global Step: 207840 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:37:31,668-Speed 11150.68 samples/sec Loss 4.1387 LearningRate 0.0007 Epoch: 36 Global Step: 207850 Fp16 Grad Scale: 16384 Required: 
1 hours Training: 2022-04-11 06:37:32,575-Speed 11296.94 samples/sec Loss 4.2182 LearningRate 0.0007 Epoch: 36 Global Step: 207860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:37:33,498-Speed 11097.40 samples/sec Loss 4.1210 LearningRate 0.0007 Epoch: 36 Global Step: 207870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:37:34,425-Speed 11059.78 samples/sec Loss 4.1097 LearningRate 0.0007 Epoch: 36 Global Step: 207880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:37:35,347-Speed 11113.88 samples/sec Loss 4.0738 LearningRate 0.0007 Epoch: 36 Global Step: 207890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:37:36,293-Speed 10832.46 samples/sec Loss 4.2800 LearningRate 0.0007 Epoch: 36 Global Step: 207900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:37:37,243-Speed 10791.48 samples/sec Loss 4.1683 LearningRate 0.0007 Epoch: 36 Global Step: 207910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:37:38,235-Speed 10326.21 samples/sec Loss 4.1138 LearningRate 0.0007 Epoch: 36 Global Step: 207920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:37:39,187-Speed 10763.90 samples/sec Loss 4.1291 LearningRate 0.0007 Epoch: 36 Global Step: 207930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:37:40,116-Speed 11032.96 samples/sec Loss 4.0851 LearningRate 0.0007 Epoch: 36 Global Step: 207940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:37:41,068-Speed 10760.66 samples/sec Loss 4.1226 LearningRate 0.0007 Epoch: 36 Global Step: 207950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:37:42,014-Speed 10834.31 samples/sec Loss 4.0180 LearningRate 0.0007 Epoch: 36 Global Step: 207960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:37:43,039-Speed 9991.25 samples/sec Loss 4.1692 LearningRate 0.0007 Epoch: 36 Global Step: 207970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 
06:37:43,999-Speed 10677.77 samples/sec Loss 4.1388 LearningRate 0.0007 Epoch: 36 Global Step: 207980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:37:44,928-Speed 11030.73 samples/sec Loss 4.2782 LearningRate 0.0007 Epoch: 36 Global Step: 207990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:37:45,825-Speed 11417.82 samples/sec Loss 4.1056 LearningRate 0.0007 Epoch: 36 Global Step: 208000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:38:07,082-[lfw][208000]XNorm: 7.684647 Training: 2022-04-11 06:38:07,083-[lfw][208000]Accuracy-Flip: 0.99667+-0.00307 Training: 2022-04-11 06:38:07,083-[lfw][208000]Accuracy-Highest: 0.99733 Training: 2022-04-11 06:38:31,583-[cfp_fp][208000]XNorm: 6.570247 Training: 2022-04-11 06:38:31,583-[cfp_fp][208000]Accuracy-Flip: 0.95571+-0.00981 Training: 2022-04-11 06:38:31,584-[cfp_fp][208000]Accuracy-Highest: 0.95943 Training: 2022-04-11 06:38:52,657-[agedb_30][208000]XNorm: 7.537229 Training: 2022-04-11 06:38:52,658-[agedb_30][208000]Accuracy-Flip: 0.96850+-0.00758 Training: 2022-04-11 06:38:52,658-[agedb_30][208000]Accuracy-Highest: 0.97167 Training: 2022-04-11 06:38:53,663-Speed 150.95 samples/sec Loss 4.1506 LearningRate 0.0007 Epoch: 36 Global Step: 208010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:38:54,613-Speed 10780.24 samples/sec Loss 4.1072 LearningRate 0.0007 Epoch: 36 Global Step: 208020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:38:55,633-Speed 10045.52 samples/sec Loss 4.0612 LearningRate 0.0007 Epoch: 36 Global Step: 208030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:38:56,588-Speed 10732.31 samples/sec Loss 4.1838 LearningRate 0.0007 Epoch: 36 Global Step: 208040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:38:57,513-Speed 11078.11 samples/sec Loss 4.1483 LearningRate 0.0007 Epoch: 36 Global Step: 208050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:38:58,488-Speed 
10510.38 samples/sec Loss 4.2534 LearningRate 0.0007 Epoch: 36 Global Step: 208060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:38:59,533-Speed 9803.55 samples/sec Loss 4.1476 LearningRate 0.0007 Epoch: 36 Global Step: 208070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:39:00,502-Speed 10566.72 samples/sec Loss 4.1418 LearningRate 0.0007 Epoch: 36 Global Step: 208080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:39:01,463-Speed 10666.59 samples/sec Loss 4.0405 LearningRate 0.0007 Epoch: 36 Global Step: 208090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:39:02,371-Speed 11290.53 samples/sec Loss 4.1342 LearningRate 0.0007 Epoch: 36 Global Step: 208100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:39:03,317-Speed 10821.81 samples/sec Loss 4.1353 LearningRate 0.0007 Epoch: 36 Global Step: 208110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:39:04,251-Speed 10976.59 samples/sec Loss 4.2059 LearningRate 0.0007 Epoch: 36 Global Step: 208120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:39:05,179-Speed 11036.51 samples/sec Loss 4.1943 LearningRate 0.0007 Epoch: 36 Global Step: 208130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:39:06,120-Speed 10890.62 samples/sec Loss 4.0118 LearningRate 0.0007 Epoch: 36 Global Step: 208140 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:39:07,050-Speed 11025.01 samples/sec Loss 4.2826 LearningRate 0.0007 Epoch: 36 Global Step: 208150 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:39:07,993-Speed 10868.86 samples/sec Loss 3.9598 LearningRate 0.0007 Epoch: 36 Global Step: 208160 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:39:08,919-Speed 11054.61 samples/sec Loss 4.1244 LearningRate 0.0007 Epoch: 36 Global Step: 208170 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:39:09,827-Speed 11288.69 samples/sec Loss 4.0631 
LearningRate 0.0007 Epoch: 36 Global Step: 208180 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:39:10,720-Speed 11481.28 samples/sec Loss 4.1624 LearningRate 0.0007 Epoch: 36 Global Step: 208190 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:39:11,683-Speed 10642.10 samples/sec Loss 4.1714 LearningRate 0.0007 Epoch: 36 Global Step: 208200 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:39:12,614-Speed 11004.41 samples/sec Loss 4.1458 LearningRate 0.0007 Epoch: 36 Global Step: 208210 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:39:13,622-Speed 10172.88 samples/sec Loss 4.1625 LearningRate 0.0007 Epoch: 36 Global Step: 208220 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:39:14,551-Speed 11024.83 samples/sec Loss 4.0672 LearningRate 0.0007 Epoch: 36 Global Step: 208230 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:39:15,502-Speed 10771.06 samples/sec Loss 4.1189 LearningRate 0.0007 Epoch: 36 Global Step: 208240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:39:16,451-Speed 10799.58 samples/sec Loss 4.1805 LearningRate 0.0007 Epoch: 36 Global Step: 208250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:39:17,403-Speed 10767.51 samples/sec Loss 4.0550 LearningRate 0.0007 Epoch: 36 Global Step: 208260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:39:18,326-Speed 11102.72 samples/sec Loss 4.1415 LearningRate 0.0007 Epoch: 36 Global Step: 208270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:39:19,255-Speed 11026.23 samples/sec Loss 4.0270 LearningRate 0.0007 Epoch: 36 Global Step: 208280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:39:20,207-Speed 10764.79 samples/sec Loss 4.0653 LearningRate 0.0007 Epoch: 36 Global Step: 208290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:39:21,123-Speed 11183.29 samples/sec Loss 4.0637 LearningRate 0.0007 Epoch: 36 
Global Step: 208300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:39:22,042-Speed 11157.67 samples/sec Loss 4.1541 LearningRate 0.0007 Epoch: 36 Global Step: 208310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:39:22,937-Speed 11444.45 samples/sec Loss 4.1340 LearningRate 0.0007 Epoch: 36 Global Step: 208320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:39:23,829-Speed 11489.54 samples/sec Loss 4.1782 LearningRate 0.0007 Epoch: 36 Global Step: 208330 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:39:24,773-Speed 10854.66 samples/sec Loss 4.0313 LearningRate 0.0007 Epoch: 36 Global Step: 208340 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:39:25,690-Speed 11177.97 samples/sec Loss 4.1330 LearningRate 0.0007 Epoch: 36 Global Step: 208350 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:39:26,635-Speed 10834.24 samples/sec Loss 4.2743 LearningRate 0.0007 Epoch: 36 Global Step: 208360 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:39:27,576-Speed 10892.77 samples/sec Loss 4.2108 LearningRate 0.0007 Epoch: 36 Global Step: 208370 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:39:28,538-Speed 10657.31 samples/sec Loss 4.0755 LearningRate 0.0007 Epoch: 36 Global Step: 208380 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:39:29,493-Speed 10728.31 samples/sec Loss 4.1966 LearningRate 0.0007 Epoch: 36 Global Step: 208390 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:39:30,472-Speed 10468.97 samples/sec Loss 4.1574 LearningRate 0.0007 Epoch: 36 Global Step: 208400 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:39:31,446-Speed 10517.85 samples/sec Loss 4.1512 LearningRate 0.0007 Epoch: 36 Global Step: 208410 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:39:32,395-Speed 10794.21 samples/sec Loss 4.1804 LearningRate 0.0007 Epoch: 36 Global Step: 208420 Fp16 Grad 
Scale: 16384 Required: 1 hours Training: 2022-04-11 06:39:33,338-Speed 10866.31 samples/sec Loss 4.0722 LearningRate 0.0007 Epoch: 36 Global Step: 208430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:39:34,267-Speed 11031.20 samples/sec Loss 3.9888 LearningRate 0.0007 Epoch: 36 Global Step: 208440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:39:35,214-Speed 10819.08 samples/sec Loss 4.1488 LearningRate 0.0007 Epoch: 36 Global Step: 208450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:39:36,154-Speed 10911.51 samples/sec Loss 4.0040 LearningRate 0.0007 Epoch: 36 Global Step: 208460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:39:37,123-Speed 10578.17 samples/sec Loss 4.2417 LearningRate 0.0007 Epoch: 36 Global Step: 208470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:39:38,042-Speed 11150.99 samples/sec Loss 4.1047 LearningRate 0.0007 Epoch: 36 Global Step: 208480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:39:38,986-Speed 10846.18 samples/sec Loss 4.1944 LearningRate 0.0007 Epoch: 36 Global Step: 208490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:39:39,921-Speed 10960.11 samples/sec Loss 4.1145 LearningRate 0.0007 Epoch: 36 Global Step: 208500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:39:40,853-Speed 11000.39 samples/sec Loss 4.0891 LearningRate 0.0007 Epoch: 36 Global Step: 208510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:39:41,773-Speed 11132.55 samples/sec Loss 4.0927 LearningRate 0.0007 Epoch: 36 Global Step: 208520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:39:42,735-Speed 10654.09 samples/sec Loss 4.0635 LearningRate 0.0007 Epoch: 36 Global Step: 208530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:39:43,705-Speed 10563.94 samples/sec Loss 4.2328 LearningRate 0.0007 Epoch: 36 Global Step: 208540 Fp16 Grad Scale: 65536 Required: 1 hours 
Training: 2022-04-11 06:39:44,638-Speed 10982.95 samples/sec Loss 4.0755 LearningRate 0.0007 Epoch: 36 Global Step: 208550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:39:45,532-Speed 11460.47 samples/sec Loss 4.0115 LearningRate 0.0007 Epoch: 36 Global Step: 208560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:39:46,440-Speed 11282.66 samples/sec Loss 4.1745 LearningRate 0.0007 Epoch: 36 Global Step: 208570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:39:47,358-Speed 11165.18 samples/sec Loss 4.1359 LearningRate 0.0007 Epoch: 36 Global Step: 208580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:39:48,282-Speed 11091.24 samples/sec Loss 4.1773 LearningRate 0.0007 Epoch: 36 Global Step: 208590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:39:49,237-Speed 10725.25 samples/sec Loss 4.0532 LearningRate 0.0007 Epoch: 36 Global Step: 208600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:39:50,197-Speed 10678.97 samples/sec Loss 4.2111 LearningRate 0.0007 Epoch: 36 Global Step: 208610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:39:51,118-Speed 11124.01 samples/sec Loss 4.0932 LearningRate 0.0007 Epoch: 36 Global Step: 208620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:39:52,079-Speed 10665.28 samples/sec Loss 4.1538 LearningRate 0.0007 Epoch: 36 Global Step: 208630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:39:53,030-Speed 10771.85 samples/sec Loss 4.1091 LearningRate 0.0007 Epoch: 36 Global Step: 208640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:39:53,941-Speed 11244.69 samples/sec Loss 4.1347 LearningRate 0.0007 Epoch: 36 Global Step: 208650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:39:54,852-Speed 11255.39 samples/sec Loss 4.1271 LearningRate 0.0007 Epoch: 36 Global Step: 208660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 
06:39:55,764-Speed 11227.19 samples/sec Loss 4.1721 LearningRate 0.0007 Epoch: 36 Global Step: 208670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:39:56,679-Speed 11211.90 samples/sec Loss 4.1659 LearningRate 0.0007 Epoch: 36 Global Step: 208680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:39:57,571-Speed 11480.12 samples/sec Loss 4.0932 LearningRate 0.0007 Epoch: 36 Global Step: 208690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:39:58,475-Speed 11347.10 samples/sec Loss 4.1047 LearningRate 0.0007 Epoch: 36 Global Step: 208700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:39:59,407-Speed 10988.26 samples/sec Loss 4.0825 LearningRate 0.0007 Epoch: 36 Global Step: 208710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:00,368-Speed 10666.29 samples/sec Loss 4.0278 LearningRate 0.0007 Epoch: 36 Global Step: 208720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:01,298-Speed 11009.91 samples/sec Loss 4.1490 LearningRate 0.0007 Epoch: 36 Global Step: 208730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:02,253-Speed 10732.37 samples/sec Loss 4.1422 LearningRate 0.0007 Epoch: 36 Global Step: 208740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:03,234-Speed 10448.31 samples/sec Loss 4.0384 LearningRate 0.0007 Epoch: 36 Global Step: 208750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:04,186-Speed 10765.29 samples/sec Loss 3.9598 LearningRate 0.0007 Epoch: 36 Global Step: 208760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:05,130-Speed 10856.43 samples/sec Loss 4.1863 LearningRate 0.0007 Epoch: 36 Global Step: 208770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:06,119-Speed 10358.83 samples/sec Loss 4.1068 LearningRate 0.0007 Epoch: 36 Global Step: 208780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:07,048-Speed 11026.90 
samples/sec Loss 4.1610 LearningRate 0.0007 Epoch: 36 Global Step: 208790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:40:08,022-Speed 10524.95 samples/sec Loss 4.1424 LearningRate 0.0007 Epoch: 36 Global Step: 208800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:40:08,928-Speed 11307.94 samples/sec Loss 4.0513 LearningRate 0.0007 Epoch: 36 Global Step: 208810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:09,850-Speed 11107.22 samples/sec Loss 4.1305 LearningRate 0.0007 Epoch: 36 Global Step: 208820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:10,777-Speed 11052.61 samples/sec Loss 4.2126 LearningRate 0.0007 Epoch: 36 Global Step: 208830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:11,731-Speed 10743.03 samples/sec Loss 4.1869 LearningRate 0.0007 Epoch: 36 Global Step: 208840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:12,639-Speed 11285.94 samples/sec Loss 4.1113 LearningRate 0.0007 Epoch: 36 Global Step: 208850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:13,567-Speed 11051.50 samples/sec Loss 4.0611 LearningRate 0.0007 Epoch: 36 Global Step: 208860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:14,549-Speed 10432.79 samples/sec Loss 4.1513 LearningRate 0.0007 Epoch: 36 Global Step: 208870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:15,460-Speed 11242.09 samples/sec Loss 4.0419 LearningRate 0.0007 Epoch: 36 Global Step: 208880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:16,372-Speed 11238.65 samples/sec Loss 4.1612 LearningRate 0.0007 Epoch: 36 Global Step: 208890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:17,305-Speed 10977.33 samples/sec Loss 4.0991 LearningRate 0.0007 Epoch: 36 Global Step: 208900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:18,252-Speed 10827.73 samples/sec Loss 3.9441 
LearningRate 0.0007 Epoch: 36 Global Step: 208910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:19,231-Speed 10471.66 samples/sec Loss 4.0359 LearningRate 0.0007 Epoch: 36 Global Step: 208920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:20,161-Speed 11018.71 samples/sec Loss 4.0192 LearningRate 0.0007 Epoch: 36 Global Step: 208930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:21,080-Speed 11146.94 samples/sec Loss 4.0234 LearningRate 0.0007 Epoch: 36 Global Step: 208940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:21,989-Speed 11279.29 samples/sec Loss 4.1167 LearningRate 0.0007 Epoch: 36 Global Step: 208950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:22,909-Speed 11133.37 samples/sec Loss 4.0682 LearningRate 0.0007 Epoch: 36 Global Step: 208960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:23,832-Speed 11097.52 samples/sec Loss 4.1850 LearningRate 0.0007 Epoch: 36 Global Step: 208970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:24,782-Speed 10783.49 samples/sec Loss 4.0435 LearningRate 0.0007 Epoch: 36 Global Step: 208980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:25,703-Speed 11124.78 samples/sec Loss 4.2743 LearningRate 0.0007 Epoch: 36 Global Step: 208990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:26,670-Speed 10595.30 samples/sec Loss 4.0989 LearningRate 0.0007 Epoch: 36 Global Step: 209000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:27,616-Speed 10849.02 samples/sec Loss 4.1140 LearningRate 0.0007 Epoch: 36 Global Step: 209010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:40:28,554-Speed 10922.44 samples/sec Loss 4.1446 LearningRate 0.0007 Epoch: 36 Global Step: 209020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:40:29,460-Speed 11305.70 samples/sec Loss 4.0953 LearningRate 0.0007 Epoch: 36 
Global Step: 209030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:40:30,389-Speed 11035.56 samples/sec Loss 4.0881 LearningRate 0.0007 Epoch: 36 Global Step: 209040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:40:31,319-Speed 11014.78 samples/sec Loss 4.0898 LearningRate 0.0007 Epoch: 36 Global Step: 209050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:40:32,223-Speed 11344.97 samples/sec Loss 4.2307 LearningRate 0.0007 Epoch: 36 Global Step: 209060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:33,122-Speed 11397.14 samples/sec Loss 4.1833 LearningRate 0.0007 Epoch: 36 Global Step: 209070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:34,040-Speed 11160.44 samples/sec Loss 4.1239 LearningRate 0.0007 Epoch: 36 Global Step: 209080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:34,975-Speed 10950.47 samples/sec Loss 4.1103 LearningRate 0.0007 Epoch: 36 Global Step: 209090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:35,978-Speed 10216.35 samples/sec Loss 4.1781 LearningRate 0.0007 Epoch: 36 Global Step: 209100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:36,896-Speed 11173.74 samples/sec Loss 4.1130 LearningRate 0.0006 Epoch: 36 Global Step: 209110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:37,865-Speed 10571.42 samples/sec Loss 4.2350 LearningRate 0.0006 Epoch: 36 Global Step: 209120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:38,802-Speed 10940.28 samples/sec Loss 4.2930 LearningRate 0.0006 Epoch: 36 Global Step: 209130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:39,738-Speed 10943.26 samples/sec Loss 4.1898 LearningRate 0.0006 Epoch: 36 Global Step: 209140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:40,678-Speed 10909.50 samples/sec Loss 4.1269 LearningRate 0.0006 Epoch: 36 Global Step: 209150 Fp16 Grad 
Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:41,637-Speed 10682.85 samples/sec Loss 4.1503 LearningRate 0.0006 Epoch: 36 Global Step: 209160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:40:42,602-Speed 10620.14 samples/sec Loss 4.1441 LearningRate 0.0006 Epoch: 36 Global Step: 209170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:40:43,549-Speed 10817.56 samples/sec Loss 4.1029 LearningRate 0.0006 Epoch: 36 Global Step: 209180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:40:44,506-Speed 10708.44 samples/sec Loss 4.0793 LearningRate 0.0006 Epoch: 36 Global Step: 209190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:40:45,428-Speed 11119.12 samples/sec Loss 4.0872 LearningRate 0.0006 Epoch: 36 Global Step: 209200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:40:46,326-Speed 11402.63 samples/sec Loss 4.0863 LearningRate 0.0006 Epoch: 36 Global Step: 209210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:47,273-Speed 10826.73 samples/sec Loss 4.0963 LearningRate 0.0006 Epoch: 36 Global Step: 209220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:48,223-Speed 10785.40 samples/sec Loss 4.1150 LearningRate 0.0006 Epoch: 36 Global Step: 209230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:49,201-Speed 10479.75 samples/sec Loss 4.2143 LearningRate 0.0006 Epoch: 36 Global Step: 209240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:50,189-Speed 10361.98 samples/sec Loss 4.0675 LearningRate 0.0006 Epoch: 36 Global Step: 209250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:51,119-Speed 11023.07 samples/sec Loss 4.1208 LearningRate 0.0006 Epoch: 36 Global Step: 209260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:52,055-Speed 10948.37 samples/sec Loss 4.1991 LearningRate 0.0006 Epoch: 36 Global Step: 209270 Fp16 Grad Scale: 32768 Required: 1 hours 
Training: 2022-04-11 06:40:53,046-Speed 10343.32 samples/sec Loss 4.0803 LearningRate 0.0006 Epoch: 36 Global Step: 209280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:54,004-Speed 10696.65 samples/sec Loss 4.0076 LearningRate 0.0006 Epoch: 36 Global Step: 209290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:54,925-Speed 11116.52 samples/sec Loss 4.1034 LearningRate 0.0006 Epoch: 36 Global Step: 209300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:55,858-Speed 10983.61 samples/sec Loss 4.0898 LearningRate 0.0006 Epoch: 36 Global Step: 209310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:40:56,773-Speed 11201.68 samples/sec Loss 4.0973 LearningRate 0.0006 Epoch: 36 Global Step: 209320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:40:57,673-Speed 11386.94 samples/sec Loss 4.1995 LearningRate 0.0006 Epoch: 36 Global Step: 209330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:58,619-Speed 10828.99 samples/sec Loss 3.9807 LearningRate 0.0006 Epoch: 36 Global Step: 209340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:40:59,576-Speed 10703.11 samples/sec Loss 4.0689 LearningRate 0.0006 Epoch: 36 Global Step: 209350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:41:00,567-Speed 10344.01 samples/sec Loss 4.1057 LearningRate 0.0006 Epoch: 36 Global Step: 209360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:41:01,508-Speed 10884.51 samples/sec Loss 4.1681 LearningRate 0.0006 Epoch: 36 Global Step: 209370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:41:02,460-Speed 10773.14 samples/sec Loss 4.0992 LearningRate 0.0006 Epoch: 36 Global Step: 209380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:41:03,425-Speed 10622.39 samples/sec Loss 4.0490 LearningRate 0.0006 Epoch: 36 Global Step: 209390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 
06:41:04,392-Speed 10589.34 samples/sec Loss 4.1696 LearningRate 0.0006 Epoch: 36 Global Step: 209400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:41:05,300-Speed 11291.41 samples/sec Loss 4.0077 LearningRate 0.0006 Epoch: 36 Global Step: 209410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:41:06,212-Speed 11231.22 samples/sec Loss 4.0145 LearningRate 0.0006 Epoch: 36 Global Step: 209420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:41:07,132-Speed 11148.89 samples/sec Loss 4.0451 LearningRate 0.0006 Epoch: 36 Global Step: 209430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:41:08,071-Speed 10911.56 samples/sec Loss 4.0087 LearningRate 0.0006 Epoch: 36 Global Step: 209440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:41:09,007-Speed 10944.08 samples/sec Loss 4.0886 LearningRate 0.0006 Epoch: 36 Global Step: 209450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:41:09,943-Speed 10949.73 samples/sec Loss 4.0835 LearningRate 0.0006 Epoch: 36 Global Step: 209460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:41:10,909-Speed 10608.38 samples/sec Loss 4.1827 LearningRate 0.0006 Epoch: 36 Global Step: 209470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:41:11,835-Speed 11070.19 samples/sec Loss 4.1987 LearningRate 0.0006 Epoch: 36 Global Step: 209480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:41:12,790-Speed 10720.55 samples/sec Loss 4.1557 LearningRate 0.0006 Epoch: 36 Global Step: 209490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:41:13,742-Speed 10764.07 samples/sec Loss 4.1431 LearningRate 0.0006 Epoch: 36 Global Step: 209500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:41:14,697-Speed 10731.02 samples/sec Loss 4.1430 LearningRate 0.0006 Epoch: 36 Global Step: 209510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:41:15,626-Speed 11029.99 
samples/sec Loss 4.0988 LearningRate 0.0006 Epoch: 36 Global Step: 209520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:41:16,571-Speed 10844.87 samples/sec Loss 4.1328 LearningRate 0.0006 Epoch: 36 Global Step: 209530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:41:17,498-Speed 11058.82 samples/sec Loss 4.0197 LearningRate 0.0006 Epoch: 36 Global Step: 209540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:41:18,425-Speed 11049.12 samples/sec Loss 4.1363 LearningRate 0.0006 Epoch: 36 Global Step: 209550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:41:19,381-Speed 10711.33 samples/sec Loss 4.1455 LearningRate 0.0006 Epoch: 36 Global Step: 209560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:41:20,308-Speed 11055.90 samples/sec Loss 4.1861 LearningRate 0.0006 Epoch: 36 Global Step: 209570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:41:21,244-Speed 10950.63 samples/sec Loss 4.0484 LearningRate 0.0006 Epoch: 36 Global Step: 209580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:41:22,161-Speed 11175.16 samples/sec Loss 4.1785 LearningRate 0.0006 Epoch: 36 Global Step: 209590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:41:23,090-Speed 11031.16 samples/sec Loss 4.0487 LearningRate 0.0006 Epoch: 36 Global Step: 209600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:41:24,022-Speed 10996.55 samples/sec Loss 4.1317 LearningRate 0.0006 Epoch: 36 Global Step: 209610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:41:24,920-Speed 11412.82 samples/sec Loss 4.2015 LearningRate 0.0006 Epoch: 36 Global Step: 209620 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:41:25,858-Speed 10927.84 samples/sec Loss 4.1117 LearningRate 0.0006 Epoch: 36 Global Step: 209630 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:41:26,819-Speed 10663.92 samples/sec Loss 4.1963 
LearningRate 0.0006 Epoch: 36 Global Step: 209640 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:41:27,754-Speed 10958.58 samples/sec Loss 4.1471 LearningRate 0.0006 Epoch: 36 Global Step: 209650 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:41:28,708-Speed 10735.08 samples/sec Loss 4.0862 LearningRate 0.0006 Epoch: 36 Global Step: 209660 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:41:29,642-Speed 10977.25 samples/sec Loss 4.2261 LearningRate 0.0006 Epoch: 36 Global Step: 209670 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:41:30,574-Speed 10987.84 samples/sec Loss 4.0947 LearningRate 0.0006 Epoch: 36 Global Step: 209680 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:41:31,484-Speed 11258.85 samples/sec Loss 4.1699 LearningRate 0.0006 Epoch: 36 Global Step: 209690 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:41:32,462-Speed 10490.14 samples/sec Loss 3.9906 LearningRate 0.0006 Epoch: 36 Global Step: 209700 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:41:33,366-Speed 11333.30 samples/sec Loss 4.1931 LearningRate 0.0006 Epoch: 36 Global Step: 209710 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:41:34,265-Speed 11393.59 samples/sec Loss 4.1433 LearningRate 0.0006 Epoch: 36 Global Step: 209720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:41:35,195-Speed 11020.16 samples/sec Loss 4.1307 LearningRate 0.0006 Epoch: 36 Global Step: 209730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:41:36,120-Speed 11075.21 samples/sec Loss 4.1825 LearningRate 0.0006 Epoch: 36 Global Step: 209740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:41:37,061-Speed 10888.75 samples/sec Loss 4.0663 LearningRate 0.0006 Epoch: 36 Global Step: 209750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:41:38,010-Speed 10796.47 samples/sec Loss 4.0462 LearningRate 0.0006 Epoch: 36 
Global Step: 209760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:41:39,010-Speed 10250.11 samples/sec Loss 4.1104 LearningRate 0.0006 Epoch: 36 Global Step: 209770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:41:39,983-Speed 10531.63 samples/sec Loss 4.1549 LearningRate 0.0006 Epoch: 36 Global Step: 209780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:41:40,912-Speed 11023.40 samples/sec Loss 3.9774 LearningRate 0.0006 Epoch: 36 Global Step: 209790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:41:41,870-Speed 10700.37 samples/sec Loss 4.1351 LearningRate 0.0006 Epoch: 36 Global Step: 209800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:41:42,807-Speed 10941.18 samples/sec Loss 4.1448 LearningRate 0.0006 Epoch: 36 Global Step: 209810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:41:43,749-Speed 10878.15 samples/sec Loss 4.0618 LearningRate 0.0006 Epoch: 36 Global Step: 209820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:41:44,681-Speed 10994.14 samples/sec Loss 4.1743 LearningRate 0.0006 Epoch: 36 Global Step: 209830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:41:45,619-Speed 10915.71 samples/sec Loss 4.0557 LearningRate 0.0006 Epoch: 36 Global Step: 209840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:41:46,557-Speed 10928.57 samples/sec Loss 4.1873 LearningRate 0.0006 Epoch: 36 Global Step: 209850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:41:47,873-Speed 7787.18 samples/sec Loss 4.0354 LearningRate 0.0006 Epoch: 36 Global Step: 209860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:41:48,835-Speed 10656.52 samples/sec Loss 4.0939 LearningRate 0.0006 Epoch: 36 Global Step: 209870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:41:50,103-Speed 8079.21 samples/sec Loss 4.1533 LearningRate 0.0006 Epoch: 36 Global Step: 209880 Fp16 Grad 
Scale: 32768 Required: 1 hours Training: 2022-04-11 06:41:51,061-Speed 10690.36 samples/sec Loss 4.0772 LearningRate 0.0006 Epoch: 36 Global Step: 209890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:41:52,039-Speed 10477.00 samples/sec Loss 4.0819 LearningRate 0.0006 Epoch: 36 Global Step: 209900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:41:52,994-Speed 10732.67 samples/sec Loss 4.1162 LearningRate 0.0006 Epoch: 36 Global Step: 209910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:41:54,257-Speed 8112.89 samples/sec Loss 4.1823 LearningRate 0.0006 Epoch: 36 Global Step: 209920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:41:55,176-Speed 11150.04 samples/sec Loss 4.0562 LearningRate 0.0006 Epoch: 36 Global Step: 209930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:41:56,469-Speed 7919.14 samples/sec Loss 4.0108 LearningRate 0.0006 Epoch: 36 Global Step: 209940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:41:57,687-Speed 8417.22 samples/sec Loss 4.0490 LearningRate 0.0006 Epoch: 36 Global Step: 209950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:41:58,635-Speed 10802.87 samples/sec Loss 4.1968 LearningRate 0.0006 Epoch: 36 Global Step: 209960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:41:59,565-Speed 11022.43 samples/sec Loss 4.2375 LearningRate 0.0006 Epoch: 36 Global Step: 209970 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:42:00,475-Speed 11253.73 samples/sec Loss 4.0192 LearningRate 0.0006 Epoch: 36 Global Step: 209980 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:42:02,210-Speed 5904.83 samples/sec Loss 4.3099 LearningRate 0.0006 Epoch: 36 Global Step: 209990 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:42:03,121-Speed 11252.73 samples/sec Loss 4.0710 LearningRate 0.0006 Epoch: 36 Global Step: 210000 Fp16 Grad Scale: 16384 Required: 1 hours 
Training: 2022-04-11 06:42:24,372-[lfw][210000]XNorm: 7.642428 Training: 2022-04-11 06:42:24,373-[lfw][210000]Accuracy-Flip: 0.99700+-0.00256 Training: 2022-04-11 06:42:24,373-[lfw][210000]Accuracy-Highest: 0.99733 Training: 2022-04-11 06:42:49,015-[cfp_fp][210000]XNorm: 6.545778 Training: 2022-04-11 06:42:49,016-[cfp_fp][210000]Accuracy-Flip: 0.95743+-0.01038 Training: 2022-04-11 06:42:49,017-[cfp_fp][210000]Accuracy-Highest: 0.95943 Training: 2022-04-11 06:43:10,296-[agedb_30][210000]XNorm: 7.495818 Training: 2022-04-11 06:43:10,297-[agedb_30][210000]Accuracy-Flip: 0.97000+-0.00734 Training: 2022-04-11 06:43:10,297-[agedb_30][210000]Accuracy-Highest: 0.97167 Training: 2022-04-11 06:43:11,247-Speed 150.31 samples/sec Loss 4.1409 LearningRate 0.0006 Epoch: 36 Global Step: 210010 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:43:12,220-Speed 10526.31 samples/sec Loss 4.1577 LearningRate 0.0006 Epoch: 36 Global Step: 210020 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:43:13,189-Speed 10578.30 samples/sec Loss 4.1939 LearningRate 0.0006 Epoch: 36 Global Step: 210030 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:43:14,155-Speed 10605.68 samples/sec Loss 4.0471 LearningRate 0.0006 Epoch: 36 Global Step: 210040 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:43:15,077-Speed 11117.28 samples/sec Loss 4.0691 LearningRate 0.0006 Epoch: 36 Global Step: 210050 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:43:16,054-Speed 10485.50 samples/sec Loss 4.1426 LearningRate 0.0006 Epoch: 36 Global Step: 210060 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:43:17,043-Speed 10361.03 samples/sec Loss 4.1477 LearningRate 0.0006 Epoch: 36 Global Step: 210070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:43:17,960-Speed 11174.99 samples/sec Loss 4.0504 LearningRate 0.0006 Epoch: 36 Global Step: 210080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 
06:43:18,925-Speed 10619.44 samples/sec Loss 4.1908 LearningRate 0.0006 Epoch: 36 Global Step: 210090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:43:19,896-Speed 10557.10 samples/sec Loss 4.2151 LearningRate 0.0006 Epoch: 36 Global Step: 210100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:43:20,819-Speed 11096.35 samples/sec Loss 3.9178 LearningRate 0.0006 Epoch: 36 Global Step: 210110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:43:21,719-Speed 11384.52 samples/sec Loss 4.0833 LearningRate 0.0006 Epoch: 36 Global Step: 210120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:43:22,623-Speed 11347.23 samples/sec Loss 4.0787 LearningRate 0.0006 Epoch: 36 Global Step: 210130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:43:23,554-Speed 10999.64 samples/sec Loss 4.0414 LearningRate 0.0006 Epoch: 36 Global Step: 210140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:43:24,510-Speed 10718.95 samples/sec Loss 4.1773 LearningRate 0.0006 Epoch: 36 Global Step: 210150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:43:25,464-Speed 10745.56 samples/sec Loss 4.1064 LearningRate 0.0006 Epoch: 36 Global Step: 210160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:43:26,399-Speed 10955.61 samples/sec Loss 4.0896 LearningRate 0.0006 Epoch: 36 Global Step: 210170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:43:27,338-Speed 10909.87 samples/sec Loss 4.3382 LearningRate 0.0006 Epoch: 36 Global Step: 210180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:43:28,298-Speed 10674.79 samples/sec Loss 4.1330 LearningRate 0.0006 Epoch: 36 Global Step: 210190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:43:29,298-Speed 10245.99 samples/sec Loss 4.1577 LearningRate 0.0006 Epoch: 36 Global Step: 210200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:43:30,255-Speed 10711.45 
samples/sec Loss 4.0995 LearningRate 0.0006 Epoch: 36 Global Step: 210210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:43:31,238-Speed 10419.87 samples/sec Loss 4.1378 LearningRate 0.0006 Epoch: 36 Global Step: 210220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:43:32,231-Speed 10318.02 samples/sec Loss 4.2779 LearningRate 0.0006 Epoch: 36 Global Step: 210230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:43:33,141-Speed 11268.27 samples/sec Loss 4.1210 LearningRate 0.0006 Epoch: 36 Global Step: 210240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:43:34,053-Speed 11236.28 samples/sec Loss 4.2070 LearningRate 0.0006 Epoch: 36 Global Step: 210250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:43:34,971-Speed 11159.85 samples/sec Loss 4.2312 LearningRate 0.0006 Epoch: 36 Global Step: 210260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:43:35,914-Speed 10867.72 samples/sec Loss 4.1682 LearningRate 0.0006 Epoch: 36 Global Step: 210270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:43:36,860-Speed 10830.24 samples/sec Loss 4.1037 LearningRate 0.0006 Epoch: 36 Global Step: 210280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:43:37,792-Speed 10998.10 samples/sec Loss 4.1780 LearningRate 0.0006 Epoch: 36 Global Step: 210290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:43:38,717-Speed 11069.68 samples/sec Loss 4.1460 LearningRate 0.0006 Epoch: 36 Global Step: 210300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:43:39,650-Speed 10984.11 samples/sec Loss 4.0864 LearningRate 0.0006 Epoch: 36 Global Step: 210310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:43:40,608-Speed 10693.93 samples/sec Loss 4.1634 LearningRate 0.0006 Epoch: 36 Global Step: 210320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:43:41,548-Speed 10901.59 samples/sec Loss 4.1127 
LearningRate 0.0006 Epoch: 36 Global Step: 210330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:43:42,523-Speed 10508.51 samples/sec Loss 4.1892 LearningRate 0.0006 Epoch: 36 Global Step: 210340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:43:43,524-Speed 10240.54 samples/sec Loss 4.1182 LearningRate 0.0006 Epoch: 36 Global Step: 210350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:43:44,442-Speed 11164.35 samples/sec Loss 4.1803 LearningRate 0.0006 Epoch: 36 Global Step: 210360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:43:45,355-Speed 11220.10 samples/sec Loss 4.1823 LearningRate 0.0006 Epoch: 36 Global Step: 210370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:43:46,320-Speed 10625.89 samples/sec Loss 4.1337 LearningRate 0.0006 Epoch: 36 Global Step: 210380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:43:57,410-Speed 923.49 samples/sec Loss 4.0870 LearningRate 0.0006 Epoch: 37 Global Step: 210390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:43:58,394-Speed 10416.35 samples/sec Loss 3.9952 LearningRate 0.0006 Epoch: 37 Global Step: 210400 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:43:59,387-Speed 10318.22 samples/sec Loss 4.0354 LearningRate 0.0006 Epoch: 37 Global Step: 210410 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:44:00,371-Speed 10415.98 samples/sec Loss 3.9923 LearningRate 0.0006 Epoch: 37 Global Step: 210420 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:44:01,308-Speed 10936.04 samples/sec Loss 4.0511 LearningRate 0.0006 Epoch: 37 Global Step: 210430 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:44:02,349-Speed 9837.11 samples/sec Loss 4.0738 LearningRate 0.0006 Epoch: 37 Global Step: 210440 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:44:03,314-Speed 10621.04 samples/sec Loss 3.9886 LearningRate 0.0006 Epoch: 37 
Global Step: 210450 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:44:04,293-Speed 10466.37 samples/sec Loss 4.0305 LearningRate 0.0006 Epoch: 37 Global Step: 210460 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:44:05,236-Speed 10866.23 samples/sec Loss 4.0516 LearningRate 0.0006 Epoch: 37 Global Step: 210470 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:44:06,194-Speed 10697.01 samples/sec Loss 3.9867 LearningRate 0.0006 Epoch: 37 Global Step: 210480 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:44:07,164-Speed 10564.73 samples/sec Loss 3.9801 LearningRate 0.0006 Epoch: 37 Global Step: 210490 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:44:08,114-Speed 10794.00 samples/sec Loss 3.9592 LearningRate 0.0006 Epoch: 37 Global Step: 210500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:44:09,034-Speed 11138.84 samples/sec Loss 3.9441 LearningRate 0.0006 Epoch: 37 Global Step: 210510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:44:09,963-Speed 11029.05 samples/sec Loss 3.9165 LearningRate 0.0006 Epoch: 37 Global Step: 210520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:44:10,904-Speed 10887.04 samples/sec Loss 3.9358 LearningRate 0.0006 Epoch: 37 Global Step: 210530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:44:11,860-Speed 10727.22 samples/sec Loss 3.9533 LearningRate 0.0006 Epoch: 37 Global Step: 210540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:44:12,826-Speed 10598.94 samples/sec Loss 4.0551 LearningRate 0.0006 Epoch: 37 Global Step: 210550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:44:13,842-Speed 10084.20 samples/sec Loss 3.9978 LearningRate 0.0006 Epoch: 37 Global Step: 210560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:44:14,763-Speed 11128.88 samples/sec Loss 4.0067 LearningRate 0.0006 Epoch: 37 Global Step: 210570 Fp16 Grad 
Scale: 32768 Required: 1 hours Training: 2022-04-11 06:44:15,768-Speed 10198.20 samples/sec Loss 4.0902 LearningRate 0.0005 Epoch: 37 Global Step: 210580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:44:16,757-Speed 10365.61 samples/sec Loss 4.0884 LearningRate 0.0005 Epoch: 37 Global Step: 210590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:44:17,724-Speed 10596.33 samples/sec Loss 3.9710 LearningRate 0.0005 Epoch: 37 Global Step: 210600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:44:18,686-Speed 10653.69 samples/sec Loss 3.9710 LearningRate 0.0005 Epoch: 37 Global Step: 210610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:44:19,652-Speed 10605.00 samples/sec Loss 3.9250 LearningRate 0.0005 Epoch: 37 Global Step: 210620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:44:20,599-Speed 10819.70 samples/sec Loss 4.0621 LearningRate 0.0005 Epoch: 37 Global Step: 210630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:44:21,550-Speed 10777.96 samples/sec Loss 4.1065 LearningRate 0.0005 Epoch: 37 Global Step: 210640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:44:22,461-Speed 11249.53 samples/sec Loss 4.1298 LearningRate 0.0005 Epoch: 37 Global Step: 210650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:44:23,415-Speed 10738.39 samples/sec Loss 4.0124 LearningRate 0.0005 Epoch: 37 Global Step: 210660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:44:24,367-Speed 10761.72 samples/sec Loss 3.9864 LearningRate 0.0005 Epoch: 37 Global Step: 210670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:44:25,339-Speed 10543.40 samples/sec Loss 4.0128 LearningRate 0.0005 Epoch: 37 Global Step: 210680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:44:26,297-Speed 10697.02 samples/sec Loss 3.8439 LearningRate 0.0005 Epoch: 37 Global Step: 210690 Fp16 Grad Scale: 32768 Required: 1 hours 
Training: 2022-04-11 06:44:27,293-Speed 10287.63 samples/sec Loss 4.1225 LearningRate 0.0005 Epoch: 37 Global Step: 210700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:44:28,292-Speed 10257.94 samples/sec Loss 4.0806 LearningRate 0.0005 Epoch: 37 Global Step: 210710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:44:29,239-Speed 10830.90 samples/sec Loss 4.1240 LearningRate 0.0005 Epoch: 37 Global Step: 210720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:44:30,194-Speed 10724.22 samples/sec Loss 3.9230 LearningRate 0.0005 Epoch: 37 Global Step: 210730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:44:31,152-Speed 10699.30 samples/sec Loss 4.0059 LearningRate 0.0005 Epoch: 37 Global Step: 210740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:44:32,099-Speed 10818.68 samples/sec Loss 4.0423 LearningRate 0.0005 Epoch: 37 Global Step: 210750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:44:33,019-Speed 11139.88 samples/sec Loss 4.0359 LearningRate 0.0005 Epoch: 37 Global Step: 210760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:44:33,937-Speed 11165.19 samples/sec Loss 3.8926 LearningRate 0.0005 Epoch: 37 Global Step: 210770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:44:34,868-Speed 11006.58 samples/sec Loss 4.0492 LearningRate 0.0005 Epoch: 37 Global Step: 210780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:44:35,820-Speed 10763.40 samples/sec Loss 3.9380 LearningRate 0.0005 Epoch: 37 Global Step: 210790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:44:36,753-Speed 10975.84 samples/sec Loss 4.0044 LearningRate 0.0005 Epoch: 37 Global Step: 210800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:44:37,695-Speed 10881.24 samples/sec Loss 4.1379 LearningRate 0.0005 Epoch: 37 Global Step: 210810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 
06:44:38,631-Speed 10953.16 samples/sec Loss 3.8966 LearningRate 0.0005 Epoch: 37 Global Step: 210820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:44:39,549-Speed 11162.71 samples/sec Loss 3.9612 LearningRate 0.0005 Epoch: 37 Global Step: 210830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:44:40,556-Speed 10172.88 samples/sec Loss 3.9223 LearningRate 0.0005 Epoch: 37 Global Step: 210840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:44:41,484-Speed 11036.57 samples/sec Loss 3.9811 LearningRate 0.0005 Epoch: 37 Global Step: 210850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:44:42,438-Speed 10749.70 samples/sec Loss 4.0805 LearningRate 0.0005 Epoch: 37 Global Step: 210860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:44:43,406-Speed 10577.24 samples/sec Loss 4.0590 LearningRate 0.0005 Epoch: 37 Global Step: 210870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:44:44,319-Speed 11228.07 samples/sec Loss 3.9308 LearningRate 0.0005 Epoch: 37 Global Step: 210880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:44:45,250-Speed 11010.48 samples/sec Loss 4.0130 LearningRate 0.0005 Epoch: 37 Global Step: 210890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:44:46,161-Speed 11248.81 samples/sec Loss 4.0086 LearningRate 0.0005 Epoch: 37 Global Step: 210900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:44:47,065-Speed 11336.93 samples/sec Loss 4.0062 LearningRate 0.0005 Epoch: 37 Global Step: 210910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:44:47,997-Speed 10998.37 samples/sec Loss 3.9710 LearningRate 0.0005 Epoch: 37 Global Step: 210920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:44:48,928-Speed 10997.07 samples/sec Loss 4.0661 LearningRate 0.0005 Epoch: 37 Global Step: 210930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:44:49,845-Speed 11175.91 
samples/sec Loss 3.8963 LearningRate 0.0005 Epoch: 37 Global Step: 210940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:44:50,805-Speed 10672.68 samples/sec Loss 4.1757 LearningRate 0.0005 Epoch: 37 Global Step: 210950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:44:51,768-Speed 10641.43 samples/sec Loss 4.0159 LearningRate 0.0005 Epoch: 37 Global Step: 210960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:44:52,741-Speed 10537.55 samples/sec Loss 3.9741 LearningRate 0.0005 Epoch: 37 Global Step: 210970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:44:53,665-Speed 11085.15 samples/sec Loss 3.8948 LearningRate 0.0005 Epoch: 37 Global Step: 210980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:44:54,630-Speed 10623.32 samples/sec Loss 4.0420 LearningRate 0.0005 Epoch: 37 Global Step: 210990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:44:55,568-Speed 10917.40 samples/sec Loss 4.0850 LearningRate 0.0005 Epoch: 37 Global Step: 211000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:44:56,494-Speed 11064.24 samples/sec Loss 4.0008 LearningRate 0.0005 Epoch: 37 Global Step: 211010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:44:57,404-Speed 11260.77 samples/sec Loss 3.9238 LearningRate 0.0005 Epoch: 37 Global Step: 211020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:44:58,320-Speed 11195.14 samples/sec Loss 4.0932 LearningRate 0.0005 Epoch: 37 Global Step: 211030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:44:59,266-Speed 10828.39 samples/sec Loss 4.0937 LearningRate 0.0005 Epoch: 37 Global Step: 211040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:45:00,185-Speed 11156.33 samples/sec Loss 3.9007 LearningRate 0.0005 Epoch: 37 Global Step: 211050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:45:01,112-Speed 11050.11 samples/sec Loss 3.9883 
LearningRate 0.0005 Epoch: 37 Global Step: 211060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:45:02,051-Speed 10914.47 samples/sec Loss 4.0537 LearningRate 0.0005 Epoch: 37 Global Step: 211070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:45:03,021-Speed 10562.24 samples/sec Loss 3.9442 LearningRate 0.0005 Epoch: 37 Global Step: 211080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:45:03,997-Speed 10498.62 samples/sec Loss 4.0323 LearningRate 0.0005 Epoch: 37 Global Step: 211090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:45:04,913-Speed 11193.61 samples/sec Loss 4.1183 LearningRate 0.0005 Epoch: 37 Global Step: 211100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:45:05,818-Speed 11311.41 samples/sec Loss 4.0414 LearningRate 0.0005 Epoch: 37 Global Step: 211110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:45:06,735-Speed 11180.82 samples/sec Loss 4.0425 LearningRate 0.0005 Epoch: 37 Global Step: 211120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:45:07,658-Speed 11099.78 samples/sec Loss 3.9342 LearningRate 0.0005 Epoch: 37 Global Step: 211130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:45:08,577-Speed 11155.66 samples/sec Loss 4.0874 LearningRate 0.0005 Epoch: 37 Global Step: 211140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:45:09,488-Speed 11245.53 samples/sec Loss 4.0140 LearningRate 0.0005 Epoch: 37 Global Step: 211150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:45:10,419-Speed 11014.86 samples/sec Loss 3.9797 LearningRate 0.0005 Epoch: 37 Global Step: 211160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:45:11,337-Speed 11163.46 samples/sec Loss 4.1401 LearningRate 0.0005 Epoch: 37 Global Step: 211170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:45:12,289-Speed 10758.03 samples/sec Loss 3.9798 LearningRate 0.0005 Epoch: 37 
Global Step: 211180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:45:13,188-Speed 11400.40 samples/sec Loss 3.9071 LearningRate 0.0005 Epoch: 37 Global Step: 211190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:45:14,113-Speed 11079.15 samples/sec Loss 4.1331 LearningRate 0.0005 Epoch: 37 Global Step: 211200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:45:15,025-Speed 11231.31 samples/sec Loss 4.1038 LearningRate 0.0005 Epoch: 37 Global Step: 211210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:45:15,958-Speed 10981.16 samples/sec Loss 4.0841 LearningRate 0.0005 Epoch: 37 Global Step: 211220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:45:16,885-Speed 11059.23 samples/sec Loss 3.9177 LearningRate 0.0005 Epoch: 37 Global Step: 211230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:45:17,806-Speed 11126.37 samples/sec Loss 3.9877 LearningRate 0.0005 Epoch: 37 Global Step: 211240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:45:18,740-Speed 10965.67 samples/sec Loss 4.0458 LearningRate 0.0005 Epoch: 37 Global Step: 211250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:45:19,692-Speed 10762.10 samples/sec Loss 4.0243 LearningRate 0.0005 Epoch: 37 Global Step: 211260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:45:20,596-Speed 11337.02 samples/sec Loss 3.9874 LearningRate 0.0005 Epoch: 37 Global Step: 211270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:45:21,507-Speed 11253.88 samples/sec Loss 4.1095 LearningRate 0.0005 Epoch: 37 Global Step: 211280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:45:22,411-Speed 11338.26 samples/sec Loss 4.0717 LearningRate 0.0005 Epoch: 37 Global Step: 211290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:45:23,323-Speed 11224.29 samples/sec Loss 3.9465 LearningRate 0.0005 Epoch: 37 Global Step: 211300 Fp16 Grad 
Scale: 32768 Required: 1 hours Training: 2022-04-11 06:45:24,258-Speed 10969.09 samples/sec Loss 4.0903 LearningRate 0.0005 Epoch: 37 Global Step: 211310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:45:25,227-Speed 10567.92 samples/sec Loss 3.9595 LearningRate 0.0005 Epoch: 37 Global Step: 211320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:45:26,150-Speed 11106.89 samples/sec Loss 4.0384 LearningRate 0.0005 Epoch: 37 Global Step: 211330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:45:27,035-Speed 11574.94 samples/sec Loss 4.1890 LearningRate 0.0005 Epoch: 37 Global Step: 211340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:45:27,972-Speed 10942.80 samples/sec Loss 3.9066 LearningRate 0.0005 Epoch: 37 Global Step: 211350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:45:28,892-Speed 11134.73 samples/sec Loss 4.1086 LearningRate 0.0005 Epoch: 37 Global Step: 211360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:45:29,840-Speed 10820.30 samples/sec Loss 4.1117 LearningRate 0.0005 Epoch: 37 Global Step: 211370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:45:30,797-Speed 10712.59 samples/sec Loss 3.9168 LearningRate 0.0005 Epoch: 37 Global Step: 211380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:45:31,698-Speed 11367.12 samples/sec Loss 4.1473 LearningRate 0.0005 Epoch: 37 Global Step: 211390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:45:32,669-Speed 10557.74 samples/sec Loss 4.0720 LearningRate 0.0005 Epoch: 37 Global Step: 211400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:45:33,578-Speed 11269.88 samples/sec Loss 4.0193 LearningRate 0.0005 Epoch: 37 Global Step: 211410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:45:34,504-Speed 11062.64 samples/sec Loss 3.9566 LearningRate 0.0005 Epoch: 37 Global Step: 211420 Fp16 Grad Scale: 32768 Required: 1 hours 
Training: 2022-04-11 06:45:35,454-Speed 10787.78 samples/sec Loss 3.9206 LearningRate 0.0005 Epoch: 37 Global Step: 211430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:45:36,371-Speed 11171.70 samples/sec Loss 4.1125 LearningRate 0.0005 Epoch: 37 Global Step: 211440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:45:37,290-Speed 11158.42 samples/sec Loss 4.0284 LearningRate 0.0005 Epoch: 37 Global Step: 211450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:45:38,255-Speed 10614.84 samples/sec Loss 4.1027 LearningRate 0.0005 Epoch: 37 Global Step: 211460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:45:39,195-Speed 10898.06 samples/sec Loss 3.9937 LearningRate 0.0005 Epoch: 37 Global Step: 211470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:45:40,113-Speed 11170.32 samples/sec Loss 3.9825 LearningRate 0.0005 Epoch: 37 Global Step: 211480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:45:41,021-Speed 11282.70 samples/sec Loss 4.0506 LearningRate 0.0005 Epoch: 37 Global Step: 211490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:45:41,937-Speed 11183.40 samples/sec Loss 3.9154 LearningRate 0.0005 Epoch: 37 Global Step: 211500 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:45:42,872-Speed 10965.99 samples/sec Loss 3.9781 LearningRate 0.0005 Epoch: 37 Global Step: 211510 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:45:43,812-Speed 10897.60 samples/sec Loss 4.1417 LearningRate 0.0005 Epoch: 37 Global Step: 211520 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:45:44,742-Speed 11022.30 samples/sec Loss 4.0589 LearningRate 0.0005 Epoch: 37 Global Step: 211530 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:45:45,650-Speed 11288.70 samples/sec Loss 4.0703 LearningRate 0.0005 Epoch: 37 Global Step: 211540 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 
06:45:46,552-Speed 11354.74 samples/sec Loss 4.0047 LearningRate 0.0005 Epoch: 37 Global Step: 211550 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:45:47,468-Speed 11182.45 samples/sec Loss 3.9854 LearningRate 0.0005 Epoch: 37 Global Step: 211560 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:45:48,392-Speed 11089.33 samples/sec Loss 4.0456 LearningRate 0.0005 Epoch: 37 Global Step: 211570 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:45:49,322-Speed 11022.12 samples/sec Loss 4.0094 LearningRate 0.0005 Epoch: 37 Global Step: 211580 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:45:50,252-Speed 11023.23 samples/sec Loss 3.8995 LearningRate 0.0005 Epoch: 37 Global Step: 211590 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:45:51,203-Speed 10772.10 samples/sec Loss 4.0962 LearningRate 0.0005 Epoch: 37 Global Step: 211600 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:45:52,143-Speed 10904.80 samples/sec Loss 3.9801 LearningRate 0.0005 Epoch: 37 Global Step: 211610 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:45:53,111-Speed 10586.77 samples/sec Loss 4.1147 LearningRate 0.0005 Epoch: 37 Global Step: 211620 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:45:54,065-Speed 10745.59 samples/sec Loss 3.9775 LearningRate 0.0005 Epoch: 37 Global Step: 211630 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:45:55,035-Speed 10559.86 samples/sec Loss 4.0595 LearningRate 0.0005 Epoch: 37 Global Step: 211640 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:45:55,980-Speed 10843.45 samples/sec Loss 4.0271 LearningRate 0.0005 Epoch: 37 Global Step: 211650 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:45:56,885-Speed 11328.08 samples/sec Loss 3.9865 LearningRate 0.0005 Epoch: 37 Global Step: 211660 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:45:57,783-Speed 11405.96 
samples/sec Loss 3.8955 LearningRate 0.0005 Epoch: 37 Global Step: 211670 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:45:58,699-Speed 11192.46 samples/sec Loss 4.0574 LearningRate 0.0005 Epoch: 37 Global Step: 211680 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:45:59,596-Speed 11421.91 samples/sec Loss 3.9716 LearningRate 0.0005 Epoch: 37 Global Step: 211690 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:46:00,499-Speed 11346.84 samples/sec Loss 4.0701 LearningRate 0.0005 Epoch: 37 Global Step: 211700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:46:01,433-Speed 10975.41 samples/sec Loss 4.0994 LearningRate 0.0005 Epoch: 37 Global Step: 211710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:46:02,404-Speed 10545.17 samples/sec Loss 4.0995 LearningRate 0.0005 Epoch: 37 Global Step: 211720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:46:03,315-Speed 11249.03 samples/sec Loss 4.0831 LearningRate 0.0005 Epoch: 37 Global Step: 211730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:46:04,252-Speed 10940.90 samples/sec Loss 3.9841 LearningRate 0.0005 Epoch: 37 Global Step: 211740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:46:05,177-Speed 11077.00 samples/sec Loss 4.0373 LearningRate 0.0005 Epoch: 37 Global Step: 211750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:46:06,122-Speed 10840.58 samples/sec Loss 4.0242 LearningRate 0.0005 Epoch: 37 Global Step: 211760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:46:07,050-Speed 11038.97 samples/sec Loss 4.1052 LearningRate 0.0005 Epoch: 37 Global Step: 211770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:46:07,953-Speed 11349.24 samples/sec Loss 4.1226 LearningRate 0.0005 Epoch: 37 Global Step: 211780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:46:08,882-Speed 11028.23 samples/sec Loss 4.0337 
LearningRate 0.0005 Epoch: 37 Global Step: 211790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:46:09,814-Speed 10997.13 samples/sec Loss 3.9019 LearningRate 0.0005 Epoch: 37 Global Step: 211800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:46:10,734-Speed 11142.68 samples/sec Loss 4.0446 LearningRate 0.0005 Epoch: 37 Global Step: 211810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:46:11,679-Speed 10837.71 samples/sec Loss 4.0686 LearningRate 0.0005 Epoch: 37 Global Step: 211820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:46:12,599-Speed 11137.74 samples/sec Loss 4.0865 LearningRate 0.0005 Epoch: 37 Global Step: 211830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:46:13,529-Speed 11029.92 samples/sec Loss 3.9964 LearningRate 0.0005 Epoch: 37 Global Step: 211840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:46:14,514-Speed 10405.04 samples/sec Loss 4.0910 LearningRate 0.0005 Epoch: 37 Global Step: 211850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:46:15,484-Speed 10560.34 samples/sec Loss 3.9787 LearningRate 0.0005 Epoch: 37 Global Step: 211860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:46:16,412-Speed 11040.39 samples/sec Loss 4.0946 LearningRate 0.0005 Epoch: 37 Global Step: 211870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:46:17,341-Speed 11037.45 samples/sec Loss 4.1312 LearningRate 0.0005 Epoch: 37 Global Step: 211880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:46:18,262-Speed 11122.38 samples/sec Loss 4.1089 LearningRate 0.0005 Epoch: 37 Global Step: 211890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:46:19,235-Speed 10532.83 samples/sec Loss 4.0016 LearningRate 0.0005 Epoch: 37 Global Step: 211900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:46:20,200-Speed 10618.45 samples/sec Loss 3.8949 LearningRate 0.0005 Epoch: 37 
Global Step: 211910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:46:21,104-Speed 11340.22 samples/sec Loss 4.0355 LearningRate 0.0005 Epoch: 37 Global Step: 211920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:46:22,004-Speed 11387.45 samples/sec Loss 4.1008 LearningRate 0.0005 Epoch: 37 Global Step: 211930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:46:22,942-Speed 10927.15 samples/sec Loss 3.9944 LearningRate 0.0005 Epoch: 37 Global Step: 211940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:46:23,884-Speed 10875.05 samples/sec Loss 4.0914 LearningRate 0.0005 Epoch: 37 Global Step: 211950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:46:24,822-Speed 10927.93 samples/sec Loss 4.0950 LearningRate 0.0005 Epoch: 37 Global Step: 211960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:46:25,768-Speed 10828.17 samples/sec Loss 3.9835 LearningRate 0.0005 Epoch: 37 Global Step: 211970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:46:26,713-Speed 10838.73 samples/sec Loss 3.9776 LearningRate 0.0005 Epoch: 37 Global Step: 211980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:46:27,640-Speed 11050.97 samples/sec Loss 3.9606 LearningRate 0.0005 Epoch: 37 Global Step: 211990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:46:28,564-Speed 11096.67 samples/sec Loss 3.9213 LearningRate 0.0005 Epoch: 37 Global Step: 212000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:46:49,781-[lfw][212000]XNorm: 7.616911 Training: 2022-04-11 06:46:49,782-[lfw][212000]Accuracy-Flip: 0.99700+-0.00277 Training: 2022-04-11 06:46:49,782-[lfw][212000]Accuracy-Highest: 0.99733 Training: 2022-04-11 06:47:14,362-[cfp_fp][212000]XNorm: 6.520217 Training: 2022-04-11 06:47:14,363-[cfp_fp][212000]Accuracy-Flip: 0.95686+-0.01057 Training: 2022-04-11 06:47:14,363-[cfp_fp][212000]Accuracy-Highest: 0.95943 Training: 2022-04-11 
06:47:35,476-[agedb_30][212000]XNorm: 7.474929 Training: 2022-04-11 06:47:35,476-[agedb_30][212000]Accuracy-Flip: 0.97183+-0.00717 Training: 2022-04-11 06:47:35,477-[agedb_30][212000]Accuracy-Highest: 0.97183 Training: 2022-04-11 06:47:36,458-Speed 150.82 samples/sec Loss 3.9341 LearningRate 0.0005 Epoch: 37 Global Step: 212010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:47:37,380-Speed 11121.64 samples/sec Loss 4.1128 LearningRate 0.0005 Epoch: 37 Global Step: 212020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:47:38,299-Speed 11151.08 samples/sec Loss 3.9486 LearningRate 0.0005 Epoch: 37 Global Step: 212030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:47:39,248-Speed 10788.16 samples/sec Loss 3.9985 LearningRate 0.0005 Epoch: 37 Global Step: 212040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:47:40,172-Speed 11094.17 samples/sec Loss 4.0158 LearningRate 0.0005 Epoch: 37 Global Step: 212050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:47:41,137-Speed 10621.88 samples/sec Loss 3.9981 LearningRate 0.0005 Epoch: 37 Global Step: 212060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:47:42,088-Speed 10774.41 samples/sec Loss 3.9677 LearningRate 0.0005 Epoch: 37 Global Step: 212070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:47:43,012-Speed 11080.89 samples/sec Loss 4.0150 LearningRate 0.0005 Epoch: 37 Global Step: 212080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:47:43,982-Speed 10570.92 samples/sec Loss 3.9454 LearningRate 0.0005 Epoch: 37 Global Step: 212090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:47:44,895-Speed 11229.39 samples/sec Loss 3.9839 LearningRate 0.0005 Epoch: 37 Global Step: 212100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:47:45,815-Speed 11126.32 samples/sec Loss 3.9788 LearningRate 0.0005 Epoch: 37 Global Step: 212110 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-04-11 06:47:46,725-Speed 11259.60 samples/sec Loss 3.9610 LearningRate 0.0005 Epoch: 37 Global Step: 212120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:47:47,660-Speed 10968.20 samples/sec Loss 4.0345 LearningRate 0.0005 Epoch: 37 Global Step: 212130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:47:48,607-Speed 10815.75 samples/sec Loss 4.0196 LearningRate 0.0005 Epoch: 37 Global Step: 212140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:47:49,516-Speed 11280.78 samples/sec Loss 4.0646 LearningRate 0.0005 Epoch: 37 Global Step: 212150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:47:50,430-Speed 11210.98 samples/sec Loss 4.0207 LearningRate 0.0005 Epoch: 37 Global Step: 212160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:47:51,349-Speed 11150.99 samples/sec Loss 4.0293 LearningRate 0.0005 Epoch: 37 Global Step: 212170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:47:52,274-Speed 11074.07 samples/sec Loss 4.0581 LearningRate 0.0005 Epoch: 37 Global Step: 212180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:47:53,203-Speed 11029.87 samples/sec Loss 3.9636 LearningRate 0.0004 Epoch: 37 Global Step: 212190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:47:54,133-Speed 11019.77 samples/sec Loss 4.0765 LearningRate 0.0004 Epoch: 37 Global Step: 212200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:47:55,089-Speed 10719.43 samples/sec Loss 3.9731 LearningRate 0.0004 Epoch: 37 Global Step: 212210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:47:56,003-Speed 11210.00 samples/sec Loss 4.0462 LearningRate 0.0004 Epoch: 37 Global Step: 212220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:47:56,917-Speed 11203.44 samples/sec Loss 4.0284 LearningRate 0.0004 Epoch: 37 Global Step: 212230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-04-11 06:47:57,823-Speed 11313.25 samples/sec Loss 4.0420 LearningRate 0.0004 Epoch: 37 Global Step: 212240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:47:58,759-Speed 10944.13 samples/sec Loss 4.0305 LearningRate 0.0004 Epoch: 37 Global Step: 212250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:47:59,666-Speed 11303.67 samples/sec Loss 3.9879 LearningRate 0.0004 Epoch: 37 Global Step: 212260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:48:00,608-Speed 10880.28 samples/sec Loss 4.1113 LearningRate 0.0004 Epoch: 37 Global Step: 212270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:48:01,561-Speed 10749.75 samples/sec Loss 4.1237 LearningRate 0.0004 Epoch: 37 Global Step: 212280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:48:02,524-Speed 10638.76 samples/sec Loss 4.1297 LearningRate 0.0004 Epoch: 37 Global Step: 212290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:48:03,477-Speed 10755.98 samples/sec Loss 3.9390 LearningRate 0.0004 Epoch: 37 Global Step: 212300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:48:04,406-Speed 11028.63 samples/sec Loss 4.0126 LearningRate 0.0004 Epoch: 37 Global Step: 212310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:48:05,339-Speed 10988.24 samples/sec Loss 4.0421 LearningRate 0.0004 Epoch: 37 Global Step: 212320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:48:06,278-Speed 10912.37 samples/sec Loss 4.0647 LearningRate 0.0004 Epoch: 37 Global Step: 212330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:48:07,198-Speed 11135.04 samples/sec Loss 4.1052 LearningRate 0.0004 Epoch: 37 Global Step: 212340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:48:08,127-Speed 11027.65 samples/sec Loss 4.0642 LearningRate 0.0004 Epoch: 37 Global Step: 212350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:48:09,046-Speed 
11155.81 samples/sec Loss 3.9774 LearningRate 0.0004 Epoch: 37 Global Step: 212360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:48:09,961-Speed 11199.85 samples/sec Loss 4.0379 LearningRate 0.0004 Epoch: 37 Global Step: 212370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:48:10,876-Speed 11200.19 samples/sec Loss 3.9717 LearningRate 0.0004 Epoch: 37 Global Step: 212380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:48:11,783-Speed 11295.14 samples/sec Loss 4.0542 LearningRate 0.0004 Epoch: 37 Global Step: 212390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:48:12,713-Speed 11024.88 samples/sec Loss 4.0501 LearningRate 0.0004 Epoch: 37 Global Step: 212400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:48:13,666-Speed 10745.05 samples/sec Loss 3.8841 LearningRate 0.0004 Epoch: 37 Global Step: 212410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:48:14,627-Speed 10664.81 samples/sec Loss 4.1601 LearningRate 0.0004 Epoch: 37 Global Step: 212420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:48:15,546-Speed 11157.67 samples/sec Loss 3.9069 LearningRate 0.0004 Epoch: 37 Global Step: 212430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:48:16,488-Speed 10872.01 samples/sec Loss 4.0768 LearningRate 0.0004 Epoch: 37 Global Step: 212440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:48:17,460-Speed 10538.52 samples/sec Loss 4.0425 LearningRate 0.0004 Epoch: 37 Global Step: 212450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:48:18,413-Speed 10756.16 samples/sec Loss 4.0203 LearningRate 0.0004 Epoch: 37 Global Step: 212460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:48:19,352-Speed 10911.58 samples/sec Loss 3.9917 LearningRate 0.0004 Epoch: 37 Global Step: 212470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:48:20,329-Speed 10492.49 samples/sec Loss 
4.1079 LearningRate 0.0004 Epoch: 37 Global Step: 212480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:48:21,229-Speed 11383.23 samples/sec Loss 4.0090 LearningRate 0.0004 Epoch: 37 Global Step: 212490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:48:22,128-Speed 11393.01 samples/sec Loss 3.9768 LearningRate 0.0004 Epoch: 37 Global Step: 212500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:48:23,028-Speed 11386.38 samples/sec Loss 3.9931 LearningRate 0.0004 Epoch: 37 Global Step: 212510 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:48:23,986-Speed 10700.08 samples/sec Loss 4.0636 LearningRate 0.0004 Epoch: 37 Global Step: 212520 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:48:24,939-Speed 10755.54 samples/sec Loss 3.9364 LearningRate 0.0004 Epoch: 37 Global Step: 212530 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:48:25,873-Speed 10971.72 samples/sec Loss 4.1146 LearningRate 0.0004 Epoch: 37 Global Step: 212540 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:48:26,779-Speed 11304.70 samples/sec Loss 4.0046 LearningRate 0.0004 Epoch: 37 Global Step: 212550 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:48:27,684-Speed 11325.25 samples/sec Loss 3.8835 LearningRate 0.0004 Epoch: 37 Global Step: 212560 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:48:28,623-Speed 10915.87 samples/sec Loss 4.0648 LearningRate 0.0004 Epoch: 37 Global Step: 212570 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:48:29,567-Speed 10851.02 samples/sec Loss 4.0343 LearningRate 0.0004 Epoch: 37 Global Step: 212580 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:48:30,546-Speed 10471.18 samples/sec Loss 4.0923 LearningRate 0.0004 Epoch: 37 Global Step: 212590 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:48:31,454-Speed 11280.45 samples/sec Loss 4.0687 LearningRate 0.0004 
Epoch: 37 Global Step: 212600 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:48:32,361-Speed 11303.60 samples/sec Loss 4.0750 LearningRate 0.0004 Epoch: 37 Global Step: 212610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:48:33,280-Speed 11145.20 samples/sec Loss 4.0086 LearningRate 0.0004 Epoch: 37 Global Step: 212620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:48:34,201-Speed 11124.29 samples/sec Loss 3.9302 LearningRate 0.0004 Epoch: 37 Global Step: 212630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:48:35,126-Speed 11083.92 samples/sec Loss 4.1485 LearningRate 0.0004 Epoch: 37 Global Step: 212640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:48:36,078-Speed 10762.99 samples/sec Loss 4.0801 LearningRate 0.0004 Epoch: 37 Global Step: 212650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:48:37,057-Speed 10466.20 samples/sec Loss 4.1082 LearningRate 0.0004 Epoch: 37 Global Step: 212660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:48:38,050-Speed 10317.42 samples/sec Loss 3.9631 LearningRate 0.0004 Epoch: 37 Global Step: 212670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:48:39,033-Speed 10426.33 samples/sec Loss 4.0697 LearningRate 0.0004 Epoch: 37 Global Step: 212680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:48:39,998-Speed 10619.72 samples/sec Loss 4.0575 LearningRate 0.0004 Epoch: 37 Global Step: 212690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:48:40,919-Speed 11119.21 samples/sec Loss 3.9283 LearningRate 0.0004 Epoch: 37 Global Step: 212700 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:48:41,869-Speed 10793.89 samples/sec Loss 3.9743 LearningRate 0.0004 Epoch: 37 Global Step: 212710 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:48:42,841-Speed 10542.66 samples/sec Loss 4.0603 LearningRate 0.0004 Epoch: 37 Global Step: 212720 
Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:48:43,778-Speed 10930.52 samples/sec Loss 3.9700 LearningRate 0.0004 Epoch: 37 Global Step: 212730 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:48:44,693-Speed 11200.38 samples/sec Loss 4.0467 LearningRate 0.0004 Epoch: 37 Global Step: 212740 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:48:45,630-Speed 10941.68 samples/sec Loss 4.0220 LearningRate 0.0004 Epoch: 37 Global Step: 212750 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:48:46,548-Speed 11162.41 samples/sec Loss 4.0991 LearningRate 0.0004 Epoch: 37 Global Step: 212760 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:48:47,455-Speed 11291.93 samples/sec Loss 4.0151 LearningRate 0.0004 Epoch: 37 Global Step: 212770 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:48:48,425-Speed 10568.54 samples/sec Loss 4.0590 LearningRate 0.0004 Epoch: 37 Global Step: 212780 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:48:49,361-Speed 10945.65 samples/sec Loss 3.9737 LearningRate 0.0004 Epoch: 37 Global Step: 212790 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-11 06:48:50,336-Speed 10515.07 samples/sec Loss 4.0547 LearningRate 0.0004 Epoch: 37 Global Step: 212800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:48:51,252-Speed 11178.91 samples/sec Loss 4.0243 LearningRate 0.0004 Epoch: 37 Global Step: 212810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:48:52,216-Speed 10634.66 samples/sec Loss 4.1655 LearningRate 0.0004 Epoch: 37 Global Step: 212820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:48:53,194-Speed 10481.14 samples/sec Loss 4.0006 LearningRate 0.0004 Epoch: 37 Global Step: 212830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:48:54,124-Speed 11013.73 samples/sec Loss 4.1325 LearningRate 0.0004 Epoch: 37 Global Step: 212840 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-04-11 06:48:55,102-Speed 10479.25 samples/sec Loss 4.1086 LearningRate 0.0004 Epoch: 37 Global Step: 212850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:48:56,053-Speed 10771.47 samples/sec Loss 4.0089 LearningRate 0.0004 Epoch: 37 Global Step: 212860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:48:57,033-Speed 10459.44 samples/sec Loss 3.9168 LearningRate 0.0004 Epoch: 37 Global Step: 212870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:48:57,949-Speed 11184.20 samples/sec Loss 4.0470 LearningRate 0.0004 Epoch: 37 Global Step: 212880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:48:58,858-Speed 11271.07 samples/sec Loss 3.9318 LearningRate 0.0004 Epoch: 37 Global Step: 212890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:48:59,824-Speed 10609.00 samples/sec Loss 4.0286 LearningRate 0.0004 Epoch: 37 Global Step: 212900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:49:00,752-Speed 11038.00 samples/sec Loss 4.0835 LearningRate 0.0004 Epoch: 37 Global Step: 212910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:49:01,682-Speed 11020.13 samples/sec Loss 4.0944 LearningRate 0.0004 Epoch: 37 Global Step: 212920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:49:02,578-Speed 11436.19 samples/sec Loss 3.9944 LearningRate 0.0004 Epoch: 37 Global Step: 212930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:49:03,596-Speed 10062.77 samples/sec Loss 4.0852 LearningRate 0.0004 Epoch: 37 Global Step: 212940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:49:04,576-Speed 10454.99 samples/sec Loss 4.0444 LearningRate 0.0004 Epoch: 37 Global Step: 212950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:49:05,499-Speed 11114.51 samples/sec Loss 4.0495 LearningRate 0.0004 Epoch: 37 Global Step: 212960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 
2022-04-11 06:49:06,487-Speed 10368.73 samples/sec Loss 3.9913 LearningRate 0.0004 Epoch: 37 Global Step: 212970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:49:07,456-Speed 10571.85 samples/sec Loss 4.0464 LearningRate 0.0004 Epoch: 37 Global Step: 212980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:49:08,400-Speed 10858.45 samples/sec Loss 4.0227 LearningRate 0.0004 Epoch: 37 Global Step: 212990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:49:09,309-Speed 11274.86 samples/sec Loss 4.0846 LearningRate 0.0004 Epoch: 37 Global Step: 213000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:49:10,237-Speed 11038.42 samples/sec Loss 4.1397 LearningRate 0.0004 Epoch: 37 Global Step: 213010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:49:11,176-Speed 10916.17 samples/sec Loss 3.9943 LearningRate 0.0004 Epoch: 37 Global Step: 213020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:49:12,084-Speed 11285.30 samples/sec Loss 3.9618 LearningRate 0.0004 Epoch: 37 Global Step: 213030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:49:13,021-Speed 10932.15 samples/sec Loss 3.9938 LearningRate 0.0004 Epoch: 37 Global Step: 213040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:49:13,935-Speed 11211.97 samples/sec Loss 3.9454 LearningRate 0.0004 Epoch: 37 Global Step: 213050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:49:14,878-Speed 10870.89 samples/sec Loss 4.1632 LearningRate 0.0004 Epoch: 37 Global Step: 213060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:49:15,831-Speed 10748.19 samples/sec Loss 4.0215 LearningRate 0.0004 Epoch: 37 Global Step: 213070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:49:16,753-Speed 11117.96 samples/sec Loss 4.1928 LearningRate 0.0004 Epoch: 37 Global Step: 213080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:49:17,697-Speed 
10858.79 samples/sec Loss 3.9740 LearningRate 0.0004 Epoch: 37 Global Step: 213090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:49:18,633-Speed 10945.37 samples/sec Loss 4.0437 LearningRate 0.0004 Epoch: 37 Global Step: 213100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:49:19,596-Speed 10644.71 samples/sec Loss 4.0465 LearningRate 0.0004 Epoch: 37 Global Step: 213110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:49:20,529-Speed 10983.26 samples/sec Loss 3.9184 LearningRate 0.0004 Epoch: 37 Global Step: 213120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:49:21,431-Speed 11361.37 samples/sec Loss 3.9871 LearningRate 0.0004 Epoch: 37 Global Step: 213130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:49:22,337-Speed 11300.81 samples/sec Loss 3.9766 LearningRate 0.0004 Epoch: 37 Global Step: 213140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:49:23,242-Speed 11327.54 samples/sec Loss 4.1392 LearningRate 0.0004 Epoch: 37 Global Step: 213150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:49:24,181-Speed 10908.82 samples/sec Loss 4.0037 LearningRate 0.0004 Epoch: 37 Global Step: 213160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:49:25,078-Speed 11434.95 samples/sec Loss 4.0110 LearningRate 0.0004 Epoch: 37 Global Step: 213170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:49:26,032-Speed 10737.16 samples/sec Loss 4.0902 LearningRate 0.0004 Epoch: 37 Global Step: 213180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:49:26,978-Speed 10828.10 samples/sec Loss 4.1043 LearningRate 0.0004 Epoch: 37 Global Step: 213190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:49:27,946-Speed 10582.26 samples/sec Loss 3.9932 LearningRate 0.0004 Epoch: 37 Global Step: 213200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:49:28,857-Speed 11253.62 samples/sec Loss 
4.1024 LearningRate 0.0004 Epoch: 37 Global Step: 213210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:49:29,772-Speed 11207.62 samples/sec Loss 3.9989 LearningRate 0.0004 Epoch: 37 Global Step: 213220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:49:30,708-Speed 10945.07 samples/sec Loss 4.0964 LearningRate 0.0004 Epoch: 37 Global Step: 213230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:49:31,667-Speed 10687.74 samples/sec Loss 4.0039 LearningRate 0.0004 Epoch: 37 Global Step: 213240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:49:32,611-Speed 10855.72 samples/sec Loss 3.9927 LearningRate 0.0004 Epoch: 37 Global Step: 213250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:49:33,512-Speed 11375.39 samples/sec Loss 4.0586 LearningRate 0.0004 Epoch: 37 Global Step: 213260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:49:34,421-Speed 11271.18 samples/sec Loss 4.0464 LearningRate 0.0004 Epoch: 37 Global Step: 213270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:49:35,334-Speed 11229.49 samples/sec Loss 4.0501 LearningRate 0.0004 Epoch: 37 Global Step: 213280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:49:36,249-Speed 11189.71 samples/sec Loss 4.0564 LearningRate 0.0004 Epoch: 37 Global Step: 213290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:49:37,161-Speed 11245.66 samples/sec Loss 4.1285 LearningRate 0.0004 Epoch: 37 Global Step: 213300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 06:49:38,067-Speed 11302.29 samples/sec Loss 3.9967 LearningRate 0.0004 Epoch: 37 Global Step: 213310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:49:38,984-Speed 11177.40 samples/sec Loss 4.0598 LearningRate 0.0004 Epoch: 37 Global Step: 213320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:49:39,943-Speed 10685.01 samples/sec Loss 4.0883 LearningRate 0.0004 
Epoch: 37 Global Step: 213330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:49:40,869-Speed 11064.46 samples/sec Loss 4.1000 LearningRate 0.0004 Epoch: 37 Global Step: 213340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:49:41,853-Speed 10407.56 samples/sec Loss 4.0926 LearningRate 0.0004 Epoch: 37 Global Step: 213350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 06:49:42,829-Speed 10501.59 samples/sec Loss 4.0921 LearningRate 0.0004 Epoch: 37 Global Step: 213360 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:49:43,770-Speed 10886.21 samples/sec Loss 4.1136 LearningRate 0.0004 Epoch: 37 Global Step: 213370 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:49:44,699-Speed 11037.97 samples/sec Loss 4.0809 LearningRate 0.0004 Epoch: 37 Global Step: 213380 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:49:45,599-Speed 11379.30 samples/sec Loss 4.0778 LearningRate 0.0004 Epoch: 37 Global Step: 213390 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:49:46,497-Speed 11421.81 samples/sec Loss 4.0867 LearningRate 0.0004 Epoch: 37 Global Step: 213400 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:49:47,437-Speed 10903.46 samples/sec Loss 4.1667 LearningRate 0.0004 Epoch: 37 Global Step: 213410 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:49:48,377-Speed 10896.10 samples/sec Loss 4.0684 LearningRate 0.0004 Epoch: 37 Global Step: 213420 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:49:49,295-Speed 11157.07 samples/sec Loss 4.1044 LearningRate 0.0004 Epoch: 37 Global Step: 213430 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:49:50,227-Speed 10999.98 samples/sec Loss 4.0956 LearningRate 0.0004 Epoch: 37 Global Step: 213440 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:49:51,178-Speed 10777.94 samples/sec Loss 4.0600 LearningRate 0.0004 Epoch: 37 Global Step: 213450 
Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:49:52,100-Speed 11108.03 samples/sec Loss 3.9008 LearningRate 0.0004 Epoch: 37 Global Step: 213460 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:49:53,054-Speed 10742.33 samples/sec Loss 4.1400 LearningRate 0.0004 Epoch: 37 Global Step: 213470 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:49:54,013-Speed 10689.41 samples/sec Loss 3.9647 LearningRate 0.0004 Epoch: 37 Global Step: 213480 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:49:54,926-Speed 11217.76 samples/sec Loss 4.0032 LearningRate 0.0004 Epoch: 37 Global Step: 213490 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:49:55,872-Speed 10832.22 samples/sec Loss 4.0534 LearningRate 0.0004 Epoch: 37 Global Step: 213500 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:49:56,781-Speed 11269.61 samples/sec Loss 4.0612 LearningRate 0.0004 Epoch: 37 Global Step: 213510 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:49:57,699-Speed 11169.19 samples/sec Loss 4.0102 LearningRate 0.0004 Epoch: 37 Global Step: 213520 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:49:58,610-Speed 11242.49 samples/sec Loss 4.0743 LearningRate 0.0004 Epoch: 37 Global Step: 213530 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:49:59,528-Speed 11170.14 samples/sec Loss 4.0538 LearningRate 0.0004 Epoch: 37 Global Step: 213540 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:50:00,454-Speed 11062.50 samples/sec Loss 4.0503 LearningRate 0.0004 Epoch: 37 Global Step: 213550 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:50:01,424-Speed 10558.95 samples/sec Loss 4.0345 LearningRate 0.0004 Epoch: 37 Global Step: 213560 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:50:02,379-Speed 10735.60 samples/sec Loss 4.0929 LearningRate 0.0004 Epoch: 37 Global Step: 213570 Fp16 Grad Scale: 16384 
Required: 0 hours Training: 2022-04-11 06:50:03,322-Speed 10865.16 samples/sec Loss 4.0533 LearningRate 0.0004 Epoch: 37 Global Step: 213580 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:50:04,228-Speed 11316.43 samples/sec Loss 4.0870 LearningRate 0.0004 Epoch: 37 Global Step: 213590 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:50:05,159-Speed 11009.81 samples/sec Loss 4.0003 LearningRate 0.0004 Epoch: 37 Global Step: 213600 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:50:06,103-Speed 10851.61 samples/sec Loss 4.0471 LearningRate 0.0004 Epoch: 37 Global Step: 213610 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:50:07,063-Speed 10678.04 samples/sec Loss 4.0249 LearningRate 0.0004 Epoch: 37 Global Step: 213620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:50:08,009-Speed 10828.88 samples/sec Loss 3.9808 LearningRate 0.0004 Epoch: 37 Global Step: 213630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:50:08,930-Speed 11123.54 samples/sec Loss 3.9261 LearningRate 0.0004 Epoch: 37 Global Step: 213640 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:50:09,830-Speed 11388.77 samples/sec Loss 4.0143 LearningRate 0.0004 Epoch: 37 Global Step: 213650 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:50:10,742-Speed 11233.27 samples/sec Loss 4.0050 LearningRate 0.0004 Epoch: 37 Global Step: 213660 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:50:11,689-Speed 10827.31 samples/sec Loss 4.0800 LearningRate 0.0004 Epoch: 37 Global Step: 213670 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:50:12,675-Speed 10392.87 samples/sec Loss 4.1399 LearningRate 0.0004 Epoch: 37 Global Step: 213680 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:50:13,647-Speed 10535.63 samples/sec Loss 4.1042 LearningRate 0.0004 Epoch: 37 Global Step: 213690 Fp16 Grad Scale: 16384 Required: 0 hours Training: 
2022-04-11 06:50:14,601-Speed 10736.42 samples/sec Loss 4.0370 LearningRate 0.0004 Epoch: 37 Global Step: 213700 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:50:15,499-Speed 11414.21 samples/sec Loss 3.9621 LearningRate 0.0004 Epoch: 37 Global Step: 213710 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:50:16,420-Speed 11125.47 samples/sec Loss 4.0654 LearningRate 0.0004 Epoch: 37 Global Step: 213720 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:50:17,335-Speed 11200.70 samples/sec Loss 3.9657 LearningRate 0.0004 Epoch: 37 Global Step: 213730 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:50:18,291-Speed 10717.40 samples/sec Loss 4.0506 LearningRate 0.0004 Epoch: 37 Global Step: 213740 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:50:19,234-Speed 10873.50 samples/sec Loss 4.1410 LearningRate 0.0004 Epoch: 37 Global Step: 213750 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:50:20,167-Speed 10976.35 samples/sec Loss 4.0826 LearningRate 0.0004 Epoch: 37 Global Step: 213760 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:50:21,077-Speed 11261.19 samples/sec Loss 4.0134 LearningRate 0.0004 Epoch: 37 Global Step: 213770 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:50:21,994-Speed 11184.61 samples/sec Loss 4.1408 LearningRate 0.0004 Epoch: 37 Global Step: 213780 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:50:22,907-Speed 11222.60 samples/sec Loss 4.0875 LearningRate 0.0004 Epoch: 37 Global Step: 213790 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:50:23,833-Speed 11060.17 samples/sec Loss 4.0230 LearningRate 0.0004 Epoch: 37 Global Step: 213800 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:50:24,771-Speed 10926.60 samples/sec Loss 3.9956 LearningRate 0.0004 Epoch: 37 Global Step: 213810 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:50:25,713-Speed 
10881.27 samples/sec Loss 3.9321 LearningRate 0.0004 Epoch: 37 Global Step: 213820 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:50:26,622-Speed 11271.55 samples/sec Loss 4.0010 LearningRate 0.0004 Epoch: 37 Global Step: 213830 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:50:27,576-Speed 10750.27 samples/sec Loss 3.9369 LearningRate 0.0004 Epoch: 37 Global Step: 213840 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:50:28,566-Speed 10347.04 samples/sec Loss 4.0815 LearningRate 0.0004 Epoch: 37 Global Step: 213850 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:50:29,508-Speed 10876.40 samples/sec Loss 4.0607 LearningRate 0.0004 Epoch: 37 Global Step: 213860 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:50:30,452-Speed 10853.19 samples/sec Loss 4.0169 LearningRate 0.0004 Epoch: 37 Global Step: 213870 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:50:31,401-Speed 10794.68 samples/sec Loss 3.9773 LearningRate 0.0004 Epoch: 37 Global Step: 213880 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:50:32,355-Speed 10740.97 samples/sec Loss 3.8567 LearningRate 0.0004 Epoch: 37 Global Step: 213890 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:50:33,261-Speed 11318.10 samples/sec Loss 4.1992 LearningRate 0.0004 Epoch: 37 Global Step: 213900 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:50:34,173-Speed 11232.61 samples/sec Loss 4.0360 LearningRate 0.0004 Epoch: 37 Global Step: 213910 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:50:35,110-Speed 10936.92 samples/sec Loss 4.0597 LearningRate 0.0004 Epoch: 37 Global Step: 213920 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:50:36,024-Speed 11216.39 samples/sec Loss 4.0104 LearningRate 0.0004 Epoch: 37 Global Step: 213930 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:50:36,986-Speed 10646.63 samples/sec Loss 
4.0347 LearningRate 0.0004 Epoch: 37 Global Step: 213940 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:50:37,966-Speed 10450.39 samples/sec Loss 4.1053 LearningRate 0.0004 Epoch: 37 Global Step: 213950 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:50:38,909-Speed 10870.09 samples/sec Loss 3.9677 LearningRate 0.0004 Epoch: 37 Global Step: 213960 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:50:39,857-Speed 10820.87 samples/sec Loss 4.1959 LearningRate 0.0004 Epoch: 37 Global Step: 213970 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:50:40,774-Speed 11166.64 samples/sec Loss 4.0924 LearningRate 0.0004 Epoch: 37 Global Step: 213980 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:50:41,764-Speed 10354.80 samples/sec Loss 4.0939 LearningRate 0.0003 Epoch: 37 Global Step: 213990 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:50:42,747-Speed 10419.49 samples/sec Loss 3.9928 LearningRate 0.0003 Epoch: 37 Global Step: 214000 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:51:04,093-[lfw][214000]XNorm: 7.606675 Training: 2022-04-11 06:51:04,093-[lfw][214000]Accuracy-Flip: 0.99750+-0.00261 Training: 2022-04-11 06:51:04,094-[lfw][214000]Accuracy-Highest: 0.99750 Training: 2022-04-11 06:51:28,736-[cfp_fp][214000]XNorm: 6.514673 Training: 2022-04-11 06:51:28,736-[cfp_fp][214000]Accuracy-Flip: 0.95471+-0.01254 Training: 2022-04-11 06:51:28,737-[cfp_fp][214000]Accuracy-Highest: 0.95943 Training: 2022-04-11 06:51:49,985-[agedb_30][214000]XNorm: 7.462592 Training: 2022-04-11 06:51:49,985-[agedb_30][214000]Accuracy-Flip: 0.97067+-0.00700 Training: 2022-04-11 06:51:49,986-[agedb_30][214000]Accuracy-Highest: 0.97183 Training: 2022-04-11 06:51:50,935-Speed 150.18 samples/sec Loss 4.0349 LearningRate 0.0003 Epoch: 37 Global Step: 214010 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:51:51,929-Speed 10309.06 samples/sec Loss 4.0533 LearningRate 
0.0003 Epoch: 37 Global Step: 214020 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:51:52,956-Speed 9975.24 samples/sec Loss 4.0528 LearningRate 0.0003 Epoch: 37 Global Step: 214030 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:51:53,913-Speed 10702.38 samples/sec Loss 4.1057 LearningRate 0.0003 Epoch: 37 Global Step: 214040 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:51:54,843-Speed 11017.45 samples/sec Loss 4.0847 LearningRate 0.0003 Epoch: 37 Global Step: 214050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:51:55,798-Speed 10734.32 samples/sec Loss 4.0870 LearningRate 0.0003 Epoch: 37 Global Step: 214060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:51:56,704-Speed 11304.18 samples/sec Loss 4.0370 LearningRate 0.0003 Epoch: 37 Global Step: 214070 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:51:57,609-Speed 11329.88 samples/sec Loss 3.9571 LearningRate 0.0003 Epoch: 37 Global Step: 214080 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:51:58,521-Speed 11231.96 samples/sec Loss 4.0019 LearningRate 0.0003 Epoch: 37 Global Step: 214090 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:51:59,440-Speed 11147.24 samples/sec Loss 4.0704 LearningRate 0.0003 Epoch: 37 Global Step: 214100 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:52:00,408-Speed 10591.18 samples/sec Loss 3.9619 LearningRate 0.0003 Epoch: 37 Global Step: 214110 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:52:01,361-Speed 10749.96 samples/sec Loss 3.9589 LearningRate 0.0003 Epoch: 37 Global Step: 214120 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:52:02,298-Speed 10934.04 samples/sec Loss 3.9890 LearningRate 0.0003 Epoch: 37 Global Step: 214130 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:52:03,209-Speed 11251.49 samples/sec Loss 4.0544 LearningRate 0.0003 Epoch: 37 Global Step: 
214140 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:52:04,143-Speed 10971.73 samples/sec Loss 4.0188 LearningRate 0.0003 Epoch: 37 Global Step: 214150 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:52:05,128-Speed 10403.99 samples/sec Loss 4.0821 LearningRate 0.0003 Epoch: 37 Global Step: 214160 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:52:06,060-Speed 10986.68 samples/sec Loss 4.1293 LearningRate 0.0003 Epoch: 37 Global Step: 214170 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:52:06,994-Speed 10973.63 samples/sec Loss 4.0782 LearningRate 0.0003 Epoch: 37 Global Step: 214180 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:52:07,993-Speed 10263.07 samples/sec Loss 3.9657 LearningRate 0.0003 Epoch: 37 Global Step: 214190 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:52:08,932-Speed 10909.90 samples/sec Loss 4.0696 LearningRate 0.0003 Epoch: 37 Global Step: 214200 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:52:09,842-Speed 11261.14 samples/sec Loss 4.0375 LearningRate 0.0003 Epoch: 37 Global Step: 214210 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:52:10,790-Speed 10811.93 samples/sec Loss 4.0864 LearningRate 0.0003 Epoch: 37 Global Step: 214220 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:52:11,722-Speed 10985.30 samples/sec Loss 4.0803 LearningRate 0.0003 Epoch: 37 Global Step: 214230 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:52:12,644-Speed 11120.24 samples/sec Loss 3.9744 LearningRate 0.0003 Epoch: 37 Global Step: 214240 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:52:13,578-Speed 10968.44 samples/sec Loss 4.0239 LearningRate 0.0003 Epoch: 37 Global Step: 214250 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:52:14,530-Speed 10760.38 samples/sec Loss 4.1166 LearningRate 0.0003 Epoch: 37 Global Step: 214260 Fp16 Grad Scale: 16384 
Required: 0 hours Training: 2022-04-11 06:52:15,470-Speed 10905.33 samples/sec Loss 4.0832 LearningRate 0.0003 Epoch: 37 Global Step: 214270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:52:16,475-Speed 10197.10 samples/sec Loss 4.0429 LearningRate 0.0003 Epoch: 37 Global Step: 214280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:52:17,481-Speed 10180.88 samples/sec Loss 4.0653 LearningRate 0.0003 Epoch: 37 Global Step: 214290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:52:18,485-Speed 10202.23 samples/sec Loss 4.0762 LearningRate 0.0003 Epoch: 37 Global Step: 214300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:52:19,468-Speed 10425.67 samples/sec Loss 4.0701 LearningRate 0.0003 Epoch: 37 Global Step: 214310 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:52:20,381-Speed 11222.65 samples/sec Loss 4.0884 LearningRate 0.0003 Epoch: 37 Global Step: 214320 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:52:21,303-Speed 11117.82 samples/sec Loss 4.0577 LearningRate 0.0003 Epoch: 37 Global Step: 214330 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:52:22,235-Speed 10993.47 samples/sec Loss 4.1951 LearningRate 0.0003 Epoch: 37 Global Step: 214340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:52:23,154-Speed 11154.44 samples/sec Loss 4.1015 LearningRate 0.0003 Epoch: 37 Global Step: 214350 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:52:24,127-Speed 10540.43 samples/sec Loss 4.1609 LearningRate 0.0003 Epoch: 37 Global Step: 214360 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:52:25,051-Speed 11085.40 samples/sec Loss 3.9400 LearningRate 0.0003 Epoch: 37 Global Step: 214370 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:52:25,954-Speed 11342.71 samples/sec Loss 4.0650 LearningRate 0.0003 Epoch: 37 Global Step: 214380 Fp16 Grad Scale: 16384 Required: 0 hours Training: 
2022-04-11 06:52:26,872-Speed 11164.54 samples/sec Loss 4.0725 LearningRate 0.0003 Epoch: 37 Global Step: 214390 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:52:27,806-Speed 10972.46 samples/sec Loss 4.1000 LearningRate 0.0003 Epoch: 37 Global Step: 214400 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:52:28,759-Speed 10749.93 samples/sec Loss 4.0787 LearningRate 0.0003 Epoch: 37 Global Step: 214410 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:52:29,750-Speed 10337.49 samples/sec Loss 4.1038 LearningRate 0.0003 Epoch: 37 Global Step: 214420 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:52:30,727-Speed 10489.09 samples/sec Loss 3.9780 LearningRate 0.0003 Epoch: 37 Global Step: 214430 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:52:31,704-Speed 10489.48 samples/sec Loss 4.1114 LearningRate 0.0003 Epoch: 37 Global Step: 214440 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:52:32,620-Speed 11181.71 samples/sec Loss 4.1248 LearningRate 0.0003 Epoch: 37 Global Step: 214450 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:52:33,520-Speed 11388.58 samples/sec Loss 4.1136 LearningRate 0.0003 Epoch: 37 Global Step: 214460 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:52:34,428-Speed 11286.12 samples/sec Loss 4.0790 LearningRate 0.0003 Epoch: 37 Global Step: 214470 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:52:35,351-Speed 11110.00 samples/sec Loss 4.0760 LearningRate 0.0003 Epoch: 37 Global Step: 214480 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:52:36,260-Speed 11271.73 samples/sec Loss 4.0509 LearningRate 0.0003 Epoch: 37 Global Step: 214490 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:52:37,182-Speed 11106.18 samples/sec Loss 4.0378 LearningRate 0.0003 Epoch: 37 Global Step: 214500 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:52:38,131-Speed 
10798.04 samples/sec Loss 4.0791 LearningRate 0.0003 Epoch: 37 Global Step: 214510 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:52:39,039-Speed 11285.83 samples/sec Loss 4.1052 LearningRate 0.0003 Epoch: 37 Global Step: 214520 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:52:39,957-Speed 11167.13 samples/sec Loss 4.0835 LearningRate 0.0003 Epoch: 37 Global Step: 214530 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:52:40,913-Speed 10721.02 samples/sec Loss 4.0498 LearningRate 0.0003 Epoch: 37 Global Step: 214540 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:52:41,844-Speed 11000.29 samples/sec Loss 4.2036 LearningRate 0.0003 Epoch: 37 Global Step: 214550 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:52:42,790-Speed 10841.78 samples/sec Loss 4.0872 LearningRate 0.0003 Epoch: 37 Global Step: 214560 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:52:43,738-Speed 10809.56 samples/sec Loss 4.0573 LearningRate 0.0003 Epoch: 37 Global Step: 214570 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:52:44,647-Speed 11271.51 samples/sec Loss 4.0304 LearningRate 0.0003 Epoch: 37 Global Step: 214580 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:52:45,547-Speed 11391.97 samples/sec Loss 3.9809 LearningRate 0.0003 Epoch: 37 Global Step: 214590 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:52:46,452-Speed 11317.62 samples/sec Loss 4.0444 LearningRate 0.0003 Epoch: 37 Global Step: 214600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:52:47,390-Speed 10922.47 samples/sec Loss 4.0614 LearningRate 0.0003 Epoch: 37 Global Step: 214610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:52:48,316-Speed 11078.36 samples/sec Loss 3.9938 LearningRate 0.0003 Epoch: 37 Global Step: 214620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:52:49,250-Speed 10972.53 samples/sec Loss 
4.0889 LearningRate 0.0003 Epoch: 37 Global Step: 214630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:52:50,151-Speed 11369.67 samples/sec Loss 4.0827 LearningRate 0.0003 Epoch: 37 Global Step: 214640 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:52:51,068-Speed 11180.89 samples/sec Loss 3.9885 LearningRate 0.0003 Epoch: 37 Global Step: 214650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:52:51,971-Speed 11336.93 samples/sec Loss 4.0612 LearningRate 0.0003 Epoch: 37 Global Step: 214660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:52:52,887-Speed 11190.12 samples/sec Loss 4.0765 LearningRate 0.0003 Epoch: 37 Global Step: 214670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:52:53,828-Speed 10895.55 samples/sec Loss 4.1568 LearningRate 0.0003 Epoch: 37 Global Step: 214680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:52:54,741-Speed 11222.65 samples/sec Loss 3.9940 LearningRate 0.0003 Epoch: 37 Global Step: 214690 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:52:55,660-Speed 11150.73 samples/sec Loss 4.0222 LearningRate 0.0003 Epoch: 37 Global Step: 214700 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:52:56,592-Speed 10988.72 samples/sec Loss 4.0687 LearningRate 0.0003 Epoch: 37 Global Step: 214710 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:52:57,544-Speed 10762.24 samples/sec Loss 4.0750 LearningRate 0.0003 Epoch: 37 Global Step: 214720 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:52:58,498-Speed 10746.61 samples/sec Loss 3.9895 LearningRate 0.0003 Epoch: 37 Global Step: 214730 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:52:59,486-Speed 10371.77 samples/sec Loss 4.1220 LearningRate 0.0003 Epoch: 37 Global Step: 214740 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:00,395-Speed 11279.62 samples/sec Loss 4.0674 LearningRate 0.0003 
Epoch: 37 Global Step: 214750 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:01,342-Speed 10818.69 samples/sec Loss 4.0167 LearningRate 0.0003 Epoch: 37 Global Step: 214760 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:02,248-Speed 11311.55 samples/sec Loss 4.1464 LearningRate 0.0003 Epoch: 37 Global Step: 214770 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:03,173-Speed 11069.39 samples/sec Loss 3.9351 LearningRate 0.0003 Epoch: 37 Global Step: 214780 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:04,114-Speed 10900.78 samples/sec Loss 4.0469 LearningRate 0.0003 Epoch: 37 Global Step: 214790 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:05,075-Speed 10653.79 samples/sec Loss 4.1141 LearningRate 0.0003 Epoch: 37 Global Step: 214800 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:06,039-Speed 10637.96 samples/sec Loss 4.0829 LearningRate 0.0003 Epoch: 37 Global Step: 214810 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:53:07,009-Speed 10563.62 samples/sec Loss 4.0871 LearningRate 0.0003 Epoch: 37 Global Step: 214820 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:53:07,910-Speed 11367.21 samples/sec Loss 4.0461 LearningRate 0.0003 Epoch: 37 Global Step: 214830 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:08,861-Speed 10776.11 samples/sec Loss 3.9474 LearningRate 0.0003 Epoch: 37 Global Step: 214840 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:09,778-Speed 11170.77 samples/sec Loss 4.1971 LearningRate 0.0003 Epoch: 37 Global Step: 214850 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:10,729-Speed 10773.63 samples/sec Loss 4.0149 LearningRate 0.0003 Epoch: 37 Global Step: 214860 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:11,700-Speed 10563.05 samples/sec Loss 4.0576 LearningRate 0.0003 Epoch: 37 Global Step: 214870 
Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:12,619-Speed 11145.02 samples/sec Loss 4.0815 LearningRate 0.0003 Epoch: 37 Global Step: 214880 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:13,550-Speed 11003.61 samples/sec Loss 4.0597 LearningRate 0.0003 Epoch: 37 Global Step: 214890 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:14,515-Speed 10625.08 samples/sec Loss 4.0526 LearningRate 0.0003 Epoch: 37 Global Step: 214900 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:15,431-Speed 11188.47 samples/sec Loss 4.0029 LearningRate 0.0003 Epoch: 37 Global Step: 214910 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:16,349-Speed 11158.55 samples/sec Loss 4.0113 LearningRate 0.0003 Epoch: 37 Global Step: 214920 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:17,290-Speed 10897.43 samples/sec Loss 4.0773 LearningRate 0.0003 Epoch: 37 Global Step: 214930 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:53:18,194-Speed 11331.23 samples/sec Loss 4.0530 LearningRate 0.0003 Epoch: 37 Global Step: 214940 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:19,157-Speed 10653.18 samples/sec Loss 3.9713 LearningRate 0.0003 Epoch: 37 Global Step: 214950 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:20,138-Speed 10441.84 samples/sec Loss 4.0722 LearningRate 0.0003 Epoch: 37 Global Step: 214960 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:21,055-Speed 11178.68 samples/sec Loss 4.0140 LearningRate 0.0003 Epoch: 37 Global Step: 214970 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:21,956-Speed 11363.03 samples/sec Loss 4.0847 LearningRate 0.0003 Epoch: 37 Global Step: 214980 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:22,891-Speed 10964.21 samples/sec Loss 4.0931 LearningRate 0.0003 Epoch: 37 Global Step: 214990 Fp16 Grad Scale: 32768 
Required: 0 hours Training: 2022-04-11 06:53:23,830-Speed 10909.81 samples/sec Loss 4.0224 LearningRate 0.0003 Epoch: 37 Global Step: 215000 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:24,815-Speed 10402.01 samples/sec Loss 3.9860 LearningRate 0.0003 Epoch: 37 Global Step: 215010 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:25,771-Speed 10725.05 samples/sec Loss 4.0840 LearningRate 0.0003 Epoch: 37 Global Step: 215020 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:26,760-Speed 10355.63 samples/sec Loss 4.1248 LearningRate 0.0003 Epoch: 37 Global Step: 215030 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:27,681-Speed 11126.04 samples/sec Loss 4.0369 LearningRate 0.0003 Epoch: 37 Global Step: 215040 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:53:28,611-Speed 11023.51 samples/sec Loss 4.0074 LearningRate 0.0003 Epoch: 37 Global Step: 215050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:53:29,522-Speed 11244.49 samples/sec Loss 4.0092 LearningRate 0.0003 Epoch: 37 Global Step: 215060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:53:30,425-Speed 11348.40 samples/sec Loss 4.0369 LearningRate 0.0003 Epoch: 37 Global Step: 215070 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:31,362-Speed 10937.10 samples/sec Loss 4.0200 LearningRate 0.0003 Epoch: 37 Global Step: 215080 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:32,287-Speed 11070.27 samples/sec Loss 4.1224 LearningRate 0.0003 Epoch: 37 Global Step: 215090 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:33,192-Speed 11329.57 samples/sec Loss 3.9394 LearningRate 0.0003 Epoch: 37 Global Step: 215100 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:34,109-Speed 11175.43 samples/sec Loss 3.9935 LearningRate 0.0003 Epoch: 37 Global Step: 215110 Fp16 Grad Scale: 32768 Required: 0 hours Training: 
2022-04-11 06:53:35,025-Speed 11186.57 samples/sec Loss 4.1410 LearningRate 0.0003 Epoch: 37 Global Step: 215120 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:36,009-Speed 10412.03 samples/sec Loss 4.0906 LearningRate 0.0003 Epoch: 37 Global Step: 215130 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:36,931-Speed 11115.63 samples/sec Loss 3.9462 LearningRate 0.0003 Epoch: 37 Global Step: 215140 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:37,868-Speed 10935.28 samples/sec Loss 4.0977 LearningRate 0.0003 Epoch: 37 Global Step: 215150 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:38,814-Speed 10830.56 samples/sec Loss 4.0286 LearningRate 0.0003 Epoch: 37 Global Step: 215160 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:39,742-Speed 11048.22 samples/sec Loss 4.0067 LearningRate 0.0003 Epoch: 37 Global Step: 215170 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:53:40,663-Speed 11122.14 samples/sec Loss 3.9970 LearningRate 0.0003 Epoch: 37 Global Step: 215180 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:41,626-Speed 10640.67 samples/sec Loss 3.9929 LearningRate 0.0003 Epoch: 37 Global Step: 215190 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:42,547-Speed 11131.56 samples/sec Loss 4.0309 LearningRate 0.0003 Epoch: 37 Global Step: 215200 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:43,529-Speed 10429.51 samples/sec Loss 4.0565 LearningRate 0.0003 Epoch: 37 Global Step: 215210 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:44,518-Speed 10369.11 samples/sec Loss 3.9487 LearningRate 0.0003 Epoch: 37 Global Step: 215220 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:45,419-Speed 11368.65 samples/sec Loss 4.0270 LearningRate 0.0003 Epoch: 37 Global Step: 215230 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:46,329-Speed 
11253.35 samples/sec Loss 4.2326 LearningRate 0.0003 Epoch: 37 Global Step: 215240 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:47,240-Speed 11248.56 samples/sec Loss 4.0307 LearningRate 0.0003 Epoch: 37 Global Step: 215250 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:48,235-Speed 10300.33 samples/sec Loss 4.1043 LearningRate 0.0003 Epoch: 37 Global Step: 215260 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:49,147-Speed 11233.77 samples/sec Loss 4.0288 LearningRate 0.0003 Epoch: 37 Global Step: 215270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:50,088-Speed 10889.29 samples/sec Loss 4.0170 LearningRate 0.0003 Epoch: 37 Global Step: 215280 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:53:51,043-Speed 10735.97 samples/sec Loss 3.9876 LearningRate 0.0003 Epoch: 37 Global Step: 215290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:52,015-Speed 10541.20 samples/sec Loss 4.0619 LearningRate 0.0003 Epoch: 37 Global Step: 215300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:52,942-Speed 11048.86 samples/sec Loss 3.9913 LearningRate 0.0003 Epoch: 37 Global Step: 215310 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:53,889-Speed 10820.26 samples/sec Loss 3.9368 LearningRate 0.0003 Epoch: 37 Global Step: 215320 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:54,828-Speed 10916.77 samples/sec Loss 4.0518 LearningRate 0.0003 Epoch: 37 Global Step: 215330 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:55,792-Speed 10633.30 samples/sec Loss 4.1100 LearningRate 0.0003 Epoch: 37 Global Step: 215340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:56,740-Speed 10806.27 samples/sec Loss 4.0653 LearningRate 0.0003 Epoch: 37 Global Step: 215350 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:57,652-Speed 11235.05 samples/sec Loss 
4.0881 LearningRate 0.0003 Epoch: 37 Global Step: 215360 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:58,601-Speed 10802.36 samples/sec Loss 4.0516 LearningRate 0.0003 Epoch: 37 Global Step: 215370 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:53:59,573-Speed 10538.79 samples/sec Loss 3.9420 LearningRate 0.0003 Epoch: 37 Global Step: 215380 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:54:00,509-Speed 10952.87 samples/sec Loss 4.1358 LearningRate 0.0003 Epoch: 37 Global Step: 215390 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:54:01,436-Speed 11048.32 samples/sec Loss 4.1653 LearningRate 0.0003 Epoch: 37 Global Step: 215400 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:54:02,356-Speed 11140.50 samples/sec Loss 4.1478 LearningRate 0.0003 Epoch: 37 Global Step: 215410 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:54:03,277-Speed 11140.27 samples/sec Loss 3.9784 LearningRate 0.0003 Epoch: 37 Global Step: 215420 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:54:04,202-Speed 11072.87 samples/sec Loss 4.0243 LearningRate 0.0003 Epoch: 37 Global Step: 215430 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:54:05,137-Speed 10966.30 samples/sec Loss 4.0319 LearningRate 0.0003 Epoch: 37 Global Step: 215440 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:54:06,072-Speed 10956.02 samples/sec Loss 4.0911 LearningRate 0.0003 Epoch: 37 Global Step: 215450 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:54:06,993-Speed 11124.35 samples/sec Loss 4.1098 LearningRate 0.0003 Epoch: 37 Global Step: 215460 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:54:07,946-Speed 10758.21 samples/sec Loss 4.1078 LearningRate 0.0003 Epoch: 37 Global Step: 215470 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:54:08,840-Speed 11460.20 samples/sec Loss 3.9664 LearningRate 0.0003 
Epoch: 37 Global Step: 215480 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:54:09,750-Speed 11255.79 samples/sec Loss 4.0745 LearningRate 0.0003 Epoch: 37 Global Step: 215490 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:54:10,669-Speed 11151.06 samples/sec Loss 4.0742 LearningRate 0.0003 Epoch: 37 Global Step: 215500 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:54:11,597-Speed 11038.78 samples/sec Loss 4.1606 LearningRate 0.0003 Epoch: 37 Global Step: 215510 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:54:12,506-Speed 11271.91 samples/sec Loss 4.0879 LearningRate 0.0003 Epoch: 37 Global Step: 215520 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:54:13,478-Speed 10550.40 samples/sec Loss 3.9764 LearningRate 0.0003 Epoch: 37 Global Step: 215530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:54:14,413-Speed 10951.44 samples/sec Loss 4.1389 LearningRate 0.0003 Epoch: 37 Global Step: 215540 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:54:15,359-Speed 10834.31 samples/sec Loss 4.1383 LearningRate 0.0003 Epoch: 37 Global Step: 215550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:54:16,323-Speed 10630.38 samples/sec Loss 4.1450 LearningRate 0.0003 Epoch: 37 Global Step: 215560 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:54:17,267-Speed 10860.74 samples/sec Loss 4.0778 LearningRate 0.0003 Epoch: 37 Global Step: 215570 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:54:18,207-Speed 10896.90 samples/sec Loss 4.0548 LearningRate 0.0003 Epoch: 37 Global Step: 215580 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:54:19,157-Speed 10788.61 samples/sec Loss 4.0093 LearningRate 0.0003 Epoch: 37 Global Step: 215590 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:54:20,079-Speed 11112.57 samples/sec Loss 3.9566 LearningRate 0.0003 Epoch: 37 Global Step: 215600 
Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:54:21,005-Speed 11063.59 samples/sec Loss 3.9253 LearningRate 0.0003 Epoch: 37 Global Step: 215610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:54:21,919-Speed 11217.13 samples/sec Loss 4.2107 LearningRate 0.0003 Epoch: 37 Global Step: 215620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:54:22,820-Speed 11373.40 samples/sec Loss 3.9421 LearningRate 0.0003 Epoch: 37 Global Step: 215630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:54:23,742-Speed 11115.26 samples/sec Loss 4.0463 LearningRate 0.0003 Epoch: 37 Global Step: 215640 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:54:24,661-Speed 11147.79 samples/sec Loss 3.9367 LearningRate 0.0003 Epoch: 37 Global Step: 215650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:54:25,594-Speed 10981.29 samples/sec Loss 4.1070 LearningRate 0.0003 Epoch: 37 Global Step: 215660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:54:26,576-Speed 10438.13 samples/sec Loss 4.0532 LearningRate 0.0003 Epoch: 37 Global Step: 215670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:54:27,504-Speed 11046.79 samples/sec Loss 4.0370 LearningRate 0.0003 Epoch: 37 Global Step: 215680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:54:28,476-Speed 10537.08 samples/sec Loss 4.0650 LearningRate 0.0003 Epoch: 37 Global Step: 215690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:54:29,418-Speed 10876.15 samples/sec Loss 4.0614 LearningRate 0.0003 Epoch: 37 Global Step: 215700 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:54:30,368-Speed 10789.43 samples/sec Loss 4.0612 LearningRate 0.0003 Epoch: 37 Global Step: 215710 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:54:31,345-Speed 10486.13 samples/sec Loss 4.0480 LearningRate 0.0003 Epoch: 37 Global Step: 215720 Fp16 Grad Scale: 65536 
Required: 0 hours Training: 2022-04-11 06:54:32,272-Speed 11060.54 samples/sec Loss 4.1661 LearningRate 0.0003 Epoch: 37 Global Step: 215730 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:54:33,180-Speed 11283.52 samples/sec Loss 4.0604 LearningRate 0.0003 Epoch: 37 Global Step: 215740 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:54:34,089-Speed 11272.26 samples/sec Loss 4.1011 LearningRate 0.0003 Epoch: 37 Global Step: 215750 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:54:35,021-Speed 10998.78 samples/sec Loss 3.8636 LearningRate 0.0003 Epoch: 37 Global Step: 215760 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:54:35,944-Speed 11093.11 samples/sec Loss 4.2714 LearningRate 0.0003 Epoch: 37 Global Step: 215770 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:54:36,886-Speed 10884.30 samples/sec Loss 4.0545 LearningRate 0.0003 Epoch: 37 Global Step: 215780 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:54:37,847-Speed 10664.61 samples/sec Loss 4.0989 LearningRate 0.0003 Epoch: 37 Global Step: 215790 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:54:38,831-Speed 10409.32 samples/sec Loss 3.9848 LearningRate 0.0003 Epoch: 37 Global Step: 215800 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:54:39,857-Speed 9992.47 samples/sec Loss 4.0710 LearningRate 0.0003 Epoch: 37 Global Step: 215810 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:54:40,844-Speed 10385.18 samples/sec Loss 4.1355 LearningRate 0.0003 Epoch: 37 Global Step: 215820 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:54:41,770-Speed 11055.93 samples/sec Loss 4.1033 LearningRate 0.0003 Epoch: 37 Global Step: 215830 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:54:42,712-Speed 10878.11 samples/sec Loss 4.0032 LearningRate 0.0003 Epoch: 37 Global Step: 215840 Fp16 Grad Scale: 32768 Required: 0 hours Training: 
2022-04-11 06:54:43,648-Speed 10954.77 samples/sec Loss 3.9543 LearningRate 0.0003 Epoch: 37 Global Step: 215850 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:54:44,577-Speed 11035.12 samples/sec Loss 4.1196 LearningRate 0.0003 Epoch: 37 Global Step: 215860 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:54:45,490-Speed 11224.95 samples/sec Loss 4.1329 LearningRate 0.0003 Epoch: 37 Global Step: 215870 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:54:46,396-Speed 11311.03 samples/sec Loss 4.1384 LearningRate 0.0003 Epoch: 37 Global Step: 215880 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:54:47,296-Speed 11379.24 samples/sec Loss 4.0413 LearningRate 0.0003 Epoch: 37 Global Step: 215890 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:54:48,237-Speed 10895.45 samples/sec Loss 4.0598 LearningRate 0.0003 Epoch: 37 Global Step: 215900 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:54:49,153-Speed 11188.85 samples/sec Loss 3.9447 LearningRate 0.0003 Epoch: 37 Global Step: 215910 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:54:50,109-Speed 10711.97 samples/sec Loss 4.0129 LearningRate 0.0003 Epoch: 37 Global Step: 215920 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:54:51,027-Speed 11161.41 samples/sec Loss 4.0811 LearningRate 0.0003 Epoch: 37 Global Step: 215930 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:54:51,925-Speed 11419.63 samples/sec Loss 4.0468 LearningRate 0.0003 Epoch: 37 Global Step: 215940 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:54:52,843-Speed 11159.19 samples/sec Loss 4.0686 LearningRate 0.0003 Epoch: 37 Global Step: 215950 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:54:53,763-Speed 11135.52 samples/sec Loss 4.0604 LearningRate 0.0003 Epoch: 37 Global Step: 215960 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:54:54,723-Speed 
10683.29 samples/sec Loss 4.0225 LearningRate 0.0003 Epoch: 37 Global Step: 215970 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:54:55,659-Speed 10940.72 samples/sec Loss 4.0499 LearningRate 0.0003 Epoch: 37 Global Step: 215980 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:54:56,588-Speed 11026.00 samples/sec Loss 4.0787 LearningRate 0.0003 Epoch: 37 Global Step: 215990 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:54:57,528-Speed 10898.56 samples/sec Loss 3.9773 LearningRate 0.0003 Epoch: 37 Global Step: 216000 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:55:18,796-[lfw][216000]XNorm: 7.628121 Training: 2022-04-11 06:55:18,797-[lfw][216000]Accuracy-Flip: 0.99700+-0.00277 Training: 2022-04-11 06:55:18,797-[lfw][216000]Accuracy-Highest: 0.99750 Training: 2022-04-11 06:55:43,419-[cfp_fp][216000]XNorm: 6.538678 Training: 2022-04-11 06:55:43,419-[cfp_fp][216000]Accuracy-Flip: 0.95643+-0.01109 Training: 2022-04-11 06:55:43,420-[cfp_fp][216000]Accuracy-Highest: 0.95943 Training: 2022-04-11 06:56:04,700-[agedb_30][216000]XNorm: 7.476653 Training: 2022-04-11 06:56:04,701-[agedb_30][216000]Accuracy-Flip: 0.96983+-0.00736 Training: 2022-04-11 06:56:04,701-[agedb_30][216000]Accuracy-Highest: 0.97183 Training: 2022-04-11 06:56:05,640-Speed 150.34 samples/sec Loss 3.9947 LearningRate 0.0003 Epoch: 37 Global Step: 216010 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:56:06,620-Speed 10455.10 samples/sec Loss 4.1811 LearningRate 0.0003 Epoch: 37 Global Step: 216020 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:56:07,601-Speed 10449.74 samples/sec Loss 4.0892 LearningRate 0.0003 Epoch: 37 Global Step: 216030 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:56:08,571-Speed 10561.58 samples/sec Loss 4.1725 LearningRate 0.0003 Epoch: 37 Global Step: 216040 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:56:09,461-Speed 11507.51 samples/sec 
Loss 3.9388 LearningRate 0.0003 Epoch: 37 Global Step: 216050 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:56:10,439-Speed 10482.15 samples/sec Loss 4.0272 LearningRate 0.0003 Epoch: 37 Global Step: 216060 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:56:21,363-Speed 937.45 samples/sec Loss 4.1484 LearningRate 0.0002 Epoch: 38 Global Step: 216070 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:56:22,527-Speed 8802.44 samples/sec Loss 4.0120 LearningRate 0.0002 Epoch: 38 Global Step: 216080 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:56:23,535-Speed 10167.89 samples/sec Loss 4.0429 LearningRate 0.0002 Epoch: 38 Global Step: 216090 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:56:24,575-Speed 9852.04 samples/sec Loss 3.9991 LearningRate 0.0002 Epoch: 38 Global Step: 216100 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:56:25,544-Speed 10580.01 samples/sec Loss 4.0399 LearningRate 0.0002 Epoch: 38 Global Step: 216110 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:56:26,507-Speed 10642.12 samples/sec Loss 3.9750 LearningRate 0.0002 Epoch: 38 Global Step: 216120 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:56:27,503-Speed 10288.15 samples/sec Loss 4.0938 LearningRate 0.0002 Epoch: 38 Global Step: 216130 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:56:28,464-Speed 10666.22 samples/sec Loss 4.0737 LearningRate 0.0002 Epoch: 38 Global Step: 216140 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:56:29,451-Speed 10378.85 samples/sec Loss 3.9101 LearningRate 0.0002 Epoch: 38 Global Step: 216150 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:56:30,361-Speed 11258.43 samples/sec Loss 3.8431 LearningRate 0.0002 Epoch: 38 Global Step: 216160 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:56:31,311-Speed 10793.99 samples/sec Loss 4.0018 LearningRate 0.0002 
Epoch: 38 Global Step: 216170 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:56:32,254-Speed 10859.70 samples/sec Loss 3.9041 LearningRate 0.0002 Epoch: 38 Global Step: 216180 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:56:33,236-Speed 10441.64 samples/sec Loss 3.8832 LearningRate 0.0002 Epoch: 38 Global Step: 216190 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:56:34,154-Speed 11159.08 samples/sec Loss 4.0356 LearningRate 0.0002 Epoch: 38 Global Step: 216200 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:56:35,086-Speed 10990.81 samples/sec Loss 3.9305 LearningRate 0.0002 Epoch: 38 Global Step: 216210 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:56:36,019-Speed 10988.20 samples/sec Loss 4.1271 LearningRate 0.0002 Epoch: 38 Global Step: 216220 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:56:36,981-Speed 10645.85 samples/sec Loss 4.0481 LearningRate 0.0002 Epoch: 38 Global Step: 216230 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:56:37,919-Speed 10928.64 samples/sec Loss 3.9119 LearningRate 0.0002 Epoch: 38 Global Step: 216240 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:56:38,942-Speed 10014.43 samples/sec Loss 3.9952 LearningRate 0.0002 Epoch: 38 Global Step: 216250 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:56:39,950-Speed 10169.25 samples/sec Loss 4.0334 LearningRate 0.0002 Epoch: 38 Global Step: 216260 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:56:40,924-Speed 10527.77 samples/sec Loss 4.0597 LearningRate 0.0002 Epoch: 38 Global Step: 216270 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:56:41,922-Speed 10264.18 samples/sec Loss 4.0317 LearningRate 0.0002 Epoch: 38 Global Step: 216280 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:56:42,860-Speed 10921.26 samples/sec Loss 3.9049 LearningRate 0.0002 Epoch: 38 Global Step: 216290 
Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:56:43,877-Speed 10076.97 samples/sec Loss 4.0576 LearningRate 0.0002 Epoch: 38 Global Step: 216300 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:56:44,805-Speed 11048.39 samples/sec Loss 4.0019 LearningRate 0.0002 Epoch: 38 Global Step: 216310 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:56:45,716-Speed 11245.62 samples/sec Loss 4.0536 LearningRate 0.0002 Epoch: 38 Global Step: 216320 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:56:46,628-Speed 11238.74 samples/sec Loss 3.8486 LearningRate 0.0002 Epoch: 38 Global Step: 216330 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:56:47,555-Speed 11054.58 samples/sec Loss 4.0926 LearningRate 0.0002 Epoch: 38 Global Step: 216340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:56:48,462-Speed 11289.40 samples/sec Loss 3.9803 LearningRate 0.0002 Epoch: 38 Global Step: 216350 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:56:49,396-Speed 10976.44 samples/sec Loss 3.9384 LearningRate 0.0002 Epoch: 38 Global Step: 216360 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:56:50,350-Speed 10733.89 samples/sec Loss 3.9737 LearningRate 0.0002 Epoch: 38 Global Step: 216370 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:56:51,319-Speed 10574.32 samples/sec Loss 4.0306 LearningRate 0.0002 Epoch: 38 Global Step: 216380 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:56:52,281-Speed 10652.33 samples/sec Loss 4.0677 LearningRate 0.0002 Epoch: 38 Global Step: 216390 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:56:53,243-Speed 10649.24 samples/sec Loss 3.9585 LearningRate 0.0002 Epoch: 38 Global Step: 216400 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:56:54,216-Speed 10539.87 samples/sec Loss 3.9329 LearningRate 0.0002 Epoch: 38 Global Step: 216410 Fp16 Grad Scale: 16384 
Required: 0 hours Training: 2022-04-11 06:56:55,168-Speed 10765.37 samples/sec Loss 3.9067 LearningRate 0.0002 Epoch: 38 Global Step: 216420 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:56:56,137-Speed 10575.12 samples/sec Loss 4.0534 LearningRate 0.0002 Epoch: 38 Global Step: 216430 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:56:57,072-Speed 10958.53 samples/sec Loss 4.0406 LearningRate 0.0002 Epoch: 38 Global Step: 216440 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:56:58,048-Speed 10493.94 samples/sec Loss 3.9601 LearningRate 0.0002 Epoch: 38 Global Step: 216450 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:56:59,019-Speed 10565.06 samples/sec Loss 4.0473 LearningRate 0.0002 Epoch: 38 Global Step: 216460 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:57:00,040-Speed 10031.29 samples/sec Loss 3.9543 LearningRate 0.0002 Epoch: 38 Global Step: 216470 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:57:01,036-Speed 10286.36 samples/sec Loss 3.9464 LearningRate 0.0002 Epoch: 38 Global Step: 216480 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:57:02,004-Speed 10588.15 samples/sec Loss 3.9970 LearningRate 0.0002 Epoch: 38 Global Step: 216490 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:57:02,980-Speed 10501.47 samples/sec Loss 4.0105 LearningRate 0.0002 Epoch: 38 Global Step: 216500 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:57:03,957-Speed 10490.84 samples/sec Loss 3.9283 LearningRate 0.0002 Epoch: 38 Global Step: 216510 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:57:04,920-Speed 10637.27 samples/sec Loss 4.0371 LearningRate 0.0002 Epoch: 38 Global Step: 216520 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:57:05,862-Speed 10872.16 samples/sec Loss 3.9362 LearningRate 0.0002 Epoch: 38 Global Step: 216530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 
2022-04-11 06:57:06,827-Speed 10620.32 samples/sec Loss 4.0416 LearningRate 0.0002 Epoch: 38 Global Step: 216540 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:57:07,776-Speed 10796.21 samples/sec Loss 4.0006 LearningRate 0.0002 Epoch: 38 Global Step: 216550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:57:08,692-Speed 11182.28 samples/sec Loss 4.0415 LearningRate 0.0002 Epoch: 38 Global Step: 216560 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:57:09,595-Speed 11350.40 samples/sec Loss 4.0570 LearningRate 0.0002 Epoch: 38 Global Step: 216570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:57:10,553-Speed 10702.70 samples/sec Loss 4.0740 LearningRate 0.0002 Epoch: 38 Global Step: 216580 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:57:11,487-Speed 10965.74 samples/sec Loss 4.0303 LearningRate 0.0002 Epoch: 38 Global Step: 216590 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:57:12,426-Speed 10917.72 samples/sec Loss 4.0189 LearningRate 0.0002 Epoch: 38 Global Step: 216600 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:57:13,369-Speed 10867.84 samples/sec Loss 4.0473 LearningRate 0.0002 Epoch: 38 Global Step: 216610 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:57:14,325-Speed 10717.33 samples/sec Loss 4.0815 LearningRate 0.0002 Epoch: 38 Global Step: 216620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:57:15,304-Speed 10466.21 samples/sec Loss 3.9814 LearningRate 0.0002 Epoch: 38 Global Step: 216630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:57:16,257-Speed 10761.44 samples/sec Loss 3.9916 LearningRate 0.0002 Epoch: 38 Global Step: 216640 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:57:17,172-Speed 11193.81 samples/sec Loss 4.0847 LearningRate 0.0002 Epoch: 38 Global Step: 216650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:57:18,123-Speed 
10778.18 samples/sec Loss 3.9941 LearningRate 0.0002 Epoch: 38 Global Step: 216660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:57:19,085-Speed 10653.14 samples/sec Loss 3.9222 LearningRate 0.0002 Epoch: 38 Global Step: 216670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:57:20,043-Speed 10695.10 samples/sec Loss 3.9230 LearningRate 0.0002 Epoch: 38 Global Step: 216680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:57:20,956-Speed 11221.41 samples/sec Loss 4.0461 LearningRate 0.0002 Epoch: 38 Global Step: 216690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:57:21,891-Speed 10964.03 samples/sec Loss 3.9515 LearningRate 0.0002 Epoch: 38 Global Step: 216700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:57:22,875-Speed 10405.02 samples/sec Loss 4.0657 LearningRate 0.0002 Epoch: 38 Global Step: 216710 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:57:23,788-Speed 11233.04 samples/sec Loss 3.9256 LearningRate 0.0002 Epoch: 38 Global Step: 216720 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:57:24,780-Speed 10324.09 samples/sec Loss 4.1610 LearningRate 0.0002 Epoch: 38 Global Step: 216730 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:57:25,711-Speed 11015.13 samples/sec Loss 4.0537 LearningRate 0.0002 Epoch: 38 Global Step: 216740 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:57:26,678-Speed 10590.08 samples/sec Loss 4.0793 LearningRate 0.0002 Epoch: 38 Global Step: 216750 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:57:27,650-Speed 10545.38 samples/sec Loss 3.9762 LearningRate 0.0002 Epoch: 38 Global Step: 216760 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:57:28,642-Speed 10330.70 samples/sec Loss 4.0484 LearningRate 0.0002 Epoch: 38 Global Step: 216770 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:57:29,710-Speed 9594.16 samples/sec Loss 4.0546 
LearningRate 0.0002 Epoch: 38 Global Step: 216780 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:57:30,687-Speed 10495.52 samples/sec Loss 3.9291 LearningRate 0.0002 Epoch: 38 Global Step: 216790 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:57:31,706-Speed 10050.34 samples/sec Loss 4.0073 LearningRate 0.0002 Epoch: 38 Global Step: 216800 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:57:32,625-Speed 11149.38 samples/sec Loss 3.9024 LearningRate 0.0002 Epoch: 38 Global Step: 216810 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:57:33,563-Speed 10920.02 samples/sec Loss 3.9280 LearningRate 0.0002 Epoch: 38 Global Step: 216820 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:57:34,487-Speed 11101.46 samples/sec Loss 3.8325 LearningRate 0.0002 Epoch: 38 Global Step: 216830 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:57:35,393-Speed 11308.40 samples/sec Loss 4.0101 LearningRate 0.0002 Epoch: 38 Global Step: 216840 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:57:36,314-Speed 11124.69 samples/sec Loss 4.0491 LearningRate 0.0002 Epoch: 38 Global Step: 216850 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:57:37,226-Speed 11237.81 samples/sec Loss 4.0312 LearningRate 0.0002 Epoch: 38 Global Step: 216860 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:57:38,173-Speed 10814.78 samples/sec Loss 3.9728 LearningRate 0.0002 Epoch: 38 Global Step: 216870 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:57:39,108-Speed 10971.40 samples/sec Loss 4.0029 LearningRate 0.0002 Epoch: 38 Global Step: 216880 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:57:40,028-Speed 11133.64 samples/sec Loss 3.9621 LearningRate 0.0002 Epoch: 38 Global Step: 216890 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:57:40,958-Speed 11012.86 samples/sec Loss 4.0295 LearningRate 0.0002 Epoch: 38 
Global Step: 216900 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:57:41,924-Speed 10617.47 samples/sec Loss 3.9208 LearningRate 0.0002 Epoch: 38 Global Step: 216910 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:57:42,911-Speed 10375.50 samples/sec Loss 4.0509 LearningRate 0.0002 Epoch: 38 Global Step: 216920 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:57:43,866-Speed 10734.77 samples/sec Loss 3.9646 LearningRate 0.0002 Epoch: 38 Global Step: 216930 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:57:44,792-Speed 11057.07 samples/sec Loss 3.9644 LearningRate 0.0002 Epoch: 38 Global Step: 216940 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:57:45,720-Speed 11043.70 samples/sec Loss 3.9642 LearningRate 0.0002 Epoch: 38 Global Step: 216950 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:57:46,635-Speed 11205.24 samples/sec Loss 3.9462 LearningRate 0.0002 Epoch: 38 Global Step: 216960 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:57:47,532-Speed 11422.61 samples/sec Loss 4.0434 LearningRate 0.0002 Epoch: 38 Global Step: 216970 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:57:48,489-Speed 10704.94 samples/sec Loss 3.9290 LearningRate 0.0002 Epoch: 38 Global Step: 216980 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:57:49,495-Speed 10184.96 samples/sec Loss 3.8670 LearningRate 0.0002 Epoch: 38 Global Step: 216990 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 06:57:50,465-Speed 10558.62 samples/sec Loss 3.8398 LearningRate 0.0002 Epoch: 38 Global Step: 217000 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:57:51,434-Speed 10587.26 samples/sec Loss 4.0107 LearningRate 0.0002 Epoch: 38 Global Step: 217010 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:57:52,415-Speed 10439.27 samples/sec Loss 4.0367 LearningRate 0.0002 Epoch: 38 Global Step: 217020 Fp16 Grad 
Scale: 32768 Required: 0 hours Training: 2022-04-11 06:57:53,346-Speed 11007.19 samples/sec Loss 3.9793 LearningRate 0.0002 Epoch: 38 Global Step: 217030 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:57:54,255-Speed 11275.64 samples/sec Loss 4.1589 LearningRate 0.0002 Epoch: 38 Global Step: 217040 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:57:55,201-Speed 10831.56 samples/sec Loss 3.9736 LearningRate 0.0002 Epoch: 38 Global Step: 217050 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:57:56,160-Speed 10684.41 samples/sec Loss 3.9055 LearningRate 0.0002 Epoch: 38 Global Step: 217060 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:57:57,065-Speed 11318.56 samples/sec Loss 3.9510 LearningRate 0.0002 Epoch: 38 Global Step: 217070 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:57:57,960-Speed 11453.91 samples/sec Loss 4.0572 LearningRate 0.0002 Epoch: 38 Global Step: 217080 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:57:58,896-Speed 10948.15 samples/sec Loss 4.0077 LearningRate 0.0002 Epoch: 38 Global Step: 217090 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:57:59,820-Speed 11091.20 samples/sec Loss 3.8057 LearningRate 0.0002 Epoch: 38 Global Step: 217100 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:58:00,727-Speed 11302.90 samples/sec Loss 4.0511 LearningRate 0.0002 Epoch: 38 Global Step: 217110 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:01,669-Speed 10870.34 samples/sec Loss 3.9931 LearningRate 0.0002 Epoch: 38 Global Step: 217120 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:02,617-Speed 10812.28 samples/sec Loss 3.9422 LearningRate 0.0002 Epoch: 38 Global Step: 217130 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:03,591-Speed 10517.79 samples/sec Loss 4.0307 LearningRate 0.0002 Epoch: 38 Global Step: 217140 Fp16 Grad Scale: 32768 Required: 0 hours 
Training: 2022-04-11 06:58:04,583-Speed 10329.85 samples/sec Loss 3.9024 LearningRate 0.0002 Epoch: 38 Global Step: 217150 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:05,535-Speed 10762.26 samples/sec Loss 3.9070 LearningRate 0.0002 Epoch: 38 Global Step: 217160 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:06,473-Speed 10925.62 samples/sec Loss 4.0020 LearningRate 0.0002 Epoch: 38 Global Step: 217170 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:07,454-Speed 10450.88 samples/sec Loss 3.9599 LearningRate 0.0002 Epoch: 38 Global Step: 217180 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:08,399-Speed 10833.98 samples/sec Loss 3.9513 LearningRate 0.0002 Epoch: 38 Global Step: 217190 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:09,319-Speed 11151.68 samples/sec Loss 4.0149 LearningRate 0.0002 Epoch: 38 Global Step: 217200 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:10,227-Speed 11277.72 samples/sec Loss 3.9583 LearningRate 0.0002 Epoch: 38 Global Step: 217210 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:11,183-Speed 10718.39 samples/sec Loss 4.0260 LearningRate 0.0002 Epoch: 38 Global Step: 217220 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:12,180-Speed 10284.70 samples/sec Loss 4.0155 LearningRate 0.0002 Epoch: 38 Global Step: 217230 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:13,188-Speed 10163.03 samples/sec Loss 4.0269 LearningRate 0.0002 Epoch: 38 Global Step: 217240 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:14,142-Speed 10741.24 samples/sec Loss 3.9572 LearningRate 0.0002 Epoch: 38 Global Step: 217250 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:15,124-Speed 10432.78 samples/sec Loss 4.0006 LearningRate 0.0002 Epoch: 38 Global Step: 217260 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 
06:58:16,101-Speed 10486.21 samples/sec Loss 3.8801 LearningRate 0.0002 Epoch: 38 Global Step: 217270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:17,097-Speed 10292.48 samples/sec Loss 4.0584 LearningRate 0.0002 Epoch: 38 Global Step: 217280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:18,090-Speed 10312.90 samples/sec Loss 3.8974 LearningRate 0.0002 Epoch: 38 Global Step: 217290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:19,056-Speed 10620.29 samples/sec Loss 3.8944 LearningRate 0.0002 Epoch: 38 Global Step: 217300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:20,024-Speed 10585.69 samples/sec Loss 4.0443 LearningRate 0.0002 Epoch: 38 Global Step: 217310 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:58:20,949-Speed 11082.26 samples/sec Loss 3.9971 LearningRate 0.0002 Epoch: 38 Global Step: 217320 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:58:21,854-Speed 11323.37 samples/sec Loss 3.9760 LearningRate 0.0002 Epoch: 38 Global Step: 217330 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:58:22,767-Speed 11222.04 samples/sec Loss 3.9494 LearningRate 0.0002 Epoch: 38 Global Step: 217340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:23,704-Speed 10931.32 samples/sec Loss 4.0652 LearningRate 0.0002 Epoch: 38 Global Step: 217350 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:24,632-Speed 11048.93 samples/sec Loss 4.0086 LearningRate 0.0002 Epoch: 38 Global Step: 217360 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:25,595-Speed 10639.98 samples/sec Loss 4.1565 LearningRate 0.0002 Epoch: 38 Global Step: 217370 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:26,558-Speed 10634.76 samples/sec Loss 4.0924 LearningRate 0.0002 Epoch: 38 Global Step: 217380 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:27,498-Speed 10899.44 
samples/sec Loss 3.9118 LearningRate 0.0002 Epoch: 38 Global Step: 217390 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:28,427-Speed 11037.03 samples/sec Loss 4.0570 LearningRate 0.0002 Epoch: 38 Global Step: 217400 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:29,362-Speed 10957.53 samples/sec Loss 3.8069 LearningRate 0.0002 Epoch: 38 Global Step: 217410 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:30,288-Speed 11076.11 samples/sec Loss 3.9098 LearningRate 0.0002 Epoch: 38 Global Step: 217420 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:31,236-Speed 10807.70 samples/sec Loss 4.0127 LearningRate 0.0002 Epoch: 38 Global Step: 217430 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:32,179-Speed 10861.29 samples/sec Loss 3.9431 LearningRate 0.0002 Epoch: 38 Global Step: 217440 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:58:33,107-Speed 11048.32 samples/sec Loss 3.9341 LearningRate 0.0002 Epoch: 38 Global Step: 217450 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:58:34,032-Speed 11081.48 samples/sec Loss 4.0732 LearningRate 0.0002 Epoch: 38 Global Step: 217460 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:34,978-Speed 10831.81 samples/sec Loss 4.0355 LearningRate 0.0002 Epoch: 38 Global Step: 217470 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:35,908-Speed 11008.67 samples/sec Loss 4.0178 LearningRate 0.0002 Epoch: 38 Global Step: 217480 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:36,873-Speed 10620.60 samples/sec Loss 4.1240 LearningRate 0.0002 Epoch: 38 Global Step: 217490 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:37,853-Speed 10459.04 samples/sec Loss 3.9598 LearningRate 0.0002 Epoch: 38 Global Step: 217500 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:38,823-Speed 10561.83 samples/sec Loss 3.9710 
LearningRate 0.0002 Epoch: 38 Global Step: 217510 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:39,766-Speed 10868.80 samples/sec Loss 3.9995 LearningRate 0.0002 Epoch: 38 Global Step: 217520 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:40,695-Speed 11032.03 samples/sec Loss 4.0519 LearningRate 0.0002 Epoch: 38 Global Step: 217530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:41,638-Speed 10857.81 samples/sec Loss 4.0158 LearningRate 0.0002 Epoch: 38 Global Step: 217540 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:42,634-Speed 10294.96 samples/sec Loss 4.0036 LearningRate 0.0002 Epoch: 38 Global Step: 217550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:43,597-Speed 10642.14 samples/sec Loss 3.9934 LearningRate 0.0002 Epoch: 38 Global Step: 217560 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:58:44,510-Speed 11235.48 samples/sec Loss 3.9288 LearningRate 0.0002 Epoch: 38 Global Step: 217570 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:58:45,415-Speed 11315.58 samples/sec Loss 4.0933 LearningRate 0.0002 Epoch: 38 Global Step: 217580 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:58:46,338-Speed 11107.28 samples/sec Loss 3.9127 LearningRate 0.0002 Epoch: 38 Global Step: 217590 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:58:47,248-Speed 11253.58 samples/sec Loss 4.0357 LearningRate 0.0002 Epoch: 38 Global Step: 217600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:48,231-Speed 10430.33 samples/sec Loss 3.9381 LearningRate 0.0002 Epoch: 38 Global Step: 217610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:49,188-Speed 10701.64 samples/sec Loss 3.9233 LearningRate 0.0002 Epoch: 38 Global Step: 217620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:50,143-Speed 10733.67 samples/sec Loss 4.0786 LearningRate 0.0002 Epoch: 38 
Global Step: 217630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:51,088-Speed 10842.45 samples/sec Loss 4.0715 LearningRate 0.0002 Epoch: 38 Global Step: 217640 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:52,104-Speed 10083.82 samples/sec Loss 3.9970 LearningRate 0.0002 Epoch: 38 Global Step: 217650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:53,066-Speed 10648.39 samples/sec Loss 3.8469 LearningRate 0.0002 Epoch: 38 Global Step: 217660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:54,045-Speed 10473.63 samples/sec Loss 3.9715 LearningRate 0.0002 Epoch: 38 Global Step: 217670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:54,991-Speed 10831.86 samples/sec Loss 3.9255 LearningRate 0.0002 Epoch: 38 Global Step: 217680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:55,927-Speed 10939.06 samples/sec Loss 4.0673 LearningRate 0.0002 Epoch: 38 Global Step: 217690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:58:56,873-Speed 10838.44 samples/sec Loss 4.0606 LearningRate 0.0002 Epoch: 38 Global Step: 217700 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:58:57,845-Speed 10541.37 samples/sec Loss 4.0214 LearningRate 0.0002 Epoch: 38 Global Step: 217710 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:58:58,784-Speed 10910.79 samples/sec Loss 3.9226 LearningRate 0.0002 Epoch: 38 Global Step: 217720 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:58:59,752-Speed 10584.44 samples/sec Loss 4.0361 LearningRate 0.0002 Epoch: 38 Global Step: 217730 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:59:00,767-Speed 10093.74 samples/sec Loss 3.9035 LearningRate 0.0002 Epoch: 38 Global Step: 217740 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:59:01,748-Speed 10441.64 samples/sec Loss 3.9204 LearningRate 0.0002 Epoch: 38 Global Step: 217750 Fp16 Grad 
Scale: 65536 Required: 0 hours Training: 2022-04-11 06:59:02,724-Speed 10506.94 samples/sec Loss 3.8889 LearningRate 0.0002 Epoch: 38 Global Step: 217760 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:59:03,705-Speed 10436.27 samples/sec Loss 4.0191 LearningRate 0.0002 Epoch: 38 Global Step: 217770 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:59:04,702-Speed 10284.89 samples/sec Loss 3.9504 LearningRate 0.0002 Epoch: 38 Global Step: 217780 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:59:05,633-Speed 11004.33 samples/sec Loss 3.9869 LearningRate 0.0002 Epoch: 38 Global Step: 217790 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:59:06,598-Speed 10619.86 samples/sec Loss 3.9285 LearningRate 0.0002 Epoch: 38 Global Step: 217800 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:59:07,511-Speed 11229.82 samples/sec Loss 4.0593 LearningRate 0.0002 Epoch: 38 Global Step: 217810 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:59:08,436-Speed 11069.53 samples/sec Loss 3.9175 LearningRate 0.0002 Epoch: 38 Global Step: 217820 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:59:09,379-Speed 10876.13 samples/sec Loss 3.9590 LearningRate 0.0002 Epoch: 38 Global Step: 217830 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:59:10,312-Speed 10977.34 samples/sec Loss 3.9826 LearningRate 0.0002 Epoch: 38 Global Step: 217840 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:59:11,239-Speed 11055.94 samples/sec Loss 4.0175 LearningRate 0.0002 Epoch: 38 Global Step: 217850 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:59:12,182-Speed 10861.44 samples/sec Loss 4.0424 LearningRate 0.0002 Epoch: 38 Global Step: 217860 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:59:13,108-Speed 11073.57 samples/sec Loss 3.9942 LearningRate 0.0002 Epoch: 38 Global Step: 217870 Fp16 Grad Scale: 32768 Required: 0 hours 
Training: 2022-04-11 06:59:14,079-Speed 10556.02 samples/sec Loss 3.8563 LearningRate 0.0002 Epoch: 38 Global Step: 217880 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:59:15,022-Speed 10863.77 samples/sec Loss 3.8982 LearningRate 0.0002 Epoch: 38 Global Step: 217890 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:59:15,985-Speed 10643.03 samples/sec Loss 3.9619 LearningRate 0.0002 Epoch: 38 Global Step: 217900 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:59:16,915-Speed 11007.73 samples/sec Loss 3.9022 LearningRate 0.0002 Epoch: 38 Global Step: 217910 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:59:17,868-Speed 10759.61 samples/sec Loss 4.0207 LearningRate 0.0002 Epoch: 38 Global Step: 217920 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:59:18,834-Speed 10605.59 samples/sec Loss 3.9371 LearningRate 0.0002 Epoch: 38 Global Step: 217930 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:59:19,789-Speed 10724.80 samples/sec Loss 3.9672 LearningRate 0.0002 Epoch: 38 Global Step: 217940 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:59:20,715-Speed 11071.85 samples/sec Loss 4.0723 LearningRate 0.0002 Epoch: 38 Global Step: 217950 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:59:21,674-Speed 10680.52 samples/sec Loss 4.0807 LearningRate 0.0002 Epoch: 38 Global Step: 217960 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:59:22,601-Speed 11061.51 samples/sec Loss 4.1169 LearningRate 0.0002 Epoch: 38 Global Step: 217970 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:59:23,567-Speed 10599.60 samples/sec Loss 4.0210 LearningRate 0.0002 Epoch: 38 Global Step: 217980 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 06:59:24,534-Speed 10605.51 samples/sec Loss 3.9884 LearningRate 0.0002 Epoch: 38 Global Step: 217990 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 
06:59:25,487-Speed 10750.73 samples/sec Loss 4.0494 LearningRate 0.0002 Epoch: 38 Global Step: 218000 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 06:59:46,710-[lfw][218000]XNorm: 7.625579 Training: 2022-04-11 06:59:46,711-[lfw][218000]Accuracy-Flip: 0.99733+-0.00291 Training: 2022-04-11 06:59:46,711-[lfw][218000]Accuracy-Highest: 0.99750 Training: 2022-04-11 07:00:11,218-[cfp_fp][218000]XNorm: 6.529426 Training: 2022-04-11 07:00:11,219-[cfp_fp][218000]Accuracy-Flip: 0.95529+-0.01139 Training: 2022-04-11 07:00:11,219-[cfp_fp][218000]Accuracy-Highest: 0.95943 Training: 2022-04-11 07:00:32,422-[agedb_30][218000]XNorm: 7.478717 Training: 2022-04-11 07:00:32,423-[agedb_30][218000]Accuracy-Flip: 0.97050+-0.00757 Training: 2022-04-11 07:00:32,423-[agedb_30][218000]Accuracy-Highest: 0.97183 Training: 2022-04-11 07:00:33,355-Speed 150.88 samples/sec Loss 3.9445 LearningRate 0.0002 Epoch: 38 Global Step: 218010 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:00:34,277-Speed 11122.72 samples/sec Loss 4.1197 LearningRate 0.0002 Epoch: 38 Global Step: 218020 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:00:35,216-Speed 10904.62 samples/sec Loss 4.0047 LearningRate 0.0002 Epoch: 38 Global Step: 218030 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:00:36,146-Speed 11021.72 samples/sec Loss 4.0382 LearningRate 0.0002 Epoch: 38 Global Step: 218040 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:00:37,102-Speed 10715.72 samples/sec Loss 4.0222 LearningRate 0.0002 Epoch: 38 Global Step: 218050 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:00:38,038-Speed 10952.41 samples/sec Loss 4.0630 LearningRate 0.0002 Epoch: 38 Global Step: 218060 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:00:39,011-Speed 10528.64 samples/sec Loss 4.0231 LearningRate 0.0002 Epoch: 38 Global Step: 218070 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:00:40,012-Speed 
10238.70 samples/sec Loss 3.9213 LearningRate 0.0002 Epoch: 38 Global Step: 218080 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:00:40,969-Speed 10698.77 samples/sec Loss 3.9311 LearningRate 0.0002 Epoch: 38 Global Step: 218090 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:00:41,990-Speed 10071.98 samples/sec Loss 3.9660 LearningRate 0.0002 Epoch: 38 Global Step: 218100 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:00:42,975-Speed 10401.76 samples/sec Loss 4.0269 LearningRate 0.0002 Epoch: 38 Global Step: 218110 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:00:43,938-Speed 10638.12 samples/sec Loss 4.0782 LearningRate 0.0002 Epoch: 38 Global Step: 218120 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:00:44,885-Speed 10815.07 samples/sec Loss 4.0669 LearningRate 0.0002 Epoch: 38 Global Step: 218130 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:00:45,809-Speed 11086.86 samples/sec Loss 3.8781 LearningRate 0.0002 Epoch: 38 Global Step: 218140 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:00:46,724-Speed 11210.13 samples/sec Loss 3.9280 LearningRate 0.0002 Epoch: 38 Global Step: 218150 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:00:47,659-Speed 10955.16 samples/sec Loss 4.0832 LearningRate 0.0002 Epoch: 38 Global Step: 218160 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:00:48,625-Speed 10608.43 samples/sec Loss 3.9865 LearningRate 0.0002 Epoch: 38 Global Step: 218170 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:00:49,599-Speed 10516.00 samples/sec Loss 4.0669 LearningRate 0.0002 Epoch: 38 Global Step: 218180 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:00:50,597-Speed 10274.66 samples/sec Loss 3.9687 LearningRate 0.0002 Epoch: 38 Global Step: 218190 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:00:51,600-Speed 10213.89 samples/sec Loss 
3.9443 LearningRate 0.0002 Epoch: 38 Global Step: 218200 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:00:52,567-Speed 10595.23 samples/sec Loss 3.9760 LearningRate 0.0002 Epoch: 38 Global Step: 218210 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:00:53,498-Speed 11003.29 samples/sec Loss 3.9141 LearningRate 0.0002 Epoch: 38 Global Step: 218220 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:00:54,455-Speed 10714.92 samples/sec Loss 4.0326 LearningRate 0.0002 Epoch: 38 Global Step: 218230 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:00:55,392-Speed 10932.52 samples/sec Loss 4.0143 LearningRate 0.0002 Epoch: 38 Global Step: 218240 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:00:56,340-Speed 10804.67 samples/sec Loss 3.9910 LearningRate 0.0002 Epoch: 38 Global Step: 218250 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:00:57,278-Speed 10925.04 samples/sec Loss 4.0303 LearningRate 0.0002 Epoch: 38 Global Step: 218260 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:00:58,239-Speed 10669.02 samples/sec Loss 3.9303 LearningRate 0.0002 Epoch: 38 Global Step: 218270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:00:59,164-Speed 11068.64 samples/sec Loss 4.0949 LearningRate 0.0002 Epoch: 38 Global Step: 218280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:00,160-Speed 10285.79 samples/sec Loss 3.9616 LearningRate 0.0002 Epoch: 38 Global Step: 218290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:01,109-Speed 10797.49 samples/sec Loss 4.0282 LearningRate 0.0002 Epoch: 38 Global Step: 218300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:02,092-Speed 10432.91 samples/sec Loss 4.0496 LearningRate 0.0002 Epoch: 38 Global Step: 218310 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:03,037-Speed 10844.85 samples/sec Loss 4.0057 LearningRate 0.0002 
Epoch: 38 Global Step: 218320 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:01:03,982-Speed 10839.42 samples/sec Loss 4.0396 LearningRate 0.0002 Epoch: 38 Global Step: 218330 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:01:04,970-Speed 10371.40 samples/sec Loss 4.0277 LearningRate 0.0002 Epoch: 38 Global Step: 218340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:05,923-Speed 10753.61 samples/sec Loss 3.9482 LearningRate 0.0002 Epoch: 38 Global Step: 218350 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:06,885-Speed 10656.36 samples/sec Loss 3.9064 LearningRate 0.0002 Epoch: 38 Global Step: 218360 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:07,889-Speed 10205.84 samples/sec Loss 4.1041 LearningRate 0.0002 Epoch: 38 Global Step: 218370 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:08,838-Speed 10800.30 samples/sec Loss 4.0569 LearningRate 0.0002 Epoch: 38 Global Step: 218380 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:09,783-Speed 10842.70 samples/sec Loss 4.0774 LearningRate 0.0002 Epoch: 38 Global Step: 218390 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:10,697-Speed 11210.90 samples/sec Loss 3.8875 LearningRate 0.0002 Epoch: 38 Global Step: 218400 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:11,629-Speed 10988.36 samples/sec Loss 4.0631 LearningRate 0.0002 Epoch: 38 Global Step: 218410 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:12,589-Speed 10681.38 samples/sec Loss 4.0135 LearningRate 0.0002 Epoch: 38 Global Step: 218420 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:13,571-Speed 10433.23 samples/sec Loss 4.0138 LearningRate 0.0002 Epoch: 38 Global Step: 218430 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:14,526-Speed 10720.80 samples/sec Loss 4.0307 LearningRate 0.0002 Epoch: 38 Global Step: 218440 
Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:01:15,468-Speed 10886.42 samples/sec Loss 4.1183 LearningRate 0.0002 Epoch: 38 Global Step: 218450 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:16,427-Speed 10675.83 samples/sec Loss 4.0743 LearningRate 0.0002 Epoch: 38 Global Step: 218460 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:17,368-Speed 10893.32 samples/sec Loss 3.9568 LearningRate 0.0002 Epoch: 38 Global Step: 218470 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:18,363-Speed 10297.70 samples/sec Loss 4.1716 LearningRate 0.0002 Epoch: 38 Global Step: 218480 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:19,321-Speed 10704.05 samples/sec Loss 3.9870 LearningRate 0.0002 Epoch: 38 Global Step: 218490 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:20,264-Speed 10867.73 samples/sec Loss 3.9813 LearningRate 0.0002 Epoch: 38 Global Step: 218500 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:21,195-Speed 10998.18 samples/sec Loss 4.0682 LearningRate 0.0002 Epoch: 38 Global Step: 218510 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:22,126-Speed 11009.15 samples/sec Loss 4.0135 LearningRate 0.0002 Epoch: 38 Global Step: 218520 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:23,056-Speed 11028.31 samples/sec Loss 4.0445 LearningRate 0.0002 Epoch: 38 Global Step: 218530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:23,993-Speed 10925.48 samples/sec Loss 4.0868 LearningRate 0.0002 Epoch: 38 Global Step: 218540 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:24,923-Speed 11029.27 samples/sec Loss 4.0821 LearningRate 0.0002 Epoch: 38 Global Step: 218550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:25,903-Speed 10448.91 samples/sec Loss 3.9825 LearningRate 0.0002 Epoch: 38 Global Step: 218560 Fp16 Grad Scale: 32768 
Required: 0 hours Training: 2022-04-11 07:01:26,861-Speed 10711.81 samples/sec Loss 4.0518 LearningRate 0.0002 Epoch: 38 Global Step: 218570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:27,884-Speed 10013.29 samples/sec Loss 3.8919 LearningRate 0.0002 Epoch: 38 Global Step: 218580 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:28,835-Speed 10769.01 samples/sec Loss 4.0782 LearningRate 0.0002 Epoch: 38 Global Step: 218590 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:29,808-Speed 10534.95 samples/sec Loss 4.0590 LearningRate 0.0002 Epoch: 38 Global Step: 218600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:30,776-Speed 10587.39 samples/sec Loss 4.0068 LearningRate 0.0002 Epoch: 38 Global Step: 218610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:31,695-Speed 11154.01 samples/sec Loss 3.9823 LearningRate 0.0002 Epoch: 38 Global Step: 218620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:32,610-Speed 11192.26 samples/sec Loss 3.9360 LearningRate 0.0002 Epoch: 38 Global Step: 218630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:33,557-Speed 10823.45 samples/sec Loss 3.9999 LearningRate 0.0001 Epoch: 38 Global Step: 218640 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:34,486-Speed 11024.69 samples/sec Loss 3.9504 LearningRate 0.0001 Epoch: 38 Global Step: 218650 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:01:35,410-Speed 11088.24 samples/sec Loss 4.0272 LearningRate 0.0001 Epoch: 38 Global Step: 218660 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:01:36,355-Speed 10848.73 samples/sec Loss 3.9528 LearningRate 0.0001 Epoch: 38 Global Step: 218670 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:01:37,281-Speed 11065.82 samples/sec Loss 4.0160 LearningRate 0.0001 Epoch: 38 Global Step: 218680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 
2022-04-11 07:01:38,219-Speed 10922.08 samples/sec Loss 4.0017 LearningRate 0.0001 Epoch: 38 Global Step: 218690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:39,226-Speed 10178.98 samples/sec Loss 3.9112 LearningRate 0.0001 Epoch: 38 Global Step: 218700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:40,216-Speed 10349.75 samples/sec Loss 4.0462 LearningRate 0.0001 Epoch: 38 Global Step: 218710 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:41,171-Speed 10729.68 samples/sec Loss 4.0800 LearningRate 0.0001 Epoch: 38 Global Step: 218720 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:42,108-Speed 10945.00 samples/sec Loss 4.0833 LearningRate 0.0001 Epoch: 38 Global Step: 218730 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:43,105-Speed 10278.05 samples/sec Loss 4.0222 LearningRate 0.0001 Epoch: 38 Global Step: 218740 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:44,085-Speed 10446.60 samples/sec Loss 3.9997 LearningRate 0.0001 Epoch: 38 Global Step: 218750 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:45,013-Speed 11046.90 samples/sec Loss 4.0306 LearningRate 0.0001 Epoch: 38 Global Step: 218760 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:45,941-Speed 11037.79 samples/sec Loss 4.0623 LearningRate 0.0001 Epoch: 38 Global Step: 218770 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:46,872-Speed 11018.29 samples/sec Loss 3.9735 LearningRate 0.0001 Epoch: 38 Global Step: 218780 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:01:47,790-Speed 11159.87 samples/sec Loss 3.9090 LearningRate 0.0001 Epoch: 38 Global Step: 218790 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:48,740-Speed 10785.48 samples/sec Loss 3.9832 LearningRate 0.0001 Epoch: 38 Global Step: 218800 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:49,698-Speed 
10690.52 samples/sec Loss 4.0120 LearningRate 0.0001 Epoch: 38 Global Step: 218810 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:50,716-Speed 10065.81 samples/sec Loss 4.0780 LearningRate 0.0001 Epoch: 38 Global Step: 218820 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:51,714-Speed 10274.68 samples/sec Loss 4.0993 LearningRate 0.0001 Epoch: 38 Global Step: 218830 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:52,642-Speed 11031.90 samples/sec Loss 4.0948 LearningRate 0.0001 Epoch: 38 Global Step: 218840 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:53,577-Speed 10965.21 samples/sec Loss 3.9969 LearningRate 0.0001 Epoch: 38 Global Step: 218850 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:54,500-Speed 11107.38 samples/sec Loss 4.0096 LearningRate 0.0001 Epoch: 38 Global Step: 218860 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:55,460-Speed 10665.72 samples/sec Loss 4.1006 LearningRate 0.0001 Epoch: 38 Global Step: 218870 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:01:56,391-Speed 11009.26 samples/sec Loss 3.9746 LearningRate 0.0001 Epoch: 38 Global Step: 218880 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:01:57,312-Speed 11122.61 samples/sec Loss 4.0103 LearningRate 0.0001 Epoch: 38 Global Step: 218890 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:01:58,225-Speed 11231.99 samples/sec Loss 4.1017 LearningRate 0.0001 Epoch: 38 Global Step: 218900 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:01:59,133-Speed 11275.87 samples/sec Loss 4.0019 LearningRate 0.0001 Epoch: 38 Global Step: 218910 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:02:00,058-Speed 11082.20 samples/sec Loss 3.9898 LearningRate 0.0001 Epoch: 38 Global Step: 218920 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:02:01,049-Speed 10348.95 samples/sec Loss 
4.0473 LearningRate 0.0001 Epoch: 38 Global Step: 218930 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:02:02,011-Speed 10650.01 samples/sec Loss 4.0889 LearningRate 0.0001 Epoch: 38 Global Step: 218940 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:02:02,975-Speed 10631.32 samples/sec Loss 4.0348 LearningRate 0.0001 Epoch: 38 Global Step: 218950 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:02:03,951-Speed 10496.10 samples/sec Loss 3.9665 LearningRate 0.0001 Epoch: 38 Global Step: 218960 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:02:04,932-Speed 10444.57 samples/sec Loss 4.0258 LearningRate 0.0001 Epoch: 38 Global Step: 218970 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:02:05,879-Speed 10824.32 samples/sec Loss 3.9773 LearningRate 0.0001 Epoch: 38 Global Step: 218980 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:02:06,821-Speed 10883.14 samples/sec Loss 3.9864 LearningRate 0.0001 Epoch: 38 Global Step: 218990 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:02:07,790-Speed 10580.28 samples/sec Loss 3.9996 LearningRate 0.0001 Epoch: 38 Global Step: 219000 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:02:08,717-Speed 11052.41 samples/sec Loss 3.9405 LearningRate 0.0001 Epoch: 38 Global Step: 219010 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:02:09,649-Speed 10993.75 samples/sec Loss 3.9453 LearningRate 0.0001 Epoch: 38 Global Step: 219020 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:02:10,564-Speed 11200.98 samples/sec Loss 4.0175 LearningRate 0.0001 Epoch: 38 Global Step: 219030 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:02:11,572-Speed 10159.15 samples/sec Loss 4.0755 LearningRate 0.0001 Epoch: 38 Global Step: 219040 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:02:12,518-Speed 10839.29 samples/sec Loss 4.1199 LearningRate 0.0001 
Epoch: 38 Global Step: 219050 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:02:13,508-Speed 10350.84 samples/sec Loss 3.9714 LearningRate 0.0001 Epoch: 38 Global Step: 219060 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:02:14,483-Speed 10509.48 samples/sec Loss 4.0407 LearningRate 0.0001 Epoch: 38 Global Step: 219070 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:02:15,448-Speed 10613.36 samples/sec Loss 3.9813 LearningRate 0.0001 Epoch: 38 Global Step: 219080 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:02:16,422-Speed 10519.83 samples/sec Loss 3.9680 LearningRate 0.0001 Epoch: 38 Global Step: 219090 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:02:17,376-Speed 10745.49 samples/sec Loss 3.9000 LearningRate 0.0001 Epoch: 38 Global Step: 219100 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:02:18,361-Speed 10397.39 samples/sec Loss 4.0041 LearningRate 0.0001 Epoch: 38 Global Step: 219110 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:02:19,329-Speed 10588.68 samples/sec Loss 3.9691 LearningRate 0.0001 Epoch: 38 Global Step: 219120 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:02:20,256-Speed 11053.31 samples/sec Loss 4.0111 LearningRate 0.0001 Epoch: 38 Global Step: 219130 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:02:21,176-Speed 11143.65 samples/sec Loss 3.8607 LearningRate 0.0001 Epoch: 38 Global Step: 219140 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:02:22,107-Speed 11003.07 samples/sec Loss 4.0138 LearningRate 0.0001 Epoch: 38 Global Step: 219150 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:02:23,034-Speed 11059.54 samples/sec Loss 3.9923 LearningRate 0.0001 Epoch: 38 Global Step: 219160 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:02:23,947-Speed 11219.37 samples/sec Loss 3.8934 LearningRate 0.0001 Epoch: 38 Global Step: 219170 
Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:02:24,855-Speed 11285.44 samples/sec Loss 4.0503 LearningRate 0.0001 Epoch: 38 Global Step: 219180 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:02:25,781-Speed 11065.03 samples/sec Loss 4.1585 LearningRate 0.0001 Epoch: 38 Global Step: 219190 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:02:26,687-Speed 11319.99 samples/sec Loss 4.0138 LearningRate 0.0001 Epoch: 38 Global Step: 219200 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:02:27,611-Speed 11083.69 samples/sec Loss 3.9250 LearningRate 0.0001 Epoch: 38 Global Step: 219210 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:02:28,549-Speed 10931.50 samples/sec Loss 4.0156 LearningRate 0.0001 Epoch: 38 Global Step: 219220 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:02:29,514-Speed 10615.24 samples/sec Loss 3.8903 LearningRate 0.0001 Epoch: 38 Global Step: 219230 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:02:30,454-Speed 10894.25 samples/sec Loss 3.9440 LearningRate 0.0001 Epoch: 38 Global Step: 219240 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:02:31,375-Speed 11131.05 samples/sec Loss 4.1558 LearningRate 0.0001 Epoch: 38 Global Step: 219250 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:02:32,335-Speed 10736.61 samples/sec Loss 3.9408 LearningRate 0.0001 Epoch: 38 Global Step: 219260 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:02:33,238-Speed 11346.66 samples/sec Loss 4.0576 LearningRate 0.0001 Epoch: 38 Global Step: 219270 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:02:34,167-Speed 11035.17 samples/sec Loss 4.0464 LearningRate 0.0001 Epoch: 38 Global Step: 219280 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:02:35,092-Speed 11068.78 samples/sec Loss 3.9761 LearningRate 0.0001 Epoch: 38 Global Step: 219290 Fp16 Grad Scale: 16384 
Required: 0 hours Training: 2022-04-11 07:02:36,031-Speed 10919.00 samples/sec Loss 3.9772 LearningRate 0.0001 Epoch: 38 Global Step: 219300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:02:37,023-Speed 10342.01 samples/sec Loss 4.0702 LearningRate 0.0001 Epoch: 38 Global Step: 219310 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:02:38,005-Speed 10432.35 samples/sec Loss 4.0199 LearningRate 0.0001 Epoch: 38 Global Step: 219320 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:02:38,988-Speed 10415.95 samples/sec Loss 4.1364 LearningRate 0.0001 Epoch: 38 Global Step: 219330 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:02:39,968-Speed 10460.28 samples/sec Loss 3.9598 LearningRate 0.0001 Epoch: 38 Global Step: 219340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:02:40,953-Speed 10397.64 samples/sec Loss 4.0213 LearningRate 0.0001 Epoch: 38 Global Step: 219350 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:02:41,913-Speed 10682.46 samples/sec Loss 4.0087 LearningRate 0.0001 Epoch: 38 Global Step: 219360 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:02:42,847-Speed 10966.60 samples/sec Loss 3.9419 LearningRate 0.0001 Epoch: 38 Global Step: 219370 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:02:43,785-Speed 10928.29 samples/sec Loss 4.0717 LearningRate 0.0001 Epoch: 38 Global Step: 219380 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:02:44,732-Speed 10813.32 samples/sec Loss 3.9385 LearningRate 0.0001 Epoch: 38 Global Step: 219390 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:02:45,656-Speed 11090.60 samples/sec Loss 4.0065 LearningRate 0.0001 Epoch: 38 Global Step: 219400 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:02:46,582-Speed 11068.62 samples/sec Loss 3.9623 LearningRate 0.0001 Epoch: 38 Global Step: 219410 Fp16 Grad Scale: 32768 Required: 0 hours Training: 
2022-04-11 07:02:47,535-Speed 10755.65 samples/sec Loss 4.0169 LearningRate 0.0001 Epoch: 38 Global Step: 219420 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:02:48,486-Speed 10782.37 samples/sec Loss 3.8998 LearningRate 0.0001 Epoch: 38 Global Step: 219430 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:02:49,439-Speed 10749.99 samples/sec Loss 4.0024 LearningRate 0.0001 Epoch: 38 Global Step: 219440 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:02:50,376-Speed 10938.26 samples/sec Loss 4.0278 LearningRate 0.0001 Epoch: 38 Global Step: 219450 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:02:51,298-Speed 11114.80 samples/sec Loss 3.9966 LearningRate 0.0001 Epoch: 38 Global Step: 219460 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:02:52,228-Speed 11025.54 samples/sec Loss 4.0376 LearningRate 0.0001 Epoch: 38 Global Step: 219470 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:02:53,187-Speed 10679.99 samples/sec Loss 4.0350 LearningRate 0.0001 Epoch: 38 Global Step: 219480 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:02:54,144-Speed 10710.18 samples/sec Loss 3.9611 LearningRate 0.0001 Epoch: 38 Global Step: 219490 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:02:55,099-Speed 10729.32 samples/sec Loss 4.0305 LearningRate 0.0001 Epoch: 38 Global Step: 219500 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:02:56,056-Speed 10707.51 samples/sec Loss 4.1567 LearningRate 0.0001 Epoch: 38 Global Step: 219510 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:02:56,987-Speed 11005.75 samples/sec Loss 3.9453 LearningRate 0.0001 Epoch: 38 Global Step: 219520 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:02:57,935-Speed 10812.71 samples/sec Loss 3.9788 LearningRate 0.0001 Epoch: 38 Global Step: 219530 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:02:58,920-Speed 
10399.36 samples/sec Loss 3.9620 LearningRate 0.0001 Epoch: 38 Global Step: 219540 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:02:59,871-Speed 10768.95 samples/sec Loss 3.9928 LearningRate 0.0001 Epoch: 38 Global Step: 219550 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:03:00,848-Speed 10497.20 samples/sec Loss 3.9254 LearningRate 0.0001 Epoch: 38 Global Step: 219560 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:03:01,829-Speed 10437.39 samples/sec Loss 3.9066 LearningRate 0.0001 Epoch: 38 Global Step: 219570 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:03:02,791-Speed 10657.69 samples/sec Loss 4.0570 LearningRate 0.0001 Epoch: 38 Global Step: 219580 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:03:03,730-Speed 10910.23 samples/sec Loss 4.1052 LearningRate 0.0001 Epoch: 38 Global Step: 219590 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:03:04,674-Speed 10849.73 samples/sec Loss 4.0334 LearningRate 0.0001 Epoch: 38 Global Step: 219600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:03:05,653-Speed 10476.00 samples/sec Loss 4.0893 LearningRate 0.0001 Epoch: 38 Global Step: 219610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:03:06,654-Speed 10231.68 samples/sec Loss 3.9828 LearningRate 0.0001 Epoch: 38 Global Step: 219620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:03:07,623-Speed 10583.74 samples/sec Loss 3.9305 LearningRate 0.0001 Epoch: 38 Global Step: 219630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:03:08,558-Speed 10952.41 samples/sec Loss 3.9905 LearningRate 0.0001 Epoch: 38 Global Step: 219640 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:03:09,479-Speed 11125.07 samples/sec Loss 3.9928 LearningRate 0.0001 Epoch: 38 Global Step: 219650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:03:10,398-Speed 11152.67 samples/sec Loss 
4.0869 LearningRate 0.0001 Epoch: 38 Global Step: 219660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:03:11,311-Speed 11222.10 samples/sec Loss 4.0037 LearningRate 0.0001 Epoch: 38 Global Step: 219670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:03:12,271-Speed 10676.75 samples/sec Loss 3.9881 LearningRate 0.0001 Epoch: 38 Global Step: 219680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:03:13,205-Speed 10967.26 samples/sec Loss 4.0011 LearningRate 0.0001 Epoch: 38 Global Step: 219690 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:03:14,131-Speed 11074.73 samples/sec Loss 3.8978 LearningRate 0.0001 Epoch: 38 Global Step: 219700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:03:15,057-Speed 11058.37 samples/sec Loss 4.1306 LearningRate 0.0001 Epoch: 38 Global Step: 219710 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:03:16,033-Speed 10501.80 samples/sec Loss 4.0112 LearningRate 0.0001 Epoch: 38 Global Step: 219720 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:03:16,994-Speed 10663.11 samples/sec Loss 4.0033 LearningRate 0.0001 Epoch: 38 Global Step: 219730 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:03:17,926-Speed 10998.34 samples/sec Loss 4.0087 LearningRate 0.0001 Epoch: 38 Global Step: 219740 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:03:18,891-Speed 10612.06 samples/sec Loss 3.9681 LearningRate 0.0001 Epoch: 38 Global Step: 219750 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:03:19,808-Speed 11175.16 samples/sec Loss 4.0183 LearningRate 0.0001 Epoch: 38 Global Step: 219760 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:03:20,758-Speed 10792.97 samples/sec Loss 3.9398 LearningRate 0.0001 Epoch: 38 Global Step: 219770 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:03:21,675-Speed 11175.12 samples/sec Loss 4.0561 LearningRate 0.0001 
Epoch: 38 Global Step: 219780 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:03:22,580-Speed 11332.53 samples/sec Loss 3.9500 LearningRate 0.0001 Epoch: 38 Global Step: 219790 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:03:23,546-Speed 10598.28 samples/sec Loss 4.0024 LearningRate 0.0001 Epoch: 38 Global Step: 219800 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:03:24,523-Speed 10492.54 samples/sec Loss 3.9904 LearningRate 0.0001 Epoch: 38 Global Step: 219810 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:03:25,459-Speed 10945.02 samples/sec Loss 4.0225 LearningRate 0.0001 Epoch: 38 Global Step: 219820 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:03:26,375-Speed 11188.37 samples/sec Loss 4.0999 LearningRate 0.0001 Epoch: 38 Global Step: 219830 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:03:27,354-Speed 10470.90 samples/sec Loss 3.9780 LearningRate 0.0001 Epoch: 38 Global Step: 219840 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:03:28,325-Speed 10554.00 samples/sec Loss 3.9428 LearningRate 0.0001 Epoch: 38 Global Step: 219850 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:03:29,293-Speed 10587.89 samples/sec Loss 4.0085 LearningRate 0.0001 Epoch: 38 Global Step: 219860 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:03:30,258-Speed 10615.99 samples/sec Loss 4.1169 LearningRate 0.0001 Epoch: 38 Global Step: 219870 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:03:31,180-Speed 11113.70 samples/sec Loss 4.1115 LearningRate 0.0001 Epoch: 38 Global Step: 219880 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:03:32,132-Speed 10767.73 samples/sec Loss 3.9633 LearningRate 0.0001 Epoch: 38 Global Step: 219890 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:03:33,072-Speed 10898.61 samples/sec Loss 4.1651 LearningRate 0.0001 Epoch: 38 Global Step: 219900 
Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:03:33,997-Speed 11080.84 samples/sec Loss 3.9168 LearningRate 0.0001 Epoch: 38 Global Step: 219910 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:03:34,938-Speed 10892.14 samples/sec Loss 4.0472 LearningRate 0.0001 Epoch: 38 Global Step: 219920 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:03:35,890-Speed 10763.23 samples/sec Loss 4.0567 LearningRate 0.0001 Epoch: 38 Global Step: 219930 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:03:36,823-Speed 10975.07 samples/sec Loss 3.9936 LearningRate 0.0001 Epoch: 38 Global Step: 219940 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:03:37,728-Speed 11329.98 samples/sec Loss 3.9062 LearningRate 0.0001 Epoch: 38 Global Step: 219950 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:03:38,690-Speed 10650.58 samples/sec Loss 3.9222 LearningRate 0.0001 Epoch: 38 Global Step: 219960 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:03:39,630-Speed 10896.94 samples/sec Loss 4.0410 LearningRate 0.0001 Epoch: 38 Global Step: 219970 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:03:40,591-Speed 10657.38 samples/sec Loss 3.9665 LearningRate 0.0001 Epoch: 38 Global Step: 219980 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:03:41,563-Speed 10543.09 samples/sec Loss 4.0106 LearningRate 0.0001 Epoch: 38 Global Step: 219990 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:03:42,499-Speed 10951.93 samples/sec Loss 3.9766 LearningRate 0.0001 Epoch: 38 Global Step: 220000 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:04:03,851-[lfw][220000]XNorm: 7.592760 Training: 2022-04-11 07:04:03,851-[lfw][220000]Accuracy-Flip: 0.99750+-0.00261 Training: 2022-04-11 07:04:03,852-[lfw][220000]Accuracy-Highest: 0.99750 Training: 2022-04-11 07:04:28,542-[cfp_fp][220000]XNorm: 6.506685 Training: 2022-04-11 
07:04:28,543-[cfp_fp][220000]Accuracy-Flip: 0.95543+-0.01110 Training: 2022-04-11 07:04:28,543-[cfp_fp][220000]Accuracy-Highest: 0.95943 Training: 2022-04-11 07:04:49,727-[agedb_30][220000]XNorm: 7.443154 Training: 2022-04-11 07:04:49,728-[agedb_30][220000]Accuracy-Flip: 0.97067+-0.00739 Training: 2022-04-11 07:04:49,728-[agedb_30][220000]Accuracy-Highest: 0.97183 Training: 2022-04-11 07:04:50,687-Speed 150.17 samples/sec Loss 3.8943 LearningRate 0.0001 Epoch: 38 Global Step: 220010 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:04:51,623-Speed 10953.96 samples/sec Loss 3.9244 LearningRate 0.0001 Epoch: 38 Global Step: 220020 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:04:52,545-Speed 11110.24 samples/sec Loss 3.9818 LearningRate 0.0001 Epoch: 38 Global Step: 220030 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:04:53,452-Speed 11297.40 samples/sec Loss 3.9799 LearningRate 0.0001 Epoch: 38 Global Step: 220040 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:04:54,377-Speed 11082.42 samples/sec Loss 3.9815 LearningRate 0.0001 Epoch: 38 Global Step: 220050 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:04:55,302-Speed 11071.45 samples/sec Loss 3.9544 LearningRate 0.0001 Epoch: 38 Global Step: 220060 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:04:56,212-Speed 11263.46 samples/sec Loss 3.9543 LearningRate 0.0001 Epoch: 38 Global Step: 220070 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:04:57,155-Speed 10868.29 samples/sec Loss 4.0717 LearningRate 0.0001 Epoch: 38 Global Step: 220080 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:04:58,063-Speed 11280.21 samples/sec Loss 4.0646 LearningRate 0.0001 Epoch: 38 Global Step: 220090 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:04:59,013-Speed 10789.75 samples/sec Loss 3.9732 LearningRate 0.0001 Epoch: 38 Global Step: 220100 Fp16 Grad Scale: 16384 Required: 0 
hours Training: 2022-04-11 07:04:59,986-Speed 10530.27 samples/sec Loss 4.1343 LearningRate 0.0001 Epoch: 38 Global Step: 220110 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:05:00,942-Speed 10720.77 samples/sec Loss 4.0221 LearningRate 0.0001 Epoch: 38 Global Step: 220120 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:05:01,918-Speed 10494.89 samples/sec Loss 4.0528 LearningRate 0.0001 Epoch: 38 Global Step: 220130 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:05:02,937-Speed 10055.21 samples/sec Loss 4.0193 LearningRate 0.0001 Epoch: 38 Global Step: 220140 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:05:03,926-Speed 10360.69 samples/sec Loss 4.0308 LearningRate 0.0001 Epoch: 38 Global Step: 220150 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:05:04,861-Speed 10959.38 samples/sec Loss 4.0118 LearningRate 0.0001 Epoch: 38 Global Step: 220160 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:05:05,777-Speed 11195.58 samples/sec Loss 3.9624 LearningRate 0.0001 Epoch: 38 Global Step: 220170 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:05:06,727-Speed 10779.33 samples/sec Loss 4.0272 LearningRate 0.0001 Epoch: 38 Global Step: 220180 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:05:07,708-Speed 10449.21 samples/sec Loss 3.9689 LearningRate 0.0001 Epoch: 38 Global Step: 220190 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:05:08,633-Speed 11071.14 samples/sec Loss 4.0925 LearningRate 0.0001 Epoch: 38 Global Step: 220200 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:05:09,559-Speed 11077.86 samples/sec Loss 3.9727 LearningRate 0.0001 Epoch: 38 Global Step: 220210 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:05:10,497-Speed 10927.09 samples/sec Loss 3.9999 LearningRate 0.0001 Epoch: 38 Global Step: 220220 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 
07:05:11,414-Speed 11168.44 samples/sec Loss 3.9271 LearningRate 0.0001 Epoch: 38 Global Step: 220230 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:05:12,351-Speed 10934.74 samples/sec Loss 4.0779 LearningRate 0.0001 Epoch: 38 Global Step: 220240 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:05:13,308-Speed 10713.43 samples/sec Loss 3.9999 LearningRate 0.0001 Epoch: 38 Global Step: 220250 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:05:14,226-Speed 11164.66 samples/sec Loss 3.9262 LearningRate 0.0001 Epoch: 38 Global Step: 220260 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:05:15,152-Speed 11059.62 samples/sec Loss 4.0418 LearningRate 0.0001 Epoch: 38 Global Step: 220270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:05:16,088-Speed 10948.19 samples/sec Loss 4.0825 LearningRate 0.0001 Epoch: 38 Global Step: 220280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:05:17,018-Speed 11016.75 samples/sec Loss 4.0720 LearningRate 0.0001 Epoch: 38 Global Step: 220290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:05:17,948-Speed 11013.84 samples/sec Loss 3.9962 LearningRate 0.0001 Epoch: 38 Global Step: 220300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:05:18,900-Speed 10772.73 samples/sec Loss 3.9067 LearningRate 0.0001 Epoch: 38 Global Step: 220310 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:05:19,827-Speed 11047.01 samples/sec Loss 3.9151 LearningRate 0.0001 Epoch: 38 Global Step: 220320 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:05:20,752-Speed 11081.13 samples/sec Loss 4.1817 LearningRate 0.0001 Epoch: 38 Global Step: 220330 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:05:21,701-Speed 10797.00 samples/sec Loss 3.9783 LearningRate 0.0001 Epoch: 38 Global Step: 220340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:05:22,632-Speed 11010.47 
samples/sec Loss 3.9981 LearningRate 0.0001 Epoch: 38 Global Step: 220350 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:05:23,588-Speed 10709.97 samples/sec Loss 3.9590 LearningRate 0.0001 Epoch: 38 Global Step: 220360 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:05:24,562-Speed 10521.65 samples/sec Loss 3.9443 LearningRate 0.0001 Epoch: 38 Global Step: 220370 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:05:25,558-Speed 10293.91 samples/sec Loss 3.9848 LearningRate 0.0001 Epoch: 38 Global Step: 220380 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:05:26,509-Speed 10774.56 samples/sec Loss 4.0894 LearningRate 0.0001 Epoch: 38 Global Step: 220390 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:05:27,476-Speed 10591.67 samples/sec Loss 3.9579 LearningRate 0.0001 Epoch: 38 Global Step: 220400 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:05:28,442-Speed 10602.21 samples/sec Loss 4.0797 LearningRate 0.0001 Epoch: 38 Global Step: 220410 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:05:29,365-Speed 11107.85 samples/sec Loss 4.0495 LearningRate 0.0001 Epoch: 38 Global Step: 220420 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:05:30,362-Speed 10283.74 samples/sec Loss 3.9053 LearningRate 0.0001 Epoch: 38 Global Step: 220430 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:05:31,303-Speed 10890.43 samples/sec Loss 3.9236 LearningRate 0.0001 Epoch: 38 Global Step: 220440 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:05:32,263-Speed 10664.99 samples/sec Loss 3.9477 LearningRate 0.0001 Epoch: 38 Global Step: 220450 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:05:33,180-Speed 11184.34 samples/sec Loss 3.9741 LearningRate 0.0001 Epoch: 38 Global Step: 220460 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:05:34,079-Speed 11401.12 samples/sec Loss 3.9838 
LearningRate 0.0001 Epoch: 38 Global Step: 220470 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:05:35,016-Speed 10927.51 samples/sec Loss 4.0378 LearningRate 0.0001 Epoch: 38 Global Step: 220480 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:05:35,990-Speed 10528.07 samples/sec Loss 4.0834 LearningRate 0.0001 Epoch: 38 Global Step: 220490 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:05:36,945-Speed 10733.55 samples/sec Loss 3.9870 LearningRate 0.0001 Epoch: 38 Global Step: 220500 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:05:37,895-Speed 10781.73 samples/sec Loss 4.0701 LearningRate 0.0001 Epoch: 38 Global Step: 220510 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:05:38,905-Speed 10148.27 samples/sec Loss 3.9566 LearningRate 0.0001 Epoch: 38 Global Step: 220520 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:05:39,879-Speed 10521.88 samples/sec Loss 3.8758 LearningRate 0.0001 Epoch: 38 Global Step: 220530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:05:40,826-Speed 10817.67 samples/sec Loss 4.0403 LearningRate 0.0001 Epoch: 38 Global Step: 220540 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:05:41,777-Speed 10771.63 samples/sec Loss 3.9323 LearningRate 0.0001 Epoch: 38 Global Step: 220550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:05:42,713-Speed 10954.92 samples/sec Loss 4.0076 LearningRate 0.0001 Epoch: 38 Global Step: 220560 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:05:43,678-Speed 10616.62 samples/sec Loss 4.0520 LearningRate 0.0001 Epoch: 38 Global Step: 220570 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:05:44,645-Speed 10594.95 samples/sec Loss 4.0466 LearningRate 0.0001 Epoch: 38 Global Step: 220580 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:05:45,561-Speed 11193.00 samples/sec Loss 4.0016 LearningRate 0.0001 Epoch: 38 
Global Step: 220590 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:05:46,494-Speed 10978.23 samples/sec Loss 4.0161 LearningRate 0.0001 Epoch: 38 Global Step: 220600 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:05:47,428-Speed 10971.49 samples/sec Loss 4.0920 LearningRate 0.0001 Epoch: 38 Global Step: 220610 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:05:48,350-Speed 11118.59 samples/sec Loss 4.0542 LearningRate 0.0001 Epoch: 38 Global Step: 220620 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:05:49,283-Speed 10978.93 samples/sec Loss 3.9316 LearningRate 0.0001 Epoch: 38 Global Step: 220630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:05:50,247-Speed 10634.34 samples/sec Loss 3.9643 LearningRate 0.0001 Epoch: 38 Global Step: 220640 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:05:51,244-Speed 10277.58 samples/sec Loss 3.9553 LearningRate 0.0001 Epoch: 38 Global Step: 220650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:05:52,224-Speed 10455.40 samples/sec Loss 4.0473 LearningRate 0.0001 Epoch: 38 Global Step: 220660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:05:53,181-Speed 10699.45 samples/sec Loss 3.9166 LearningRate 0.0001 Epoch: 38 Global Step: 220670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:05:54,102-Speed 11125.98 samples/sec Loss 4.0759 LearningRate 0.0001 Epoch: 38 Global Step: 220680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:05:55,066-Speed 10632.89 samples/sec Loss 4.0637 LearningRate 0.0001 Epoch: 38 Global Step: 220690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:05:55,986-Speed 11137.28 samples/sec Loss 4.0731 LearningRate 0.0001 Epoch: 38 Global Step: 220700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:05:56,922-Speed 10948.63 samples/sec Loss 4.0324 LearningRate 0.0001 Epoch: 38 Global Step: 220710 Fp16 Grad 
Scale: 32768 Required: 0 hours Training: 2022-04-11 07:05:57,827-Speed 11320.95 samples/sec Loss 3.9362 LearningRate 0.0001 Epoch: 38 Global Step: 220720 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:05:58,755-Speed 11048.13 samples/sec Loss 4.0025 LearningRate 0.0001 Epoch: 38 Global Step: 220730 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:05:59,747-Speed 10326.18 samples/sec Loss 4.0144 LearningRate 0.0001 Epoch: 38 Global Step: 220740 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:06:00,713-Speed 10608.37 samples/sec Loss 3.8647 LearningRate 0.0001 Epoch: 38 Global Step: 220750 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:06:01,671-Speed 10697.91 samples/sec Loss 3.9786 LearningRate 0.0001 Epoch: 38 Global Step: 220760 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:06:02,633-Speed 10649.09 samples/sec Loss 3.9726 LearningRate 0.0001 Epoch: 38 Global Step: 220770 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:06:03,595-Speed 10655.41 samples/sec Loss 3.9221 LearningRate 0.0001 Epoch: 38 Global Step: 220780 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:06:04,556-Speed 10659.96 samples/sec Loss 4.1308 LearningRate 0.0001 Epoch: 38 Global Step: 220790 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:06:05,515-Speed 10684.21 samples/sec Loss 3.9827 LearningRate 0.0001 Epoch: 38 Global Step: 220800 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:06:06,503-Speed 10377.42 samples/sec Loss 4.0123 LearningRate 0.0001 Epoch: 38 Global Step: 220810 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:06:07,505-Speed 10222.58 samples/sec Loss 4.0501 LearningRate 0.0001 Epoch: 38 Global Step: 220820 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:06:08,427-Speed 11110.16 samples/sec Loss 3.8952 LearningRate 0.0001 Epoch: 38 Global Step: 220830 Fp16 Grad Scale: 32768 Required: 0 hours 
Training: 2022-04-11 07:06:09,361-Speed 10973.43 samples/sec Loss 4.0682 LearningRate 0.0001 Epoch: 38 Global Step: 220840 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:06:10,282-Speed 11128.74 samples/sec Loss 4.0228 LearningRate 0.0001 Epoch: 38 Global Step: 220850 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:06:11,214-Speed 10990.23 samples/sec Loss 3.9612 LearningRate 0.0001 Epoch: 38 Global Step: 220860 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:06:12,180-Speed 10615.26 samples/sec Loss 3.9471 LearningRate 0.0001 Epoch: 38 Global Step: 220870 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:06:13,143-Speed 10634.96 samples/sec Loss 3.9871 LearningRate 0.0001 Epoch: 38 Global Step: 220880 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:06:14,162-Speed 10056.69 samples/sec Loss 3.9239 LearningRate 0.0001 Epoch: 38 Global Step: 220890 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:06:15,123-Speed 10667.64 samples/sec Loss 4.0034 LearningRate 0.0001 Epoch: 38 Global Step: 220900 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:06:16,057-Speed 10964.22 samples/sec Loss 4.0497 LearningRate 0.0001 Epoch: 38 Global Step: 220910 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:06:17,073-Speed 10089.66 samples/sec Loss 3.9821 LearningRate 0.0001 Epoch: 38 Global Step: 220920 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:06:18,065-Speed 10328.20 samples/sec Loss 3.9868 LearningRate 0.0001 Epoch: 38 Global Step: 220930 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:06:19,033-Speed 10584.11 samples/sec Loss 4.0983 LearningRate 0.0001 Epoch: 38 Global Step: 220940 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:06:20,008-Speed 10513.43 samples/sec Loss 3.9576 LearningRate 0.0001 Epoch: 38 Global Step: 220950 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 
07:06:20,946-Speed 10925.22 samples/sec Loss 4.0404 LearningRate 0.0001 Epoch: 38 Global Step: 220960 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:06:21,853-Speed 11301.92 samples/sec Loss 4.1180 LearningRate 0.0001 Epoch: 38 Global Step: 220970 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:06:22,772-Speed 11143.87 samples/sec Loss 4.0209 LearningRate 0.0001 Epoch: 38 Global Step: 220980 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:06:23,717-Speed 10840.16 samples/sec Loss 4.0484 LearningRate 0.0001 Epoch: 38 Global Step: 220990 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:06:24,642-Speed 11081.43 samples/sec Loss 4.0032 LearningRate 0.0001 Epoch: 38 Global Step: 221000 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:06:25,615-Speed 10533.05 samples/sec Loss 3.9877 LearningRate 0.0001 Epoch: 38 Global Step: 221010 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:06:26,550-Speed 10953.92 samples/sec Loss 4.0177 LearningRate 0.0001 Epoch: 38 Global Step: 221020 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:06:27,479-Speed 11036.96 samples/sec Loss 4.0296 LearningRate 0.0001 Epoch: 38 Global Step: 221030 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:06:28,426-Speed 10817.03 samples/sec Loss 4.0413 LearningRate 0.0001 Epoch: 38 Global Step: 221040 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:06:29,386-Speed 10680.42 samples/sec Loss 4.0137 LearningRate 0.0001 Epoch: 38 Global Step: 221050 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:06:30,323-Speed 10931.70 samples/sec Loss 3.9960 LearningRate 0.0001 Epoch: 38 Global Step: 221060 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:06:31,302-Speed 10462.60 samples/sec Loss 3.9694 LearningRate 0.0001 Epoch: 38 Global Step: 221070 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:06:32,259-Speed 10709.94 
samples/sec Loss 4.0369 LearningRate 0.0001 Epoch: 38 Global Step: 221080 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:06:33,188-Speed 11036.07 samples/sec Loss 3.9252 LearningRate 0.0001 Epoch: 38 Global Step: 221090 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:06:34,110-Speed 11111.89 samples/sec Loss 4.0297 LearningRate 0.0001 Epoch: 38 Global Step: 221100 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:06:35,028-Speed 11168.32 samples/sec Loss 3.9598 LearningRate 0.0001 Epoch: 38 Global Step: 221110 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:06:35,995-Speed 10591.36 samples/sec Loss 3.9624 LearningRate 0.0001 Epoch: 38 Global Step: 221120 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:06:36,972-Speed 10483.93 samples/sec Loss 3.8306 LearningRate 0.0001 Epoch: 38 Global Step: 221130 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:06:37,949-Speed 10493.88 samples/sec Loss 3.9979 LearningRate 0.0001 Epoch: 38 Global Step: 221140 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:06:38,894-Speed 10838.15 samples/sec Loss 3.9493 LearningRate 0.0001 Epoch: 38 Global Step: 221150 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:06:39,888-Speed 10317.46 samples/sec Loss 4.0235 LearningRate 0.0001 Epoch: 38 Global Step: 221160 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:06:40,829-Speed 10888.21 samples/sec Loss 3.9753 LearningRate 0.0001 Epoch: 38 Global Step: 221170 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:06:41,824-Speed 10301.78 samples/sec Loss 4.1885 LearningRate 0.0001 Epoch: 38 Global Step: 221180 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:06:42,773-Speed 10797.49 samples/sec Loss 3.9336 LearningRate 0.0001 Epoch: 38 Global Step: 221190 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:06:43,735-Speed 10656.05 samples/sec Loss 4.0056 
LearningRate 0.0001 Epoch: 38 Global Step: 221200 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:06:44,669-Speed 10965.54 samples/sec Loss 4.1178 LearningRate 0.0001 Epoch: 38 Global Step: 221210 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:06:45,575-Speed 11312.58 samples/sec Loss 3.9681 LearningRate 0.0001 Epoch: 38 Global Step: 221220 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:06:46,517-Speed 10878.47 samples/sec Loss 4.0437 LearningRate 0.0001 Epoch: 38 Global Step: 221230 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:06:47,444-Speed 11049.35 samples/sec Loss 4.0471 LearningRate 0.0001 Epoch: 38 Global Step: 221240 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:06:48,384-Speed 10907.78 samples/sec Loss 3.9432 LearningRate 0.0001 Epoch: 38 Global Step: 221250 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:06:49,309-Speed 11080.37 samples/sec Loss 4.0394 LearningRate 0.0001 Epoch: 38 Global Step: 221260 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:06:50,289-Speed 10456.22 samples/sec Loss 4.0575 LearningRate 0.0001 Epoch: 38 Global Step: 221270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:06:51,253-Speed 10620.81 samples/sec Loss 4.0642 LearningRate 0.0001 Epoch: 38 Global Step: 221280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:06:52,192-Speed 10913.60 samples/sec Loss 3.9914 LearningRate 0.0001 Epoch: 38 Global Step: 221290 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:06:53,137-Speed 10843.21 samples/sec Loss 3.8859 LearningRate 0.0001 Epoch: 38 Global Step: 221300 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:06:54,121-Speed 10413.38 samples/sec Loss 4.0252 LearningRate 0.0001 Epoch: 38 Global Step: 221310 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:06:55,060-Speed 10918.32 samples/sec Loss 3.9322 LearningRate 0.0001 Epoch: 38 
Global Step: 221320 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:06:56,046-Speed 10387.41 samples/sec Loss 3.9312 LearningRate 0.0001 Epoch: 38 Global Step: 221330 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:06:56,989-Speed 10873.29 samples/sec Loss 3.9658 LearningRate 0.0001 Epoch: 38 Global Step: 221340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:06:57,946-Speed 10702.25 samples/sec Loss 4.0827 LearningRate 0.0001 Epoch: 38 Global Step: 221350 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:06:58,879-Speed 10977.70 samples/sec Loss 4.0013 LearningRate 0.0001 Epoch: 38 Global Step: 221360 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:06:59,841-Speed 10657.67 samples/sec Loss 3.9310 LearningRate 0.0001 Epoch: 38 Global Step: 221370 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:07:00,786-Speed 10847.35 samples/sec Loss 3.8522 LearningRate 0.0001 Epoch: 38 Global Step: 221380 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:07:01,770-Speed 10405.14 samples/sec Loss 4.0006 LearningRate 0.0001 Epoch: 38 Global Step: 221390 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:07:02,781-Speed 10135.19 samples/sec Loss 4.0191 LearningRate 0.0001 Epoch: 38 Global Step: 221400 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:07:03,724-Speed 10874.76 samples/sec Loss 4.0576 LearningRate 0.0001 Epoch: 38 Global Step: 221410 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:07:04,677-Speed 10750.87 samples/sec Loss 3.8990 LearningRate 0.0001 Epoch: 38 Global Step: 221420 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:07:05,641-Speed 10636.70 samples/sec Loss 3.9220 LearningRate 0.0001 Epoch: 38 Global Step: 221430 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:07:06,586-Speed 10837.22 samples/sec Loss 4.0645 LearningRate 0.0001 Epoch: 38 Global Step: 221440 Fp16 Grad 
Scale: 65536 Required: 0 hours Training: 2022-04-11 07:07:07,523-Speed 10935.98 samples/sec Loss 3.9148 LearningRate 0.0001 Epoch: 38 Global Step: 221450 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:07:08,455-Speed 10993.13 samples/sec Loss 3.9174 LearningRate 0.0001 Epoch: 38 Global Step: 221460 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:07:09,377-Speed 11113.28 samples/sec Loss 4.0805 LearningRate 0.0001 Epoch: 38 Global Step: 221470 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:07:10,287-Speed 11265.12 samples/sec Loss 3.9807 LearningRate 0.0001 Epoch: 38 Global Step: 221480 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:07:11,221-Speed 10967.56 samples/sec Loss 3.9233 LearningRate 0.0001 Epoch: 38 Global Step: 221490 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:07:12,187-Speed 10607.38 samples/sec Loss 4.0172 LearningRate 0.0001 Epoch: 38 Global Step: 221500 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:07:13,136-Speed 10800.43 samples/sec Loss 3.9450 LearningRate 0.0001 Epoch: 38 Global Step: 221510 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:07:14,071-Speed 10955.92 samples/sec Loss 3.9679 LearningRate 0.0001 Epoch: 38 Global Step: 221520 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:07:15,010-Speed 10923.34 samples/sec Loss 3.9512 LearningRate 0.0001 Epoch: 38 Global Step: 221530 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:07:15,936-Speed 11056.57 samples/sec Loss 4.0482 LearningRate 0.0001 Epoch: 38 Global Step: 221540 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:07:16,901-Speed 10618.76 samples/sec Loss 3.9209 LearningRate 0.0001 Epoch: 38 Global Step: 221550 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:07:17,853-Speed 10765.98 samples/sec Loss 3.8170 LearningRate 0.0001 Epoch: 38 Global Step: 221560 Fp16 Grad Scale: 16384 Required: 0 hours 
Training: 2022-04-11 07:07:18,827-Speed 10517.12 samples/sec Loss 4.0669 LearningRate 0.0001 Epoch: 38 Global Step: 221570 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:07:19,757-Speed 11022.45 samples/sec Loss 4.0872 LearningRate 0.0001 Epoch: 38 Global Step: 221580 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:07:20,686-Speed 11031.99 samples/sec Loss 4.0237 LearningRate 0.0001 Epoch: 38 Global Step: 221590 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:07:21,600-Speed 11215.04 samples/sec Loss 4.0249 LearningRate 0.0001 Epoch: 38 Global Step: 221600 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:07:22,533-Speed 10978.59 samples/sec Loss 4.0464 LearningRate 0.0001 Epoch: 38 Global Step: 221610 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:07:23,470-Speed 10939.79 samples/sec Loss 3.9903 LearningRate 0.0001 Epoch: 38 Global Step: 221620 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:07:24,427-Speed 10709.13 samples/sec Loss 3.9920 LearningRate 0.0001 Epoch: 38 Global Step: 221630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:07:25,404-Speed 10481.79 samples/sec Loss 3.8418 LearningRate 0.0001 Epoch: 38 Global Step: 221640 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:07:26,430-Speed 9983.89 samples/sec Loss 4.0317 LearningRate 0.0001 Epoch: 38 Global Step: 221650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:07:27,450-Speed 10049.94 samples/sec Loss 4.1340 LearningRate 0.0001 Epoch: 38 Global Step: 221660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:07:28,402-Speed 10762.51 samples/sec Loss 3.9870 LearningRate 0.0001 Epoch: 38 Global Step: 221670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:07:29,415-Speed 10118.56 samples/sec Loss 4.0353 LearningRate 0.0001 Epoch: 38 Global Step: 221680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 
07:07:30,371-Speed 10728.93 samples/sec Loss 4.0239 LearningRate 0.0001 Epoch: 38 Global Step: 221690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:07:31,349-Speed 10474.69 samples/sec Loss 4.1136 LearningRate 0.0001 Epoch: 38 Global Step: 221700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:07:32,298-Speed 10797.02 samples/sec Loss 3.9202 LearningRate 0.0001 Epoch: 38 Global Step: 221710 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:07:33,216-Speed 11162.50 samples/sec Loss 3.9381 LearningRate 0.0001 Epoch: 38 Global Step: 221720 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:07:34,141-Speed 11070.96 samples/sec Loss 3.9825 LearningRate 0.0001 Epoch: 38 Global Step: 221730 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:07:35,040-Speed 11402.08 samples/sec Loss 4.0720 LearningRate 0.0001 Epoch: 38 Global Step: 221740 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:07:36,014-Speed 10519.66 samples/sec Loss 4.0934 LearningRate 0.0001 Epoch: 38 Global Step: 221750 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:07:46,596-Speed 967.83 samples/sec Loss 4.0275 LearningRate 0.0001 Epoch: 39 Global Step: 221760 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:07:47,534-Speed 10921.36 samples/sec Loss 3.9631 LearningRate 0.0001 Epoch: 39 Global Step: 221770 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:07:48,610-Speed 9524.25 samples/sec Loss 4.0591 LearningRate 0.0001 Epoch: 39 Global Step: 221780 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:07:49,619-Speed 10157.48 samples/sec Loss 3.9320 LearningRate 0.0001 Epoch: 39 Global Step: 221790 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:07:50,620-Speed 10241.73 samples/sec Loss 4.0990 LearningRate 0.0001 Epoch: 39 Global Step: 221800 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:07:51,666-Speed 9792.97 
samples/sec Loss 3.9634 LearningRate 0.0001 Epoch: 39 Global Step: 221810 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:07:52,747-Speed 9476.26 samples/sec Loss 3.9207 LearningRate 0.0001 Epoch: 39 Global Step: 221820 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:07:53,707-Speed 10679.01 samples/sec Loss 3.9190 LearningRate 0.0001 Epoch: 39 Global Step: 221830 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:07:54,688-Speed 10450.66 samples/sec Loss 4.0205 LearningRate 0.0001 Epoch: 39 Global Step: 221840 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:07:55,692-Speed 10208.19 samples/sec Loss 4.0005 LearningRate 0.0001 Epoch: 39 Global Step: 221850 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:07:56,634-Speed 10881.37 samples/sec Loss 3.9140 LearningRate 0.0001 Epoch: 39 Global Step: 221860 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:07:57,564-Speed 11014.00 samples/sec Loss 4.0764 LearningRate 0.0001 Epoch: 39 Global Step: 221870 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:07:58,488-Speed 11094.13 samples/sec Loss 3.9259 LearningRate 0.0001 Epoch: 39 Global Step: 221880 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:07:59,434-Speed 10826.92 samples/sec Loss 3.9344 LearningRate 0.0001 Epoch: 39 Global Step: 221890 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:08:00,344-Speed 11260.57 samples/sec Loss 3.9876 LearningRate 0.0001 Epoch: 39 Global Step: 221900 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:08:01,304-Speed 10680.57 samples/sec Loss 3.9422 LearningRate 0.0001 Epoch: 39 Global Step: 221910 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:08:02,316-Speed 10119.92 samples/sec Loss 4.0415 LearningRate 0.0001 Epoch: 39 Global Step: 221920 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:08:03,269-Speed 10759.70 samples/sec Loss 3.9487 
LearningRate 0.0001 Epoch: 39 Global Step: 221930 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:08:04,243-Speed 10525.97 samples/sec Loss 4.0032 LearningRate 0.0001 Epoch: 39 Global Step: 221940 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:08:05,214-Speed 10554.39 samples/sec Loss 3.9471 LearningRate 0.0001 Epoch: 39 Global Step: 221950 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:08:06,176-Speed 10649.78 samples/sec Loss 4.0864 LearningRate 0.0001 Epoch: 39 Global Step: 221960 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:08:07,130-Speed 10741.99 samples/sec Loss 3.9949 LearningRate 0.0001 Epoch: 39 Global Step: 221970 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:08:08,102-Speed 10537.07 samples/sec Loss 4.0032 LearningRate 0.0001 Epoch: 39 Global Step: 221980 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:08:09,026-Speed 11089.22 samples/sec Loss 3.9825 LearningRate 0.0001 Epoch: 39 Global Step: 221990 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:08:09,950-Speed 11095.03 samples/sec Loss 3.9841 LearningRate 0.0001 Epoch: 39 Global Step: 222000 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:08:31,523-[lfw][222000]XNorm: 7.582195 Training: 2022-04-11 07:08:31,524-[lfw][222000]Accuracy-Flip: 0.99700+-0.00277 Training: 2022-04-11 07:08:31,524-[lfw][222000]Accuracy-Highest: 0.99750 Training: 2022-04-11 07:08:56,115-[cfp_fp][222000]XNorm: 6.492860 Training: 2022-04-11 07:08:56,116-[cfp_fp][222000]Accuracy-Flip: 0.95757+-0.01038 Training: 2022-04-11 07:08:56,116-[cfp_fp][222000]Accuracy-Highest: 0.95943 Training: 2022-04-11 07:09:17,404-[agedb_30][222000]XNorm: 7.433375 Training: 2022-04-11 07:09:17,405-[agedb_30][222000]Accuracy-Flip: 0.97100+-0.00727 Training: 2022-04-11 07:09:17,405-[agedb_30][222000]Accuracy-Highest: 0.97183 Training: 2022-04-11 07:09:18,384-Speed 149.64 samples/sec Loss 3.8374 LearningRate 0.0001 
Epoch: 39 Global Step: 222010 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:09:19,357-Speed 10531.15 samples/sec Loss 3.9335 LearningRate 0.0001 Epoch: 39 Global Step: 222020 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:09:20,299-Speed 10876.47 samples/sec Loss 3.8915 LearningRate 0.0001 Epoch: 39 Global Step: 222030 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:09:21,227-Speed 11039.82 samples/sec Loss 3.8895 LearningRate 0.0001 Epoch: 39 Global Step: 222040 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:09:22,148-Speed 11127.49 samples/sec Loss 4.0181 LearningRate 0.0001 Epoch: 39 Global Step: 222050 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:09:23,083-Speed 10957.68 samples/sec Loss 3.9196 LearningRate 0.0001 Epoch: 39 Global Step: 222060 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:09:24,033-Speed 10793.20 samples/sec Loss 3.9959 LearningRate 0.0001 Epoch: 39 Global Step: 222070 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:09:25,087-Speed 9715.57 samples/sec Loss 3.8864 LearningRate 0.0001 Epoch: 39 Global Step: 222080 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:09:26,036-Speed 10794.98 samples/sec Loss 4.0575 LearningRate 0.0001 Epoch: 39 Global Step: 222090 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:09:27,096-Speed 9674.08 samples/sec Loss 3.9802 LearningRate 0.0001 Epoch: 39 Global Step: 222100 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:09:28,066-Speed 10559.23 samples/sec Loss 3.8657 LearningRate 0.0001 Epoch: 39 Global Step: 222110 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:09:29,045-Speed 10471.99 samples/sec Loss 3.8785 LearningRate 0.0001 Epoch: 39 Global Step: 222120 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:09:29,979-Speed 10967.98 samples/sec Loss 3.9242 LearningRate 0.0001 Epoch: 39 Global Step: 222130 
Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:09:31,074-Speed 9359.34 samples/sec Loss 3.8363 LearningRate 0.0001 Epoch: 39 Global Step: 222140 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:09:32,046-Speed 10543.48 samples/sec Loss 3.9267 LearningRate 0.0001 Epoch: 39 Global Step: 222150 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:09:33,056-Speed 10143.50 samples/sec Loss 3.9340 LearningRate 0.0001 Epoch: 39 Global Step: 222160 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:09:33,979-Speed 11093.56 samples/sec Loss 3.9856 LearningRate 0.0001 Epoch: 39 Global Step: 222170 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:09:34,950-Speed 10558.95 samples/sec Loss 3.9880 LearningRate 0.0001 Epoch: 39 Global Step: 222180 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:09:35,909-Speed 10685.43 samples/sec Loss 3.8999 LearningRate 0.0001 Epoch: 39 Global Step: 222190 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:09:36,977-Speed 9591.60 samples/sec Loss 3.9023 LearningRate 0.0001 Epoch: 39 Global Step: 222200 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:09:37,949-Speed 10541.00 samples/sec Loss 3.8410 LearningRate 0.0001 Epoch: 39 Global Step: 222210 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:09:38,929-Speed 10457.65 samples/sec Loss 3.9921 LearningRate 0.0001 Epoch: 39 Global Step: 222220 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:09:39,855-Speed 11072.52 samples/sec Loss 4.1033 LearningRate 0.0001 Epoch: 39 Global Step: 222230 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:09:40,793-Speed 10919.00 samples/sec Loss 3.9621 LearningRate 0.0001 Epoch: 39 Global Step: 222240 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:09:41,757-Speed 10630.71 samples/sec Loss 3.9439 LearningRate 0.0001 Epoch: 39 Global Step: 222250 Fp16 Grad Scale: 32768 Required: 
0 hours Training: 2022-04-11 07:09:42,766-Speed 10153.94 samples/sec Loss 4.0256 LearningRate 0.0001 Epoch: 39 Global Step: 222260 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:09:43,739-Speed 10532.24 samples/sec Loss 3.9170 LearningRate 0.0001 Epoch: 39 Global Step: 222270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:09:44,694-Speed 10726.08 samples/sec Loss 4.0100 LearningRate 0.0001 Epoch: 39 Global Step: 222280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:09:45,605-Speed 11254.31 samples/sec Loss 4.0515 LearningRate 0.0001 Epoch: 39 Global Step: 222290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:09:46,529-Speed 11087.90 samples/sec Loss 4.0228 LearningRate 0.0001 Epoch: 39 Global Step: 222300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:09:47,469-Speed 10898.01 samples/sec Loss 3.9988 LearningRate 0.0001 Epoch: 39 Global Step: 222310 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:09:48,431-Speed 10661.17 samples/sec Loss 4.0202 LearningRate 0.0001 Epoch: 39 Global Step: 222320 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:09:49,392-Speed 10660.88 samples/sec Loss 3.9292 LearningRate 0.0001 Epoch: 39 Global Step: 222330 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:09:50,357-Speed 10618.74 samples/sec Loss 3.9665 LearningRate 0.0001 Epoch: 39 Global Step: 222340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:09:51,321-Speed 10622.71 samples/sec Loss 3.9689 LearningRate 0.0001 Epoch: 39 Global Step: 222350 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:09:52,264-Speed 10874.45 samples/sec Loss 3.9887 LearningRate 0.0000 Epoch: 39 Global Step: 222360 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:09:53,218-Speed 10733.61 samples/sec Loss 4.0210 LearningRate 0.0000 Epoch: 39 Global Step: 222370 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 
07:09:54,136-Speed 11164.28 samples/sec Loss 3.9049 LearningRate 0.0000 Epoch: 39 Global Step: 222380 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:09:55,111-Speed 10509.26 samples/sec Loss 3.8189 LearningRate 0.0000 Epoch: 39 Global Step: 222390 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:09:56,080-Speed 10578.30 samples/sec Loss 3.9882 LearningRate 0.0000 Epoch: 39 Global Step: 222400 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:09:56,999-Speed 11148.94 samples/sec Loss 3.9779 LearningRate 0.0000 Epoch: 39 Global Step: 222410 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:09:57,930-Speed 11001.97 samples/sec Loss 3.9635 LearningRate 0.0000 Epoch: 39 Global Step: 222420 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:09:58,854-Speed 11091.84 samples/sec Loss 3.9984 LearningRate 0.0000 Epoch: 39 Global Step: 222430 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:09:59,800-Speed 10832.26 samples/sec Loss 4.0381 LearningRate 0.0000 Epoch: 39 Global Step: 222440 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:10:00,761-Speed 10660.57 samples/sec Loss 3.9740 LearningRate 0.0000 Epoch: 39 Global Step: 222450 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:10:01,693-Speed 11003.07 samples/sec Loss 4.0476 LearningRate 0.0000 Epoch: 39 Global Step: 222460 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:10:02,648-Speed 10725.81 samples/sec Loss 3.9375 LearningRate 0.0000 Epoch: 39 Global Step: 222470 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:10:03,615-Speed 10604.83 samples/sec Loss 4.0175 LearningRate 0.0000 Epoch: 39 Global Step: 222480 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:04,598-Speed 10417.32 samples/sec Loss 3.9209 LearningRate 0.0000 Epoch: 39 Global Step: 222490 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:05,551-Speed 10752.65 
samples/sec Loss 3.9231 LearningRate 0.0000 Epoch: 39 Global Step: 222500 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:06,481-Speed 11019.62 samples/sec Loss 4.0773 LearningRate 0.0000 Epoch: 39 Global Step: 222510 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:07,414-Speed 10986.38 samples/sec Loss 3.9148 LearningRate 0.0000 Epoch: 39 Global Step: 222520 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:08,390-Speed 10499.16 samples/sec Loss 3.9851 LearningRate 0.0000 Epoch: 39 Global Step: 222530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:09,324-Speed 10967.54 samples/sec Loss 3.9801 LearningRate 0.0000 Epoch: 39 Global Step: 222540 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:10,255-Speed 11005.59 samples/sec Loss 3.9877 LearningRate 0.0000 Epoch: 39 Global Step: 222550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:11,167-Speed 11232.51 samples/sec Loss 4.1186 LearningRate 0.0000 Epoch: 39 Global Step: 222560 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:12,114-Speed 10831.12 samples/sec Loss 3.8954 LearningRate 0.0000 Epoch: 39 Global Step: 222570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:13,075-Speed 10659.92 samples/sec Loss 3.8930 LearningRate 0.0000 Epoch: 39 Global Step: 222580 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:10:14,056-Speed 10442.55 samples/sec Loss 3.8821 LearningRate 0.0000 Epoch: 39 Global Step: 222590 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:14,983-Speed 11058.25 samples/sec Loss 4.0062 LearningRate 0.0000 Epoch: 39 Global Step: 222600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:15,982-Speed 10254.19 samples/sec Loss 3.9375 LearningRate 0.0000 Epoch: 39 Global Step: 222610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:16,991-Speed 10158.15 samples/sec Loss 3.9364 
LearningRate 0.0000 Epoch: 39 Global Step: 222620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:17,990-Speed 10252.89 samples/sec Loss 3.9409 LearningRate 0.0000 Epoch: 39 Global Step: 222630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:19,006-Speed 10090.59 samples/sec Loss 3.9127 LearningRate 0.0000 Epoch: 39 Global Step: 222640 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:20,014-Speed 10159.66 samples/sec Loss 4.0183 LearningRate 0.0000 Epoch: 39 Global Step: 222650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:20,947-Speed 10987.33 samples/sec Loss 3.9784 LearningRate 0.0000 Epoch: 39 Global Step: 222660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:21,881-Speed 10969.01 samples/sec Loss 4.0311 LearningRate 0.0000 Epoch: 39 Global Step: 222670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:22,795-Speed 11218.87 samples/sec Loss 3.9295 LearningRate 0.0000 Epoch: 39 Global Step: 222680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:23,708-Speed 11218.48 samples/sec Loss 3.8814 LearningRate 0.0000 Epoch: 39 Global Step: 222690 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:10:24,641-Speed 10988.04 samples/sec Loss 4.0532 LearningRate 0.0000 Epoch: 39 Global Step: 222700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:25,630-Speed 10355.33 samples/sec Loss 3.9606 LearningRate 0.0000 Epoch: 39 Global Step: 222710 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:26,569-Speed 10915.70 samples/sec Loss 3.9827 LearningRate 0.0000 Epoch: 39 Global Step: 222720 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:27,506-Speed 10944.38 samples/sec Loss 3.8132 LearningRate 0.0000 Epoch: 39 Global Step: 222730 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:28,467-Speed 10654.20 samples/sec Loss 3.9404 LearningRate 0.0000 Epoch: 39 
Global Step: 222740 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:29,427-Speed 10673.79 samples/sec Loss 3.9375 LearningRate 0.0000 Epoch: 39 Global Step: 222750 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:30,377-Speed 10791.56 samples/sec Loss 4.0799 LearningRate 0.0000 Epoch: 39 Global Step: 222760 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:31,305-Speed 11039.07 samples/sec Loss 3.9971 LearningRate 0.0000 Epoch: 39 Global Step: 222770 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:32,281-Speed 10504.75 samples/sec Loss 3.9548 LearningRate 0.0000 Epoch: 39 Global Step: 222780 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:33,182-Speed 11368.48 samples/sec Loss 4.0583 LearningRate 0.0000 Epoch: 39 Global Step: 222790 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:34,110-Speed 11036.49 samples/sec Loss 3.9808 LearningRate 0.0000 Epoch: 39 Global Step: 222800 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:10:35,029-Speed 11152.17 samples/sec Loss 3.9833 LearningRate 0.0000 Epoch: 39 Global Step: 222810 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:10:35,929-Speed 11381.58 samples/sec Loss 4.0614 LearningRate 0.0000 Epoch: 39 Global Step: 222820 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:36,929-Speed 10251.16 samples/sec Loss 3.9510 LearningRate 0.0000 Epoch: 39 Global Step: 222830 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:37,858-Speed 11032.62 samples/sec Loss 3.9806 LearningRate 0.0000 Epoch: 39 Global Step: 222840 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:38,869-Speed 10128.77 samples/sec Loss 3.9936 LearningRate 0.0000 Epoch: 39 Global Step: 222850 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:39,833-Speed 10631.09 samples/sec Loss 4.0657 LearningRate 0.0000 Epoch: 39 Global Step: 222860 Fp16 Grad 
Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:40,799-Speed 10610.52 samples/sec Loss 3.8956 LearningRate 0.0000 Epoch: 39 Global Step: 222870 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:41,731-Speed 11008.16 samples/sec Loss 4.0646 LearningRate 0.0000 Epoch: 39 Global Step: 222880 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:42,686-Speed 10725.27 samples/sec Loss 4.0033 LearningRate 0.0000 Epoch: 39 Global Step: 222890 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:43,631-Speed 10845.53 samples/sec Loss 3.9170 LearningRate 0.0000 Epoch: 39 Global Step: 222900 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:10:44,586-Speed 10728.86 samples/sec Loss 4.0654 LearningRate 0.0000 Epoch: 39 Global Step: 222910 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:10:45,526-Speed 10905.59 samples/sec Loss 3.9723 LearningRate 0.0000 Epoch: 39 Global Step: 222920 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:10:46,470-Speed 10845.29 samples/sec Loss 3.9001 LearningRate 0.0000 Epoch: 39 Global Step: 222930 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:10:47,420-Speed 10797.44 samples/sec Loss 4.0786 LearningRate 0.0000 Epoch: 39 Global Step: 222940 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:10:48,414-Speed 10307.33 samples/sec Loss 3.8927 LearningRate 0.0000 Epoch: 39 Global Step: 222950 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:10:49,409-Speed 10300.13 samples/sec Loss 4.0882 LearningRate 0.0000 Epoch: 39 Global Step: 222960 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:10:50,338-Speed 11020.58 samples/sec Loss 3.9072 LearningRate 0.0000 Epoch: 39 Global Step: 222970 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:10:51,279-Speed 10893.49 samples/sec Loss 4.0113 LearningRate 0.0000 Epoch: 39 Global Step: 222980 Fp16 Grad Scale: 16384 Required: 0 hours 
Training: 2022-04-11 07:10:52,248-Speed 10572.57 samples/sec Loss 3.8710 LearningRate 0.0000 Epoch: 39 Global Step: 222990 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:10:53,195-Speed 10816.72 samples/sec Loss 3.8837 LearningRate 0.0000 Epoch: 39 Global Step: 223000 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:54,167-Speed 10542.13 samples/sec Loss 3.9953 LearningRate 0.0000 Epoch: 39 Global Step: 223010 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:55,114-Speed 10823.20 samples/sec Loss 3.8527 LearningRate 0.0000 Epoch: 39 Global Step: 223020 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:56,043-Speed 11036.59 samples/sec Loss 4.0876 LearningRate 0.0000 Epoch: 39 Global Step: 223030 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:56,951-Speed 11285.91 samples/sec Loss 3.9111 LearningRate 0.0000 Epoch: 39 Global Step: 223040 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:57,942-Speed 10341.38 samples/sec Loss 3.9282 LearningRate 0.0000 Epoch: 39 Global Step: 223050 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:58,886-Speed 10854.87 samples/sec Loss 3.9869 LearningRate 0.0000 Epoch: 39 Global Step: 223060 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:10:59,855-Speed 10576.35 samples/sec Loss 4.0252 LearningRate 0.0000 Epoch: 39 Global Step: 223070 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:11:00,822-Speed 10589.67 samples/sec Loss 4.0369 LearningRate 0.0000 Epoch: 39 Global Step: 223080 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:11:01,802-Speed 10459.35 samples/sec Loss 3.9439 LearningRate 0.0000 Epoch: 39 Global Step: 223090 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:11:02,762-Speed 10673.18 samples/sec Loss 3.9748 LearningRate 0.0000 Epoch: 39 Global Step: 223100 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 
07:11:03,733-Speed 10560.42 samples/sec Loss 4.0199 LearningRate 0.0000 Epoch: 39 Global Step: 223110 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:11:04,679-Speed 10830.07 samples/sec Loss 4.0392 LearningRate 0.0000 Epoch: 39 Global Step: 223120 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:11:05,630-Speed 10770.94 samples/sec Loss 4.0289 LearningRate 0.0000 Epoch: 39 Global Step: 223130 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:11:06,596-Speed 10604.13 samples/sec Loss 3.9779 LearningRate 0.0000 Epoch: 39 Global Step: 223140 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:11:07,603-Speed 10180.51 samples/sec Loss 4.0000 LearningRate 0.0000 Epoch: 39 Global Step: 223150 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:11:08,570-Speed 10596.06 samples/sec Loss 3.9529 LearningRate 0.0000 Epoch: 39 Global Step: 223160 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:11:09,532-Speed 10652.70 samples/sec Loss 3.9964 LearningRate 0.0000 Epoch: 39 Global Step: 223170 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:11:10,464-Speed 10987.70 samples/sec Loss 3.9569 LearningRate 0.0000 Epoch: 39 Global Step: 223180 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:11:11,430-Speed 10607.90 samples/sec Loss 4.0317 LearningRate 0.0000 Epoch: 39 Global Step: 223190 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:11:12,374-Speed 10857.39 samples/sec Loss 3.9866 LearningRate 0.0000 Epoch: 39 Global Step: 223200 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:11:13,339-Speed 10620.50 samples/sec Loss 3.9940 LearningRate 0.0000 Epoch: 39 Global Step: 223210 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:11:14,323-Speed 10410.31 samples/sec Loss 4.0278 LearningRate 0.0000 Epoch: 39 Global Step: 223220 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:11:15,267-Speed 10855.05 
samples/sec Loss 3.8674 LearningRate 0.0000 Epoch: 39 Global Step: 223230 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:11:16,239-Speed 10541.48 samples/sec Loss 4.0078 LearningRate 0.0000 Epoch: 39 Global Step: 223240 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:11:17,182-Speed 10873.81 samples/sec Loss 3.9203 LearningRate 0.0000 Epoch: 39 Global Step: 223250 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:11:18,107-Speed 11076.33 samples/sec Loss 4.0034 LearningRate 0.0000 Epoch: 39 Global Step: 223260 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:11:19,060-Speed 10753.82 samples/sec Loss 3.9752 LearningRate 0.0000 Epoch: 39 Global Step: 223270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:11:20,016-Speed 10721.96 samples/sec Loss 4.0771 LearningRate 0.0000 Epoch: 39 Global Step: 223280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:11:20,938-Speed 11114.02 samples/sec Loss 3.8499 LearningRate 0.0000 Epoch: 39 Global Step: 223290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:11:21,877-Speed 10913.84 samples/sec Loss 4.0228 LearningRate 0.0000 Epoch: 39 Global Step: 223300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:11:22,782-Speed 11319.56 samples/sec Loss 3.9010 LearningRate 0.0000 Epoch: 39 Global Step: 223310 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:11:23,733-Speed 10775.43 samples/sec Loss 3.9690 LearningRate 0.0000 Epoch: 39 Global Step: 223320 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:11:24,706-Speed 10528.70 samples/sec Loss 3.9344 LearningRate 0.0000 Epoch: 39 Global Step: 223330 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:11:25,651-Speed 10849.05 samples/sec Loss 4.0136 LearningRate 0.0000 Epoch: 39 Global Step: 223340 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:11:26,600-Speed 10801.45 samples/sec Loss 3.9018 
LearningRate 0.0000 Epoch: 39 Global Step: 223350 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:11:27,592-Speed 10322.21 samples/sec Loss 4.0611 LearningRate 0.0000 Epoch: 39 Global Step: 223360 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:11:28,564-Speed 10549.02 samples/sec Loss 4.0902 LearningRate 0.0000 Epoch: 39 Global Step: 223370 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:11:29,509-Speed 10836.78 samples/sec Loss 3.8985 LearningRate 0.0000 Epoch: 39 Global Step: 223380 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:11:30,448-Speed 10913.58 samples/sec Loss 4.0154 LearningRate 0.0000 Epoch: 39 Global Step: 223390 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:11:31,437-Speed 10365.34 samples/sec Loss 3.9769 LearningRate 0.0000 Epoch: 39 Global Step: 223400 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:11:32,401-Speed 10627.19 samples/sec Loss 3.9297 LearningRate 0.0000 Epoch: 39 Global Step: 223410 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:11:33,321-Speed 11139.64 samples/sec Loss 3.8866 LearningRate 0.0000 Epoch: 39 Global Step: 223420 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:11:34,257-Speed 10945.01 samples/sec Loss 3.9797 LearningRate 0.0000 Epoch: 39 Global Step: 223430 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:11:35,211-Speed 10742.95 samples/sec Loss 3.9010 LearningRate 0.0000 Epoch: 39 Global Step: 223440 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:11:36,163-Speed 10759.92 samples/sec Loss 3.9629 LearningRate 0.0000 Epoch: 39 Global Step: 223450 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:11:37,138-Speed 10513.74 samples/sec Loss 3.9817 LearningRate 0.0000 Epoch: 39 Global Step: 223460 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:11:38,102-Speed 10631.52 samples/sec Loss 4.0701 LearningRate 0.0000 Epoch: 39 
Global Step: 223470 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:11:39,067-Speed 10616.78 samples/sec Loss 4.0599 LearningRate 0.0000 Epoch: 39 Global Step: 223480 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:11:40,026-Speed 10679.48 samples/sec Loss 3.9851 LearningRate 0.0000 Epoch: 39 Global Step: 223490 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:11:40,962-Speed 10950.26 samples/sec Loss 4.0248 LearningRate 0.0000 Epoch: 39 Global Step: 223500 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:11:41,930-Speed 10600.16 samples/sec Loss 3.9858 LearningRate 0.0000 Epoch: 39 Global Step: 223510 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:11:42,919-Speed 10360.35 samples/sec Loss 4.0826 LearningRate 0.0000 Epoch: 39 Global Step: 223520 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:11:43,899-Speed 10452.16 samples/sec Loss 4.0109 LearningRate 0.0000 Epoch: 39 Global Step: 223530 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:11:44,830-Speed 11014.66 samples/sec Loss 4.0390 LearningRate 0.0000 Epoch: 39 Global Step: 223540 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:11:45,772-Speed 10870.82 samples/sec Loss 3.9342 LearningRate 0.0000 Epoch: 39 Global Step: 223550 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:11:46,685-Speed 11226.59 samples/sec Loss 4.0924 LearningRate 0.0000 Epoch: 39 Global Step: 223560 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:11:47,591-Speed 11307.81 samples/sec Loss 3.9754 LearningRate 0.0000 Epoch: 39 Global Step: 223570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:11:48,538-Speed 10819.42 samples/sec Loss 4.0292 LearningRate 0.0000 Epoch: 39 Global Step: 223580 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:11:49,492-Speed 10740.72 samples/sec Loss 4.0654 LearningRate 0.0000 Epoch: 39 Global Step: 223590 Fp16 Grad 
Scale: 16384 Required: 0 hours Training: 2022-04-11 07:11:50,463-Speed 10556.31 samples/sec Loss 4.1147 LearningRate 0.0000 Epoch: 39 Global Step: 223600 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:11:51,384-Speed 11127.07 samples/sec Loss 3.9461 LearningRate 0.0000 Epoch: 39 Global Step: 223610 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:11:52,346-Speed 10652.55 samples/sec Loss 3.9531 LearningRate 0.0000 Epoch: 39 Global Step: 223620 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:11:53,317-Speed 10551.60 samples/sec Loss 4.0933 LearningRate 0.0000 Epoch: 39 Global Step: 223630 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:11:54,279-Speed 10655.61 samples/sec Loss 4.0720 LearningRate 0.0000 Epoch: 39 Global Step: 223640 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:11:55,242-Speed 10632.87 samples/sec Loss 4.0251 LearningRate 0.0000 Epoch: 39 Global Step: 223650 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:11:56,176-Speed 10981.26 samples/sec Loss 3.8717 LearningRate 0.0000 Epoch: 39 Global Step: 223660 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:11:57,093-Speed 11166.81 samples/sec Loss 3.9332 LearningRate 0.0000 Epoch: 39 Global Step: 223670 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:11:58,033-Speed 10901.71 samples/sec Loss 3.9449 LearningRate 0.0000 Epoch: 39 Global Step: 223680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:11:58,951-Speed 11160.28 samples/sec Loss 3.9948 LearningRate 0.0000 Epoch: 39 Global Step: 223690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:11:59,889-Speed 10932.36 samples/sec Loss 4.0108 LearningRate 0.0000 Epoch: 39 Global Step: 223700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:12:00,842-Speed 10749.12 samples/sec Loss 3.8799 LearningRate 0.0000 Epoch: 39 Global Step: 223710 Fp16 Grad Scale: 32768 Required: 0 hours 
Training: 2022-04-11 07:12:01,790-Speed 10811.54 samples/sec Loss 4.0057 LearningRate 0.0000 Epoch: 39 Global Step: 223720 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:12:02,734-Speed 10848.87 samples/sec Loss 3.9018 LearningRate 0.0000 Epoch: 39 Global Step: 223730 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:12:03,697-Speed 10647.99 samples/sec Loss 3.9706 LearningRate 0.0000 Epoch: 39 Global Step: 223740 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:12:04,668-Speed 10553.40 samples/sec Loss 4.1030 LearningRate 0.0000 Epoch: 39 Global Step: 223750 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:12:05,651-Speed 10419.58 samples/sec Loss 3.9458 LearningRate 0.0000 Epoch: 39 Global Step: 223760 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:12:06,661-Speed 10142.81 samples/sec Loss 3.9047 LearningRate 0.0000 Epoch: 39 Global Step: 223770 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:12:07,673-Speed 10135.35 samples/sec Loss 4.0241 LearningRate 0.0000 Epoch: 39 Global Step: 223780 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:12:08,670-Speed 10274.76 samples/sec Loss 3.8605 LearningRate 0.0000 Epoch: 39 Global Step: 223790 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:12:09,601-Speed 11003.36 samples/sec Loss 4.0923 LearningRate 0.0000 Epoch: 39 Global Step: 223800 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:12:10,534-Speed 10979.78 samples/sec Loss 4.0118 LearningRate 0.0000 Epoch: 39 Global Step: 223810 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:12:11,436-Speed 11360.46 samples/sec Loss 4.0628 LearningRate 0.0000 Epoch: 39 Global Step: 223820 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:12:12,452-Speed 10087.54 samples/sec Loss 4.0518 LearningRate 0.0000 Epoch: 39 Global Step: 223830 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 
07:12:13,407-Speed 10733.99 samples/sec Loss 3.9701 LearningRate 0.0000 Epoch: 39 Global Step: 223840 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:12:14,439-Speed 9930.91 samples/sec Loss 3.9542 LearningRate 0.0000 Epoch: 39 Global Step: 223850 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:12:15,439-Speed 10244.78 samples/sec Loss 3.9593 LearningRate 0.0000 Epoch: 39 Global Step: 223860 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:12:16,420-Speed 10446.79 samples/sec Loss 3.8882 LearningRate 0.0000 Epoch: 39 Global Step: 223870 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:12:17,361-Speed 10891.57 samples/sec Loss 3.9674 LearningRate 0.0000 Epoch: 39 Global Step: 223880 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:12:18,307-Speed 10828.68 samples/sec Loss 4.0010 LearningRate 0.0000 Epoch: 39 Global Step: 223890 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:12:19,268-Speed 10655.87 samples/sec Loss 3.9551 LearningRate 0.0000 Epoch: 39 Global Step: 223900 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:12:20,219-Speed 10774.77 samples/sec Loss 3.9292 LearningRate 0.0000 Epoch: 39 Global Step: 223910 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:12:21,158-Speed 10919.03 samples/sec Loss 3.8849 LearningRate 0.0000 Epoch: 39 Global Step: 223920 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:12:22,091-Speed 10979.23 samples/sec Loss 3.9700 LearningRate 0.0000 Epoch: 39 Global Step: 223930 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:12:23,024-Speed 10980.15 samples/sec Loss 4.0847 LearningRate 0.0000 Epoch: 39 Global Step: 223940 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:12:23,964-Speed 10906.68 samples/sec Loss 4.0070 LearningRate 0.0000 Epoch: 39 Global Step: 223950 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:12:24,904-Speed 10896.99 
samples/sec Loss 4.0123 LearningRate 0.0000 Epoch: 39 Global Step: 223960 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:12:25,843-Speed 10913.97 samples/sec Loss 3.9145 LearningRate 0.0000 Epoch: 39 Global Step: 223970 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:12:26,827-Speed 10424.08 samples/sec Loss 4.0441 LearningRate 0.0000 Epoch: 39 Global Step: 223980 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:12:27,832-Speed 10193.61 samples/sec Loss 3.8664 LearningRate 0.0000 Epoch: 39 Global Step: 223990 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:12:28,773-Speed 10891.86 samples/sec Loss 4.1704 LearningRate 0.0000 Epoch: 39 Global Step: 224000 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:12:50,225-[lfw][224000]XNorm: 7.593042 Training: 2022-04-11 07:12:50,226-[lfw][224000]Accuracy-Flip: 0.99750+-0.00261 Training: 2022-04-11 07:12:50,226-[lfw][224000]Accuracy-Highest: 0.99750 Training: 2022-04-11 07:13:14,809-[cfp_fp][224000]XNorm: 6.508784 Training: 2022-04-11 07:13:14,810-[cfp_fp][224000]Accuracy-Flip: 0.95529+-0.01081 Training: 2022-04-11 07:13:14,810-[cfp_fp][224000]Accuracy-Highest: 0.95943 Training: 2022-04-11 07:13:36,042-[agedb_30][224000]XNorm: 7.450662 Training: 2022-04-11 07:13:36,043-[agedb_30][224000]Accuracy-Flip: 0.97100+-0.00616 Training: 2022-04-11 07:13:36,043-[agedb_30][224000]Accuracy-Highest: 0.97183 Training: 2022-04-11 07:13:37,000-Speed 150.09 samples/sec Loss 3.9865 LearningRate 0.0000 Epoch: 39 Global Step: 224010 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:13:37,998-Speed 10264.91 samples/sec Loss 3.9733 LearningRate 0.0000 Epoch: 39 Global Step: 224020 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:13:38,967-Speed 10571.11 samples/sec Loss 4.0683 LearningRate 0.0000 Epoch: 39 Global Step: 224030 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:13:39,935-Speed 10588.03 samples/sec Loss 
3.8449 LearningRate 0.0000 Epoch: 39 Global Step: 224040 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:13:40,853-Speed 11164.63 samples/sec Loss 3.9570 LearningRate 0.0000 Epoch: 39 Global Step: 224050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:13:41,806-Speed 10750.85 samples/sec Loss 4.1099 LearningRate 0.0000 Epoch: 39 Global Step: 224060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:13:42,772-Speed 10605.36 samples/sec Loss 4.0730 LearningRate 0.0000 Epoch: 39 Global Step: 224070 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:13:43,720-Speed 10812.44 samples/sec Loss 3.9422 LearningRate 0.0000 Epoch: 39 Global Step: 224080 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:13:44,636-Speed 11186.00 samples/sec Loss 4.1693 LearningRate 0.0000 Epoch: 39 Global Step: 224090 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:13:45,553-Speed 11175.21 samples/sec Loss 3.9238 LearningRate 0.0000 Epoch: 39 Global Step: 224100 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:13:46,494-Speed 10883.37 samples/sec Loss 3.9540 LearningRate 0.0000 Epoch: 39 Global Step: 224110 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:13:47,429-Speed 10966.19 samples/sec Loss 3.8951 LearningRate 0.0000 Epoch: 39 Global Step: 224120 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:13:48,425-Speed 10282.78 samples/sec Loss 3.9677 LearningRate 0.0000 Epoch: 39 Global Step: 224130 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:13:49,356-Speed 11009.36 samples/sec Loss 4.1449 LearningRate 0.0000 Epoch: 39 Global Step: 224140 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:13:50,304-Speed 10809.89 samples/sec Loss 4.0206 LearningRate 0.0000 Epoch: 39 Global Step: 224150 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:13:51,303-Speed 10255.58 samples/sec Loss 4.0392 LearningRate 0.0000 
Epoch: 39 Global Step: 224160 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:13:52,249-Speed 10837.65 samples/sec Loss 3.9242 LearningRate 0.0000 Epoch: 39 Global Step: 224170 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:13:53,177-Speed 11039.18 samples/sec Loss 4.0123 LearningRate 0.0000 Epoch: 39 Global Step: 224180 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:13:54,220-Speed 9821.22 samples/sec Loss 3.9995 LearningRate 0.0000 Epoch: 39 Global Step: 224190 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:13:55,236-Speed 10089.01 samples/sec Loss 3.9488 LearningRate 0.0000 Epoch: 39 Global Step: 224200 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:13:56,240-Speed 10201.05 samples/sec Loss 4.0297 LearningRate 0.0000 Epoch: 39 Global Step: 224210 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:13:57,181-Speed 10889.43 samples/sec Loss 4.0045 LearningRate 0.0000 Epoch: 39 Global Step: 224220 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:13:58,087-Speed 11316.81 samples/sec Loss 3.9612 LearningRate 0.0000 Epoch: 39 Global Step: 224230 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:13:59,034-Speed 10823.07 samples/sec Loss 3.9642 LearningRate 0.0000 Epoch: 39 Global Step: 224240 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:13:59,954-Speed 11133.58 samples/sec Loss 3.9099 LearningRate 0.0000 Epoch: 39 Global Step: 224250 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:00,897-Speed 10870.05 samples/sec Loss 4.0220 LearningRate 0.0000 Epoch: 39 Global Step: 224260 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:01,876-Speed 10471.24 samples/sec Loss 3.9560 LearningRate 0.0000 Epoch: 39 Global Step: 224270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:02,863-Speed 10381.29 samples/sec Loss 4.0065 LearningRate 0.0000 Epoch: 39 Global Step: 224280 
Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:03,846-Speed 10417.18 samples/sec Loss 4.0684 LearningRate 0.0000 Epoch: 39 Global Step: 224290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:04,805-Speed 10692.78 samples/sec Loss 3.9295 LearningRate 0.0000 Epoch: 39 Global Step: 224300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:05,731-Speed 11066.98 samples/sec Loss 4.0865 LearningRate 0.0000 Epoch: 39 Global Step: 224310 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:06,673-Speed 10874.98 samples/sec Loss 3.9818 LearningRate 0.0000 Epoch: 39 Global Step: 224320 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:07,616-Speed 10868.50 samples/sec Loss 3.8643 LearningRate 0.0000 Epoch: 39 Global Step: 224330 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:14:08,546-Speed 11014.47 samples/sec Loss 4.0328 LearningRate 0.0000 Epoch: 39 Global Step: 224340 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:14:09,466-Speed 11142.73 samples/sec Loss 4.0428 LearningRate 0.0000 Epoch: 39 Global Step: 224350 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:10,382-Speed 11182.59 samples/sec Loss 3.9131 LearningRate 0.0000 Epoch: 39 Global Step: 224360 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:11,323-Speed 10887.32 samples/sec Loss 4.0525 LearningRate 0.0000 Epoch: 39 Global Step: 224370 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:12,262-Speed 10910.32 samples/sec Loss 4.0002 LearningRate 0.0000 Epoch: 39 Global Step: 224380 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:14:13,207-Speed 10843.46 samples/sec Loss 3.9943 LearningRate 0.0000 Epoch: 39 Global Step: 224390 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:14:14,208-Speed 10235.12 samples/sec Loss 3.9231 LearningRate 0.0000 Epoch: 39 Global Step: 224400 Fp16 Grad Scale: 16384 
Required: 0 hours Training: 2022-04-11 07:14:15,208-Speed 10251.70 samples/sec Loss 3.8944 LearningRate 0.0000 Epoch: 39 Global Step: 224410 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:14:16,182-Speed 10523.16 samples/sec Loss 3.9467 LearningRate 0.0000 Epoch: 39 Global Step: 224420 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:14:17,159-Speed 10479.99 samples/sec Loss 3.8810 LearningRate 0.0000 Epoch: 39 Global Step: 224430 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:14:18,125-Speed 10607.49 samples/sec Loss 3.8675 LearningRate 0.0000 Epoch: 39 Global Step: 224440 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:14:19,142-Speed 10081.81 samples/sec Loss 3.9800 LearningRate 0.0000 Epoch: 39 Global Step: 224450 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:14:20,121-Speed 10461.24 samples/sec Loss 3.9912 LearningRate 0.0000 Epoch: 39 Global Step: 224460 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:14:21,043-Speed 11117.93 samples/sec Loss 3.9267 LearningRate 0.0000 Epoch: 39 Global Step: 224470 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:14:21,963-Speed 11134.77 samples/sec Loss 3.9576 LearningRate 0.0000 Epoch: 39 Global Step: 224480 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:22,887-Speed 11096.49 samples/sec Loss 4.0454 LearningRate 0.0000 Epoch: 39 Global Step: 224490 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:23,851-Speed 10626.50 samples/sec Loss 3.9572 LearningRate 0.0000 Epoch: 39 Global Step: 224500 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:24,811-Speed 10679.54 samples/sec Loss 3.9641 LearningRate 0.0000 Epoch: 39 Global Step: 224510 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:25,750-Speed 10912.16 samples/sec Loss 3.9650 LearningRate 0.0000 Epoch: 39 Global Step: 224520 Fp16 Grad Scale: 32768 Required: 0 hours Training: 
2022-04-11 07:14:26,708-Speed 10698.39 samples/sec Loss 3.9369 LearningRate 0.0000 Epoch: 39 Global Step: 224530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:27,641-Speed 10983.87 samples/sec Loss 3.9934 LearningRate 0.0000 Epoch: 39 Global Step: 224540 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:28,585-Speed 10854.51 samples/sec Loss 3.9261 LearningRate 0.0000 Epoch: 39 Global Step: 224550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:29,584-Speed 10252.15 samples/sec Loss 4.0214 LearningRate 0.0000 Epoch: 39 Global Step: 224560 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:30,549-Speed 10623.91 samples/sec Loss 3.9935 LearningRate 0.0000 Epoch: 39 Global Step: 224570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:31,495-Speed 10831.97 samples/sec Loss 4.0396 LearningRate 0.0000 Epoch: 39 Global Step: 224580 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:14:32,436-Speed 10894.16 samples/sec Loss 4.0858 LearningRate 0.0000 Epoch: 39 Global Step: 224590 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:33,351-Speed 11194.88 samples/sec Loss 4.0281 LearningRate 0.0000 Epoch: 39 Global Step: 224600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:34,289-Speed 10923.41 samples/sec Loss 3.9824 LearningRate 0.0000 Epoch: 39 Global Step: 224610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:35,284-Speed 10294.16 samples/sec Loss 4.0430 LearningRate 0.0000 Epoch: 39 Global Step: 224620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:36,269-Speed 10406.17 samples/sec Loss 3.9786 LearningRate 0.0000 Epoch: 39 Global Step: 224630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:37,199-Speed 11022.88 samples/sec Loss 3.9482 LearningRate 0.0000 Epoch: 39 Global Step: 224640 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:38,138-Speed 
10909.70 samples/sec Loss 3.9416 LearningRate 0.0000 Epoch: 39 Global Step: 224650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:39,125-Speed 10382.28 samples/sec Loss 4.0351 LearningRate 0.0000 Epoch: 39 Global Step: 224660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:40,081-Speed 10719.29 samples/sec Loss 3.9052 LearningRate 0.0000 Epoch: 39 Global Step: 224670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:41,044-Speed 10639.06 samples/sec Loss 4.0552 LearningRate 0.0000 Epoch: 39 Global Step: 224680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:42,081-Speed 9885.95 samples/sec Loss 3.9060 LearningRate 0.0000 Epoch: 39 Global Step: 224690 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:14:43,039-Speed 10693.04 samples/sec Loss 3.9899 LearningRate 0.0000 Epoch: 39 Global Step: 224700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:43,996-Speed 10716.18 samples/sec Loss 3.9941 LearningRate 0.0000 Epoch: 39 Global Step: 224710 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:44,937-Speed 10881.97 samples/sec Loss 4.0404 LearningRate 0.0000 Epoch: 39 Global Step: 224720 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:45,845-Speed 11292.52 samples/sec Loss 3.9990 LearningRate 0.0000 Epoch: 39 Global Step: 224730 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:46,788-Speed 10868.09 samples/sec Loss 3.9522 LearningRate 0.0000 Epoch: 39 Global Step: 224740 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:47,730-Speed 10879.68 samples/sec Loss 4.0604 LearningRate 0.0000 Epoch: 39 Global Step: 224750 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:48,668-Speed 10916.27 samples/sec Loss 3.9908 LearningRate 0.0000 Epoch: 39 Global Step: 224760 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:49,578-Speed 11264.19 samples/sec Loss 3.9367 
LearningRate 0.0000 Epoch: 39 Global Step: 224770 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:50,527-Speed 10803.28 samples/sec Loss 3.9437 LearningRate 0.0000 Epoch: 39 Global Step: 224780 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:51,485-Speed 10689.70 samples/sec Loss 3.9568 LearningRate 0.0000 Epoch: 39 Global Step: 224790 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:52,463-Speed 10484.34 samples/sec Loss 4.1091 LearningRate 0.0000 Epoch: 39 Global Step: 224800 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:14:53,413-Speed 10782.97 samples/sec Loss 3.9962 LearningRate 0.0000 Epoch: 39 Global Step: 224810 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:54,381-Speed 10582.14 samples/sec Loss 3.9487 LearningRate 0.0000 Epoch: 39 Global Step: 224820 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:55,324-Speed 10870.36 samples/sec Loss 3.9542 LearningRate 0.0000 Epoch: 39 Global Step: 224830 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:56,276-Speed 10764.43 samples/sec Loss 4.0157 LearningRate 0.0000 Epoch: 39 Global Step: 224840 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:57,229-Speed 10753.66 samples/sec Loss 3.8937 LearningRate 0.0000 Epoch: 39 Global Step: 224850 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:58,139-Speed 11261.05 samples/sec Loss 3.9168 LearningRate 0.0000 Epoch: 39 Global Step: 224860 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:14:59,106-Speed 10597.22 samples/sec Loss 3.9956 LearningRate 0.0000 Epoch: 39 Global Step: 224870 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:15:00,117-Speed 10135.87 samples/sec Loss 4.0237 LearningRate 0.0000 Epoch: 39 Global Step: 224880 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:15:01,083-Speed 10603.97 samples/sec Loss 3.9077 LearningRate 0.0000 Epoch: 39 
Global Step: 224890 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:15:02,047-Speed 10632.03 samples/sec Loss 3.9386 LearningRate 0.0000 Epoch: 39 Global Step: 224900 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:15:03,046-Speed 10259.02 samples/sec Loss 3.9893 LearningRate 0.0000 Epoch: 39 Global Step: 224910 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:15:04,035-Speed 10360.96 samples/sec Loss 3.8357 LearningRate 0.0000 Epoch: 39 Global Step: 224920 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:15:04,998-Speed 10637.18 samples/sec Loss 3.9094 LearningRate 0.0000 Epoch: 39 Global Step: 224930 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:15:05,930-Speed 10999.80 samples/sec Loss 3.8671 LearningRate 0.0000 Epoch: 39 Global Step: 224940 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:15:06,909-Speed 10465.03 samples/sec Loss 3.9940 LearningRate 0.0000 Epoch: 39 Global Step: 224950 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:15:07,886-Speed 10492.04 samples/sec Loss 4.0678 LearningRate 0.0000 Epoch: 39 Global Step: 224960 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:15:08,810-Speed 11091.72 samples/sec Loss 4.0214 LearningRate 0.0000 Epoch: 39 Global Step: 224970 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:15:09,736-Speed 11060.03 samples/sec Loss 3.9990 LearningRate 0.0000 Epoch: 39 Global Step: 224980 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:15:10,681-Speed 10844.69 samples/sec Loss 3.9522 LearningRate 0.0000 Epoch: 39 Global Step: 224990 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:15:11,629-Speed 10818.88 samples/sec Loss 4.0964 LearningRate 0.0000 Epoch: 39 Global Step: 225000 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:15:12,579-Speed 10778.22 samples/sec Loss 3.9107 LearningRate 0.0000 Epoch: 39 Global Step: 225010 Fp16 Grad 
Scale: 32768 Required: 0 hours Training: 2022-04-11 07:15:13,502-Speed 11105.29 samples/sec Loss 3.9187 LearningRate 0.0000 Epoch: 39 Global Step: 225020 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:15:14,475-Speed 10531.44 samples/sec Loss 4.0215 LearningRate 0.0000 Epoch: 39 Global Step: 225030 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:15:15,436-Speed 10667.54 samples/sec Loss 4.1154 LearningRate 0.0000 Epoch: 39 Global Step: 225040 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:15:16,389-Speed 10746.82 samples/sec Loss 3.9743 LearningRate 0.0000 Epoch: 39 Global Step: 225050 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:15:17,328-Speed 10913.85 samples/sec Loss 4.0247 LearningRate 0.0000 Epoch: 39 Global Step: 225060 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:15:18,321-Speed 10319.77 samples/sec Loss 3.9009 LearningRate 0.0000 Epoch: 39 Global Step: 225070 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:15:19,311-Speed 10349.64 samples/sec Loss 4.0384 LearningRate 0.0000 Epoch: 39 Global Step: 225080 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:15:20,243-Speed 10999.74 samples/sec Loss 3.9936 LearningRate 0.0000 Epoch: 39 Global Step: 225090 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:15:21,160-Speed 11165.80 samples/sec Loss 4.0507 LearningRate 0.0000 Epoch: 39 Global Step: 225100 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:15:22,083-Speed 11105.68 samples/sec Loss 3.9492 LearningRate 0.0000 Epoch: 39 Global Step: 225110 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:15:23,006-Speed 11096.94 samples/sec Loss 4.0916 LearningRate 0.0000 Epoch: 39 Global Step: 225120 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:15:23,946-Speed 10906.46 samples/sec Loss 3.9615 LearningRate 0.0000 Epoch: 39 Global Step: 225130 Fp16 Grad Scale: 32768 Required: 0 hours 
Training: 2022-04-11 07:15:24,916-Speed 10563.28 samples/sec Loss 3.9208 LearningRate 0.0000 Epoch: 39 Global Step: 225140 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:15:25,885-Speed 10576.04 samples/sec Loss 3.9515 LearningRate 0.0000 Epoch: 39 Global Step: 225150 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:15:26,834-Speed 10799.18 samples/sec Loss 3.9873 LearningRate 0.0000 Epoch: 39 Global Step: 225160 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:15:27,792-Speed 10694.83 samples/sec Loss 4.0719 LearningRate 0.0000 Epoch: 39 Global Step: 225170 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:15:28,752-Speed 10666.48 samples/sec Loss 3.8837 LearningRate 0.0000 Epoch: 39 Global Step: 225180 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:15:29,704-Speed 10764.78 samples/sec Loss 3.9337 LearningRate 0.0000 Epoch: 39 Global Step: 225190 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:15:30,650-Speed 10835.73 samples/sec Loss 3.9346 LearningRate 0.0000 Epoch: 39 Global Step: 225200 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:15:31,597-Speed 10817.42 samples/sec Loss 4.1234 LearningRate 0.0000 Epoch: 39 Global Step: 225210 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:15:32,558-Speed 10672.38 samples/sec Loss 3.8254 LearningRate 0.0000 Epoch: 39 Global Step: 225220 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:15:33,491-Speed 10978.40 samples/sec Loss 4.0821 LearningRate 0.0000 Epoch: 39 Global Step: 225230 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:15:34,423-Speed 10990.23 samples/sec Loss 3.9559 LearningRate 0.0000 Epoch: 39 Global Step: 225240 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:15:35,352-Speed 11035.38 samples/sec Loss 3.9857 LearningRate 0.0000 Epoch: 39 Global Step: 225250 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 
07:15:36,302-Speed 10789.18 samples/sec Loss 4.0390 LearningRate 0.0000 Epoch: 39 Global Step: 225260 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:15:37,243-Speed 10887.29 samples/sec Loss 3.9925 LearningRate 0.0000 Epoch: 39 Global Step: 225270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:15:38,188-Speed 10846.51 samples/sec Loss 4.0529 LearningRate 0.0000 Epoch: 39 Global Step: 225280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:15:39,165-Speed 10483.70 samples/sec Loss 3.9121 LearningRate 0.0000 Epoch: 39 Global Step: 225290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:15:40,136-Speed 10550.39 samples/sec Loss 3.9856 LearningRate 0.0000 Epoch: 39 Global Step: 225300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:15:41,053-Speed 11179.38 samples/sec Loss 3.9396 LearningRate 0.0000 Epoch: 39 Global Step: 225310 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:15:41,992-Speed 10911.09 samples/sec Loss 4.0150 LearningRate 0.0000 Epoch: 39 Global Step: 225320 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:15:42,938-Speed 10839.67 samples/sec Loss 3.9043 LearningRate 0.0000 Epoch: 39 Global Step: 225330 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:15:43,863-Speed 11073.06 samples/sec Loss 3.8886 LearningRate 0.0000 Epoch: 39 Global Step: 225340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:15:44,801-Speed 10920.38 samples/sec Loss 3.9496 LearningRate 0.0000 Epoch: 39 Global Step: 225350 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:15:45,728-Speed 11055.36 samples/sec Loss 3.9439 LearningRate 0.0000 Epoch: 39 Global Step: 225360 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:15:46,649-Speed 11129.72 samples/sec Loss 3.8594 LearningRate 0.0000 Epoch: 39 Global Step: 225370 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:15:47,595-Speed 10833.90 
samples/sec Loss 3.9186 LearningRate 0.0000 Epoch: 39 Global Step: 225380 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:15:48,544-Speed 10792.44 samples/sec Loss 3.9627 LearningRate 0.0000 Epoch: 39 Global Step: 225390 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:15:49,472-Speed 11049.92 samples/sec Loss 4.0805 LearningRate 0.0000 Epoch: 39 Global Step: 225400 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:15:50,431-Speed 10683.83 samples/sec Loss 4.0931 LearningRate 0.0000 Epoch: 39 Global Step: 225410 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:15:51,420-Speed 10361.79 samples/sec Loss 3.8848 LearningRate 0.0000 Epoch: 39 Global Step: 225420 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:15:52,399-Speed 10462.75 samples/sec Loss 4.0067 LearningRate 0.0000 Epoch: 39 Global Step: 225430 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:15:53,364-Speed 10623.34 samples/sec Loss 4.0572 LearningRate 0.0000 Epoch: 39 Global Step: 225440 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:15:54,319-Speed 10732.17 samples/sec Loss 3.9187 LearningRate 0.0000 Epoch: 39 Global Step: 225450 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:15:55,265-Speed 10831.81 samples/sec Loss 3.9965 LearningRate 0.0000 Epoch: 39 Global Step: 225460 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:15:56,195-Speed 11014.06 samples/sec Loss 4.0014 LearningRate 0.0000 Epoch: 39 Global Step: 225470 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:15:57,106-Speed 11246.42 samples/sec Loss 3.9625 LearningRate 0.0000 Epoch: 39 Global Step: 225480 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:15:58,025-Speed 11149.52 samples/sec Loss 3.9561 LearningRate 0.0000 Epoch: 39 Global Step: 225490 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:15:58,988-Speed 10638.78 samples/sec Loss 3.9520 
LearningRate 0.0000 Epoch: 39 Global Step: 225500 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:15:59,980-Speed 10335.84 samples/sec Loss 3.9762 LearningRate 0.0000 Epoch: 39 Global Step: 225510 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:16:00,917-Speed 10936.66 samples/sec Loss 4.0485 LearningRate 0.0000 Epoch: 39 Global Step: 225520 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:16:01,869-Speed 10762.68 samples/sec Loss 3.9368 LearningRate 0.0000 Epoch: 39 Global Step: 225530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:16:02,808-Speed 10912.58 samples/sec Loss 3.9680 LearningRate 0.0000 Epoch: 39 Global Step: 225540 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:16:03,788-Speed 10454.79 samples/sec Loss 3.9749 LearningRate 0.0000 Epoch: 39 Global Step: 225550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:16:04,790-Speed 10219.57 samples/sec Loss 4.0300 LearningRate 0.0000 Epoch: 39 Global Step: 225560 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:16:05,755-Speed 10623.05 samples/sec Loss 3.8603 LearningRate 0.0000 Epoch: 39 Global Step: 225570 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:16:06,708-Speed 10751.83 samples/sec Loss 4.0324 LearningRate 0.0000 Epoch: 39 Global Step: 225580 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:16:07,716-Speed 10161.45 samples/sec Loss 4.0335 LearningRate 0.0000 Epoch: 39 Global Step: 225590 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:16:08,666-Speed 10788.56 samples/sec Loss 4.1053 LearningRate 0.0000 Epoch: 39 Global Step: 225600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:16:09,607-Speed 10894.59 samples/sec Loss 4.0543 LearningRate 0.0000 Epoch: 39 Global Step: 225610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:16:10,531-Speed 11087.77 samples/sec Loss 3.9628 LearningRate 0.0000 Epoch: 39 
Global Step: 225620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:16:11,458-Speed 11050.40 samples/sec Loss 3.9606 LearningRate 0.0000 Epoch: 39 Global Step: 225630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:16:12,404-Speed 10835.57 samples/sec Loss 4.0541 LearningRate 0.0000 Epoch: 39 Global Step: 225640 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:16:13,326-Speed 11108.35 samples/sec Loss 3.8820 LearningRate 0.0000 Epoch: 39 Global Step: 225650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:16:14,289-Speed 10650.11 samples/sec Loss 4.0043 LearningRate 0.0000 Epoch: 39 Global Step: 225660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:16:15,286-Speed 10270.94 samples/sec Loss 4.0663 LearningRate 0.0000 Epoch: 39 Global Step: 225670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:16:16,231-Speed 10856.59 samples/sec Loss 3.8825 LearningRate 0.0000 Epoch: 39 Global Step: 225680 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:16:17,172-Speed 10887.13 samples/sec Loss 4.0321 LearningRate 0.0000 Epoch: 39 Global Step: 225690 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:16:18,141-Speed 10574.50 samples/sec Loss 3.9628 LearningRate 0.0000 Epoch: 39 Global Step: 225700 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:16:19,161-Speed 10042.35 samples/sec Loss 3.8578 LearningRate 0.0000 Epoch: 39 Global Step: 225710 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:16:20,122-Speed 10663.32 samples/sec Loss 3.9572 LearningRate 0.0000 Epoch: 39 Global Step: 225720 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:16:21,070-Speed 10814.61 samples/sec Loss 4.1120 LearningRate 0.0000 Epoch: 39 Global Step: 225730 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:16:21,994-Speed 11080.27 samples/sec Loss 3.8899 LearningRate 0.0000 Epoch: 39 Global Step: 225740 Fp16 Grad 
Scale: 32768 Required: 0 hours Training: 2022-04-11 07:16:22,927-Speed 10990.13 samples/sec Loss 3.9664 LearningRate 0.0000 Epoch: 39 Global Step: 225750 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:16:23,845-Speed 11159.23 samples/sec Loss 3.9648 LearningRate 0.0000 Epoch: 39 Global Step: 225760 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:16:24,839-Speed 10312.11 samples/sec Loss 3.8947 LearningRate 0.0000 Epoch: 39 Global Step: 225770 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:16:25,806-Speed 10597.06 samples/sec Loss 3.9190 LearningRate 0.0000 Epoch: 39 Global Step: 225780 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:16:26,775-Speed 10573.19 samples/sec Loss 3.9501 LearningRate 0.0000 Epoch: 39 Global Step: 225790 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:16:27,716-Speed 10887.43 samples/sec Loss 3.9537 LearningRate 0.0000 Epoch: 39 Global Step: 225800 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:16:28,646-Speed 11022.30 samples/sec Loss 3.9981 LearningRate 0.0000 Epoch: 39 Global Step: 225810 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:16:29,583-Speed 10935.97 samples/sec Loss 4.0174 LearningRate 0.0000 Epoch: 39 Global Step: 225820 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:16:30,520-Speed 10935.20 samples/sec Loss 4.0161 LearningRate 0.0000 Epoch: 39 Global Step: 225830 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:16:31,460-Speed 10907.36 samples/sec Loss 3.9059 LearningRate 0.0000 Epoch: 39 Global Step: 225840 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:16:32,447-Speed 10379.65 samples/sec Loss 4.0702 LearningRate 0.0000 Epoch: 39 Global Step: 225850 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:16:33,365-Speed 11153.93 samples/sec Loss 4.0253 LearningRate 0.0000 Epoch: 39 Global Step: 225860 Fp16 Grad Scale: 32768 Required: 0 hours 
Training: 2022-04-11 07:16:34,293-Speed 11044.99 samples/sec Loss 4.0138 LearningRate 0.0000 Epoch: 39 Global Step: 225870 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:16:35,220-Speed 11059.63 samples/sec Loss 3.9958 LearningRate 0.0000 Epoch: 39 Global Step: 225880 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:16:36,158-Speed 10924.57 samples/sec Loss 3.9993 LearningRate 0.0000 Epoch: 39 Global Step: 225890 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:16:37,144-Speed 10389.11 samples/sec Loss 4.0248 LearningRate 0.0000 Epoch: 39 Global Step: 225900 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:16:38,099-Speed 10736.57 samples/sec Loss 4.0161 LearningRate 0.0000 Epoch: 39 Global Step: 225910 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:16:39,055-Speed 10718.44 samples/sec Loss 3.9777 LearningRate 0.0000 Epoch: 39 Global Step: 225920 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:16:40,052-Speed 10274.32 samples/sec Loss 3.9251 LearningRate 0.0000 Epoch: 39 Global Step: 225930 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:16:41,012-Speed 10684.93 samples/sec Loss 3.9341 LearningRate 0.0000 Epoch: 39 Global Step: 225940 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:16:41,971-Speed 10683.30 samples/sec Loss 4.0584 LearningRate 0.0000 Epoch: 39 Global Step: 225950 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:16:42,923-Speed 10762.24 samples/sec Loss 4.1185 LearningRate 0.0000 Epoch: 39 Global Step: 225960 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:16:43,888-Speed 10614.38 samples/sec Loss 3.9182 LearningRate 0.0000 Epoch: 39 Global Step: 225970 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:16:44,845-Speed 10714.46 samples/sec Loss 3.8698 LearningRate 0.0000 Epoch: 39 Global Step: 225980 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 
07:16:45,753-Speed 11285.31 samples/sec Loss 4.0177 LearningRate 0.0000 Epoch: 39 Global Step: 225990 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:16:46,697-Speed 10856.25 samples/sec Loss 3.9916 LearningRate 0.0000 Epoch: 39 Global Step: 226000 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:17:07,878-[lfw][226000]XNorm: 7.635595 Training: 2022-04-11 07:17:07,878-[lfw][226000]Accuracy-Flip: 0.99717+-0.00289 Training: 2022-04-11 07:17:07,879-[lfw][226000]Accuracy-Highest: 0.99750 Training: 2022-04-11 07:17:32,307-[cfp_fp][226000]XNorm: 6.540276 Training: 2022-04-11 07:17:32,308-[cfp_fp][226000]Accuracy-Flip: 0.95586+-0.01115 Training: 2022-04-11 07:17:32,308-[cfp_fp][226000]Accuracy-Highest: 0.95943 Training: 2022-04-11 07:17:53,429-[agedb_30][226000]XNorm: 7.491025 Training: 2022-04-11 07:17:53,429-[agedb_30][226000]Accuracy-Flip: 0.97050+-0.00730 Training: 2022-04-11 07:17:53,430-[agedb_30][226000]Accuracy-Highest: 0.97183 Training: 2022-04-11 07:17:54,411-Speed 151.23 samples/sec Loss 3.8724 LearningRate 0.0000 Epoch: 39 Global Step: 226010 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:17:55,399-Speed 10366.58 samples/sec Loss 3.9585 LearningRate 0.0000 Epoch: 39 Global Step: 226020 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:17:56,356-Speed 10711.74 samples/sec Loss 4.1364 LearningRate 0.0000 Epoch: 39 Global Step: 226030 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:17:57,298-Speed 10874.03 samples/sec Loss 3.9485 LearningRate 0.0000 Epoch: 39 Global Step: 226040 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:17:58,218-Speed 11131.23 samples/sec Loss 3.9622 LearningRate 0.0000 Epoch: 39 Global Step: 226050 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:17:59,189-Speed 10560.27 samples/sec Loss 4.0537 LearningRate 0.0000 Epoch: 39 Global Step: 226060 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:00,098-Speed 
11264.52 samples/sec Loss 4.0147 LearningRate 0.0000 Epoch: 39 Global Step: 226070 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:01,049-Speed 10772.43 samples/sec Loss 3.9077 LearningRate 0.0000 Epoch: 39 Global Step: 226080 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:02,007-Speed 10701.05 samples/sec Loss 3.8957 LearningRate 0.0000 Epoch: 39 Global Step: 226090 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:02,954-Speed 10825.67 samples/sec Loss 4.0909 LearningRate 0.0000 Epoch: 39 Global Step: 226100 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:03,926-Speed 10539.38 samples/sec Loss 3.9138 LearningRate 0.0000 Epoch: 39 Global Step: 226110 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:04,883-Speed 10701.97 samples/sec Loss 4.1166 LearningRate 0.0000 Epoch: 39 Global Step: 226120 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:05,799-Speed 11190.44 samples/sec Loss 3.9962 LearningRate 0.0000 Epoch: 39 Global Step: 226130 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:06,781-Speed 10433.73 samples/sec Loss 3.9118 LearningRate 0.0000 Epoch: 39 Global Step: 226140 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:07,724-Speed 10872.27 samples/sec Loss 3.9697 LearningRate 0.0000 Epoch: 39 Global Step: 226150 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:08,674-Speed 10790.74 samples/sec Loss 3.8750 LearningRate 0.0000 Epoch: 39 Global Step: 226160 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:09,578-Speed 11331.03 samples/sec Loss 3.9637 LearningRate 0.0000 Epoch: 39 Global Step: 226170 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:10,489-Speed 11244.08 samples/sec Loss 4.0265 LearningRate 0.0000 Epoch: 39 Global Step: 226180 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:11,420-Speed 11013.78 samples/sec Loss 
4.0183 LearningRate 0.0000 Epoch: 39 Global Step: 226190 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:12,372-Speed 10756.16 samples/sec Loss 4.0750 LearningRate 0.0000 Epoch: 39 Global Step: 226200 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:13,392-Speed 10054.06 samples/sec Loss 4.0273 LearningRate 0.0000 Epoch: 39 Global Step: 226210 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:14,415-Speed 10011.39 samples/sec Loss 3.9520 LearningRate 0.0000 Epoch: 39 Global Step: 226220 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:15,426-Speed 10139.88 samples/sec Loss 3.9995 LearningRate 0.0000 Epoch: 39 Global Step: 226230 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:18:16,419-Speed 10319.64 samples/sec Loss 4.0240 LearningRate 0.0000 Epoch: 39 Global Step: 226240 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:18:17,379-Speed 10673.29 samples/sec Loss 3.8922 LearningRate 0.0000 Epoch: 39 Global Step: 226250 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:18,335-Speed 10728.00 samples/sec Loss 3.8736 LearningRate 0.0000 Epoch: 39 Global Step: 226260 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:19,333-Speed 10274.40 samples/sec Loss 4.1014 LearningRate 0.0000 Epoch: 39 Global Step: 226270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:20,325-Speed 10329.20 samples/sec Loss 3.9453 LearningRate 0.0000 Epoch: 39 Global Step: 226280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:21,225-Speed 11377.57 samples/sec Loss 4.0706 LearningRate 0.0000 Epoch: 39 Global Step: 226290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:22,142-Speed 11178.77 samples/sec Loss 4.0385 LearningRate 0.0000 Epoch: 39 Global Step: 226300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:23,095-Speed 10747.32 samples/sec Loss 4.0063 LearningRate 0.0000 
Epoch: 39 Global Step: 226310 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:24,062-Speed 10603.25 samples/sec Loss 3.9512 LearningRate 0.0000 Epoch: 39 Global Step: 226320 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:24,999-Speed 10936.01 samples/sec Loss 3.9628 LearningRate 0.0000 Epoch: 39 Global Step: 226330 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:25,965-Speed 10609.32 samples/sec Loss 4.0374 LearningRate 0.0000 Epoch: 39 Global Step: 226340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:26,894-Speed 11022.86 samples/sec Loss 3.9455 LearningRate 0.0000 Epoch: 39 Global Step: 226350 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:18:27,832-Speed 10927.54 samples/sec Loss 3.9772 LearningRate 0.0000 Epoch: 39 Global Step: 226360 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:18:28,779-Speed 10816.84 samples/sec Loss 3.9220 LearningRate 0.0000 Epoch: 39 Global Step: 226370 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:18:29,790-Speed 10135.39 samples/sec Loss 3.8538 LearningRate 0.0000 Epoch: 39 Global Step: 226380 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:30,720-Speed 11023.37 samples/sec Loss 3.9032 LearningRate 0.0000 Epoch: 39 Global Step: 226390 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:31,696-Speed 10498.21 samples/sec Loss 4.1285 LearningRate 0.0000 Epoch: 39 Global Step: 226400 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:32,628-Speed 11000.38 samples/sec Loss 3.9948 LearningRate 0.0000 Epoch: 39 Global Step: 226410 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:33,551-Speed 11096.55 samples/sec Loss 3.8085 LearningRate 0.0000 Epoch: 39 Global Step: 226420 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:34,466-Speed 11205.98 samples/sec Loss 3.9916 LearningRate 0.0000 Epoch: 39 Global Step: 226430 
Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:35,384-Speed 11157.33 samples/sec Loss 3.9351 LearningRate 0.0000 Epoch: 39 Global Step: 226440 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:36,328-Speed 10858.22 samples/sec Loss 3.8845 LearningRate 0.0000 Epoch: 39 Global Step: 226450 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:37,255-Speed 11052.30 samples/sec Loss 4.0475 LearningRate 0.0000 Epoch: 39 Global Step: 226460 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:38,165-Speed 11256.61 samples/sec Loss 3.9046 LearningRate 0.0000 Epoch: 39 Global Step: 226470 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:39,130-Speed 10629.38 samples/sec Loss 3.9294 LearningRate 0.0000 Epoch: 39 Global Step: 226480 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:40,125-Speed 10295.67 samples/sec Loss 4.0824 LearningRate 0.0000 Epoch: 39 Global Step: 226490 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:41,136-Speed 10139.14 samples/sec Loss 3.9262 LearningRate 0.0000 Epoch: 39 Global Step: 226500 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:42,100-Speed 10630.86 samples/sec Loss 3.8873 LearningRate 0.0000 Epoch: 39 Global Step: 226510 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:43,039-Speed 10908.27 samples/sec Loss 3.9807 LearningRate 0.0000 Epoch: 39 Global Step: 226520 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:44,042-Speed 10220.10 samples/sec Loss 4.1023 LearningRate 0.0000 Epoch: 39 Global Step: 226530 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:44,958-Speed 11180.30 samples/sec Loss 4.0401 LearningRate 0.0000 Epoch: 39 Global Step: 226540 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:45,873-Speed 11202.64 samples/sec Loss 4.0461 LearningRate 0.0000 Epoch: 39 Global Step: 226550 Fp16 Grad Scale: 32768 
Required: 0 hours Training: 2022-04-11 07:18:46,799-Speed 11063.28 samples/sec Loss 3.8965 LearningRate 0.0000 Epoch: 39 Global Step: 226560 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:47,784-Speed 10407.90 samples/sec Loss 3.8570 LearningRate 0.0000 Epoch: 39 Global Step: 226570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:48,764-Speed 10456.47 samples/sec Loss 3.9569 LearningRate 0.0000 Epoch: 39 Global Step: 226580 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:49,734-Speed 10562.37 samples/sec Loss 3.9156 LearningRate 0.0000 Epoch: 39 Global Step: 226590 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:50,660-Speed 11063.79 samples/sec Loss 4.0111 LearningRate 0.0000 Epoch: 39 Global Step: 226600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:51,619-Speed 10685.68 samples/sec Loss 4.0655 LearningRate 0.0000 Epoch: 39 Global Step: 226610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:52,546-Speed 11054.57 samples/sec Loss 4.0500 LearningRate 0.0000 Epoch: 39 Global Step: 226620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:53,495-Speed 10796.97 samples/sec Loss 4.0245 LearningRate 0.0000 Epoch: 39 Global Step: 226630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:54,438-Speed 10864.31 samples/sec Loss 4.0252 LearningRate 0.0000 Epoch: 39 Global Step: 226640 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:55,399-Speed 10665.80 samples/sec Loss 3.9921 LearningRate 0.0000 Epoch: 39 Global Step: 226650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:56,314-Speed 11195.21 samples/sec Loss 3.9170 LearningRate 0.0000 Epoch: 39 Global Step: 226660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:18:57,236-Speed 11113.29 samples/sec Loss 3.9765 LearningRate 0.0000 Epoch: 39 Global Step: 226670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 
2022-04-11 07:18:58,192-Speed 10722.30 samples/sec Loss 3.9926 LearningRate 0.0000 Epoch: 39 Global Step: 226680 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:18:59,182-Speed 10357.05 samples/sec Loss 3.9593 LearningRate 0.0000 Epoch: 39 Global Step: 226690 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:19:00,141-Speed 10688.19 samples/sec Loss 4.0363 LearningRate 0.0000 Epoch: 39 Global Step: 226700 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:19:01,077-Speed 10939.49 samples/sec Loss 3.9503 LearningRate 0.0000 Epoch: 39 Global Step: 226710 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:19:02,009-Speed 11004.48 samples/sec Loss 4.0677 LearningRate 0.0000 Epoch: 39 Global Step: 226720 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:19:02,942-Speed 10983.56 samples/sec Loss 3.8881 LearningRate 0.0000 Epoch: 39 Global Step: 226730 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:19:03,883-Speed 10889.22 samples/sec Loss 3.9953 LearningRate 0.0000 Epoch: 39 Global Step: 226740 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:19:04,851-Speed 10579.84 samples/sec Loss 3.9737 LearningRate 0.0000 Epoch: 39 Global Step: 226750 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:19:05,837-Speed 10397.67 samples/sec Loss 3.9754 LearningRate 0.0000 Epoch: 39 Global Step: 226760 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:19:06,808-Speed 10547.05 samples/sec Loss 4.0546 LearningRate 0.0000 Epoch: 39 Global Step: 226770 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:19:07,775-Speed 10597.37 samples/sec Loss 3.9463 LearningRate 0.0000 Epoch: 39 Global Step: 226780 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:19:08,732-Speed 10716.42 samples/sec Loss 4.0722 LearningRate 0.0000 Epoch: 39 Global Step: 226790 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:19:09,667-Speed 
10959.14 samples/sec Loss 4.0498 LearningRate 0.0000 Epoch: 39 Global Step: 226800 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:19:10,599-Speed 10989.87 samples/sec Loss 4.0380 LearningRate 0.0000 Epoch: 39 Global Step: 226810 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:19:11,544-Speed 10842.27 samples/sec Loss 3.8035 LearningRate 0.0000 Epoch: 39 Global Step: 226820 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:19:12,502-Speed 10697.65 samples/sec Loss 3.9835 LearningRate 0.0000 Epoch: 39 Global Step: 226830 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:19:13,501-Speed 10258.60 samples/sec Loss 4.0469 LearningRate 0.0000 Epoch: 39 Global Step: 226840 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:19:14,426-Speed 11080.71 samples/sec Loss 4.1307 LearningRate 0.0000 Epoch: 39 Global Step: 226850 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:19:15,383-Speed 10705.53 samples/sec Loss 3.9850 LearningRate 0.0000 Epoch: 39 Global Step: 226860 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:19:16,343-Speed 10672.84 samples/sec Loss 3.9405 LearningRate 0.0000 Epoch: 39 Global Step: 226870 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:19:17,280-Speed 10933.02 samples/sec Loss 3.8797 LearningRate 0.0000 Epoch: 39 Global Step: 226880 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:19:18,216-Speed 10948.75 samples/sec Loss 3.9471 LearningRate 0.0000 Epoch: 39 Global Step: 226890 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:19:19,203-Speed 10380.73 samples/sec Loss 3.9102 LearningRate 0.0000 Epoch: 39 Global Step: 226900 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:19:20,143-Speed 10902.57 samples/sec Loss 4.0585 LearningRate 0.0000 Epoch: 39 Global Step: 226910 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:19:21,089-Speed 10828.96 samples/sec Loss 
4.0467 LearningRate 0.0000 Epoch: 39 Global Step: 226920 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:19:21,998-Speed 11277.53 samples/sec Loss 4.0300 LearningRate 0.0000 Epoch: 39 Global Step: 226930 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:19:22,916-Speed 11169.76 samples/sec Loss 4.0173 LearningRate 0.0000 Epoch: 39 Global Step: 226940 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:19:23,879-Speed 10638.10 samples/sec Loss 3.9445 LearningRate 0.0000 Epoch: 39 Global Step: 226950 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:19:24,838-Speed 10682.07 samples/sec Loss 4.0298 LearningRate 0.0000 Epoch: 39 Global Step: 226960 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:19:25,795-Speed 10714.11 samples/sec Loss 4.0072 LearningRate 0.0000 Epoch: 39 Global Step: 226970 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:19:26,759-Speed 10621.06 samples/sec Loss 3.8874 LearningRate 0.0000 Epoch: 39 Global Step: 226980 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:19:27,763-Speed 10242.21 samples/sec Loss 3.9693 LearningRate 0.0000 Epoch: 39 Global Step: 226990 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:19:28,741-Speed 10484.49 samples/sec Loss 3.9757 LearningRate 0.0000 Epoch: 39 Global Step: 227000 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:19:29,707-Speed 10606.48 samples/sec Loss 3.8591 LearningRate 0.0000 Epoch: 39 Global Step: 227010 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:19:30,684-Speed 10478.06 samples/sec Loss 3.8691 LearningRate 0.0000 Epoch: 39 Global Step: 227020 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:19:31,681-Speed 10286.80 samples/sec Loss 3.8929 LearningRate 0.0000 Epoch: 39 Global Step: 227030 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:19:32,640-Speed 10686.27 samples/sec Loss 3.9892 LearningRate 0.0000 
Epoch: 39 Global Step: 227040 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:19:33,572-Speed 10993.23 samples/sec Loss 3.9369 LearningRate 0.0000 Epoch: 39 Global Step: 227050 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:19:34,498-Speed 11064.25 samples/sec Loss 4.1315 LearningRate 0.0000 Epoch: 39 Global Step: 227060 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:19:35,410-Speed 11232.60 samples/sec Loss 4.0778 LearningRate 0.0000 Epoch: 39 Global Step: 227070 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:19:36,335-Speed 11074.16 samples/sec Loss 4.0535 LearningRate 0.0000 Epoch: 39 Global Step: 227080 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:19:37,267-Speed 10994.27 samples/sec Loss 3.9748 LearningRate 0.0000 Epoch: 39 Global Step: 227090 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:19:38,215-Speed 10808.92 samples/sec Loss 3.9268 LearningRate 0.0000 Epoch: 39 Global Step: 227100 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:19:39,182-Speed 10596.73 samples/sec Loss 3.9013 LearningRate 0.0000 Epoch: 39 Global Step: 227110 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:19:40,100-Speed 11160.47 samples/sec Loss 3.9866 LearningRate 0.0000 Epoch: 39 Global Step: 227120 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-11 07:19:41,072-Speed 10550.39 samples/sec Loss 4.0497 LearningRate 0.0000 Epoch: 39 Global Step: 227130 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:19:42,070-Speed 10260.44 samples/sec Loss 4.0153 LearningRate 0.0000 Epoch: 39 Global Step: 227140 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:19:43,030-Speed 10683.51 samples/sec Loss 3.9331 LearningRate 0.0000 Epoch: 39 Global Step: 227150 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:19:43,962-Speed 10994.16 samples/sec Loss 3.9051 LearningRate 0.0000 Epoch: 39 Global Step: 227160 
Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:19:44,888-Speed 11069.28 samples/sec Loss 4.0188 LearningRate 0.0000 Epoch: 39 Global Step: 227170 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:19:45,798-Speed 11256.65 samples/sec Loss 4.0146 LearningRate 0.0000 Epoch: 39 Global Step: 227180 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:19:46,708-Speed 11259.11 samples/sec Loss 3.9900 LearningRate 0.0000 Epoch: 39 Global Step: 227190 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:19:47,678-Speed 10561.89 samples/sec Loss 3.8009 LearningRate 0.0000 Epoch: 39 Global Step: 227200 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:19:48,664-Speed 10399.89 samples/sec Loss 4.0412 LearningRate 0.0000 Epoch: 39 Global Step: 227210 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:19:49,599-Speed 10955.19 samples/sec Loss 3.9939 LearningRate 0.0000 Epoch: 39 Global Step: 227220 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:19:50,527-Speed 11046.83 samples/sec Loss 4.0371 LearningRate 0.0000 Epoch: 39 Global Step: 227230 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:19:51,489-Speed 10645.82 samples/sec Loss 4.0230 LearningRate 0.0000 Epoch: 39 Global Step: 227240 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:19:52,449-Speed 10671.84 samples/sec Loss 3.9077 LearningRate 0.0000 Epoch: 39 Global Step: 227250 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:19:53,421-Speed 10544.79 samples/sec Loss 3.9125 LearningRate 0.0000 Epoch: 39 Global Step: 227260 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:19:54,416-Speed 10296.94 samples/sec Loss 3.9655 LearningRate 0.0000 Epoch: 39 Global Step: 227270 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 07:19:55,388-Speed 10541.11 samples/sec Loss 4.0103 LearningRate 0.0000 Epoch: 39 Global Step: 227280 Fp16 Grad Scale: 32768 
Required: 0 hours Training: 2022-04-11 07:19:56,330-Speed 10887.43 samples/sec Loss 3.9507 LearningRate 0.0000 Epoch: 39 Global Step: 227290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:19:57,265-Speed 10948.46 samples/sec Loss 3.9611 LearningRate 0.0000 Epoch: 39 Global Step: 227300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:19:58,170-Speed 11325.79 samples/sec Loss 3.9606 LearningRate 0.0000 Epoch: 39 Global Step: 227310 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:19:59,089-Speed 11155.62 samples/sec Loss 3.9916 LearningRate 0.0000 Epoch: 39 Global Step: 227320 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:20:00,024-Speed 10961.43 samples/sec Loss 4.0012 LearningRate 0.0000 Epoch: 39 Global Step: 227330 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:20:00,975-Speed 10768.40 samples/sec Loss 3.8979 LearningRate 0.0000 Epoch: 39 Global Step: 227340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:20:01,970-Speed 10296.58 samples/sec Loss 3.7786 LearningRate 0.0000 Epoch: 39 Global Step: 227350 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:20:02,931-Speed 10665.12 samples/sec Loss 3.9432 LearningRate 0.0000 Epoch: 39 Global Step: 227360 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:20:03,884-Speed 10750.87 samples/sec Loss 4.0243 LearningRate 0.0000 Epoch: 39 Global Step: 227370 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:20:04,855-Speed 10554.28 samples/sec Loss 3.9526 LearningRate 0.0000 Epoch: 39 Global Step: 227380 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:20:05,802-Speed 10824.47 samples/sec Loss 3.9304 LearningRate 0.0000 Epoch: 39 Global Step: 227390 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:20:06,783-Speed 10448.25 samples/sec Loss 3.9541 LearningRate 0.0000 Epoch: 39 Global Step: 227400 Fp16 Grad Scale: 32768 Required: 0 hours Training: 
2022-04-11 07:20:07,742-Speed 10699.32 samples/sec Loss 4.0379 LearningRate 0.0000 Epoch: 39 Global Step: 227410 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:20:08,680-Speed 10922.00 samples/sec Loss 4.1253 LearningRate 0.0000 Epoch: 39 Global Step: 227420 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:20:09,693-Speed 10115.34 samples/sec Loss 3.9495 LearningRate 0.0000 Epoch: 39 Global Step: 227430 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 07:20:10,568-Speed 11715.09 samples/sec Loss 4.0947 LearningRate 0.0000 Epoch: 39 Global Step: 227440 Fp16 Grad Scale: 32768 Required: -0 hours