Training: 2021-03-16 21:21:18,084-rank_id: 0
Training: 2021-03-16 21:21:27,234-softmax weight init successfully!
Training: 2021-03-16 21:21:27,234-softmax weight mom init successfully!
Training: 2021-03-16 21:21:27,236-Total Step is: 124550
Training: 2021-03-16 21:22:07,760-Reducer buckets have been rebuilt in this iteration.
Training: 2021-03-16 21:22:29,308-Speed 5033.71 samples/sec Loss 54.3514 Epoch: 0 Global Step: 100 Fp16 Grad Scale: 256 Required: 9 hours
Training: 2021-03-16 21:22:39,905-Speed 4831.76 samples/sec Loss 52.1281 Epoch: 0 Global Step: 150 Fp16 Grad Scale: 256 Required: 8 hours
Training: 2021-03-16 21:22:50,470-Speed 4846.74 samples/sec Loss 49.5487 Epoch: 0 Global Step: 200 Fp16 Grad Scale: 512 Required: 8 hours
Training: 2021-03-16 21:23:00,893-Speed 4912.66 samples/sec Loss 47.5880 Epoch: 0 Global Step: 250 Fp16 Grad Scale: 512 Required: 8 hours
Training: 2021-03-16 21:23:11,367-Speed 4888.52 samples/sec Loss 46.2347 Epoch: 0 Global Step: 300 Fp16 Grad Scale: 1024 Required: 8 hours
Training: 2021-03-16 21:23:21,803-Speed 4906.33 samples/sec Loss 45.3044 Epoch: 0 Global Step: 350 Fp16 Grad Scale: 1024 Required: 8 hours
Training: 2021-03-16 21:23:33,450-Speed 4396.29 samples/sec Loss 44.6127 Epoch: 0 Global Step: 400 Fp16 Grad Scale: 2048 Required: 8 hours
Training: 2021-03-16 21:23:44,965-Speed 4446.83 samples/sec Loss 44.0824 Epoch: 0 Global Step: 450 Fp16 Grad Scale: 2048 Required: 8 hours
Training: 2021-03-16 21:23:55,352-Speed 4929.59 samples/sec Loss 43.4881 Epoch: 0 Global Step: 500 Fp16 Grad Scale: 4096 Required: 8 hours
Training: 2021-03-16 21:24:05,787-Speed 4907.15 samples/sec Loss 42.9178 Epoch: 0 Global Step: 550 Fp16 Grad Scale: 4096 Required: 8 hours
Training: 2021-03-16 21:24:16,435-Speed 4808.55 samples/sec Loss 42.2825 Epoch: 0 Global Step: 600 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2021-03-16 21:24:26,653-Speed 5011.02 samples/sec Loss 41.7142 Epoch: 0 Global Step: 650 Fp16 Grad Scale: 8192 Required: 8 hours
Training: 2021-03-16 21:24:37,047-Speed 4926.12 samples/sec Loss 41.1308 Epoch: 0 Global Step: 700 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:24:47,529-Speed 4884.93 samples/sec Loss 40.4047 Epoch: 0 Global Step: 750 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 21:24:57,972-Speed 4903.40 samples/sec Loss 39.7187 Epoch: 0 Global Step: 800 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 21:25:08,645-Speed 4797.33 samples/sec Loss 39.0050 Epoch: 0 Global Step: 850 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 21:25:18,858-Speed 5013.44 samples/sec Loss 38.1732 Epoch: 0 Global Step: 900 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 21:25:29,638-Speed 4750.14 samples/sec Loss 37.3293 Epoch: 0 Global Step: 950 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 21:25:40,027-Speed 4928.39 samples/sec Loss 36.4597 Epoch: 0 Global Step: 1000 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 21:25:50,574-Speed 4854.65 samples/sec Loss 35.5797 Epoch: 0 Global Step: 1050 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 21:26:01,169-Speed 4833.05 samples/sec Loss 34.6252 Epoch: 0 Global Step: 1100 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 21:26:11,315-Speed 5046.22 samples/sec Loss 33.6853 Epoch: 0 Global Step: 1150 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 21:26:21,757-Speed 4903.57 samples/sec Loss 32.7605 Epoch: 0 Global Step: 1200 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 21:26:32,178-Speed 4913.46 samples/sec Loss 31.6489 Epoch: 0 Global Step: 1250 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 21:26:42,864-Speed 4791.48 samples/sec Loss 30.8193 Epoch: 0 Global Step: 1300 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 21:26:53,172-Speed 4967.29 samples/sec Loss 29.9436 Epoch: 0 Global Step: 1350 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 
21:27:03,638-Speed 4892.16 samples/sec Loss 29.0964 Epoch: 0 Global Step: 1400 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 21:27:14,067-Speed 4909.79 samples/sec Loss 28.1978 Epoch: 0 Global Step: 1450 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 21:27:25,145-Speed 4622.03 samples/sec Loss 27.4253 Epoch: 0 Global Step: 1500 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 21:27:36,519-Speed 4501.77 samples/sec Loss 26.6739 Epoch: 0 Global Step: 1550 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 21:27:47,191-Speed 4797.92 samples/sec Loss 25.9305 Epoch: 0 Global Step: 1600 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 21:27:57,431-Speed 5000.12 samples/sec Loss 25.2926 Epoch: 0 Global Step: 1650 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 21:28:07,896-Speed 4892.89 samples/sec Loss 24.8205 Epoch: 0 Global Step: 1700 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 21:28:18,159-Speed 4989.01 samples/sec Loss 24.3170 Epoch: 0 Global Step: 1750 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 21:28:28,516-Speed 4944.17 samples/sec Loss 23.8075 Epoch: 0 Global Step: 1800 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 21:28:38,921-Speed 4921.11 samples/sec Loss 23.2387 Epoch: 0 Global Step: 1850 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 21:28:49,369-Speed 4900.64 samples/sec Loss 22.9779 Epoch: 0 Global Step: 1900 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 21:29:00,024-Speed 4805.48 samples/sec Loss 22.5293 Epoch: 0 Global Step: 1950 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 21:29:10,825-Speed 4740.59 samples/sec Loss 22.1405 Epoch: 0 Global Step: 2000 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 21:29:35,199-[lfw][2000]XNorm: 22.504702 Training: 2021-03-16 21:29:35,200-[lfw][2000]Accuracy-Flip: 0.98167+-0.00459 Training: 2021-03-16 
21:29:35,200-[lfw][2000]Accuracy-Highest: 0.98167 Training: 2021-03-16 21:30:03,542-[cfp_fp][2000]XNorm: 18.649293 Training: 2021-03-16 21:30:03,543-[cfp_fp][2000]Accuracy-Flip: 0.86614+-0.01121 Training: 2021-03-16 21:30:03,543-[cfp_fp][2000]Accuracy-Highest: 0.86614 Training: 2021-03-16 21:30:26,960-[agedb_30][2000]XNorm: 21.834648 Training: 2021-03-16 21:30:26,960-[agedb_30][2000]Accuracy-Flip: 0.88050+-0.02369 Training: 2021-03-16 21:30:26,960-[agedb_30][2000]Accuracy-Highest: 0.88050 Training: 2021-03-16 21:30:37,353-Speed 591.72 samples/sec Loss 21.8792 Epoch: 0 Global Step: 2050 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2021-03-16 21:30:47,654-Speed 4970.69 samples/sec Loss 21.4880 Epoch: 0 Global Step: 2100 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:30:58,031-Speed 4934.44 samples/sec Loss 21.3458 Epoch: 0 Global Step: 2150 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:31:08,177-Speed 5046.59 samples/sec Loss 21.0168 Epoch: 0 Global Step: 2200 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:31:18,725-Speed 4853.99 samples/sec Loss 20.7925 Epoch: 0 Global Step: 2250 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:31:29,251-Speed 4864.38 samples/sec Loss 20.5297 Epoch: 0 Global Step: 2300 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:31:39,556-Speed 4968.82 samples/sec Loss 20.2754 Epoch: 0 Global Step: 2350 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:31:49,982-Speed 4911.11 samples/sec Loss 20.1257 Epoch: 0 Global Step: 2400 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:32:00,315-Speed 4955.10 samples/sec Loss 19.9417 Epoch: 0 Global Step: 2450 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:32:10,696-Speed 4932.63 samples/sec Loss 19.8338 Epoch: 0 Global Step: 2500 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:32:21,247-Speed 4852.56 samples/sec Loss 19.6824 Epoch: 0 Global Step: 
2550 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:32:31,674-Speed 4910.71 samples/sec Loss 19.3927 Epoch: 0 Global Step: 2600 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:32:42,159-Speed 4883.52 samples/sec Loss 19.2140 Epoch: 0 Global Step: 2650 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:32:52,643-Speed 4883.59 samples/sec Loss 19.1376 Epoch: 0 Global Step: 2700 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:33:02,959-Speed 4963.59 samples/sec Loss 18.9448 Epoch: 0 Global Step: 2750 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:33:13,460-Speed 4876.14 samples/sec Loss 18.9667 Epoch: 0 Global Step: 2800 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:33:23,924-Speed 4893.21 samples/sec Loss 18.8479 Epoch: 0 Global Step: 2850 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:33:34,492-Speed 4844.90 samples/sec Loss 18.6346 Epoch: 0 Global Step: 2900 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:33:44,791-Speed 4971.57 samples/sec Loss 18.5219 Epoch: 0 Global Step: 2950 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:33:55,588-Speed 4742.35 samples/sec Loss 18.5011 Epoch: 0 Global Step: 3000 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:34:05,850-Speed 4989.62 samples/sec Loss 18.3966 Epoch: 0 Global Step: 3050 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:34:17,339-Speed 4456.88 samples/sec Loss 18.3951 Epoch: 0 Global Step: 3100 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:34:28,381-Speed 4636.90 samples/sec Loss 18.1197 Epoch: 0 Global Step: 3150 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:34:38,933-Speed 4852.19 samples/sec Loss 18.1402 Epoch: 0 Global Step: 3200 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:34:49,586-Speed 4806.37 samples/sec Loss 18.0091 Epoch: 0 Global Step: 3250 Fp16 Grad Scale: 
16384 Required: 8 hours Training: 2021-03-16 21:34:59,870-Speed 4978.80 samples/sec Loss 17.8835 Epoch: 0 Global Step: 3300 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:35:10,290-Speed 4914.12 samples/sec Loss 17.8960 Epoch: 0 Global Step: 3350 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:35:20,578-Speed 4976.83 samples/sec Loss 17.7531 Epoch: 0 Global Step: 3400 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:35:31,159-Speed 4838.82 samples/sec Loss 17.7166 Epoch: 0 Global Step: 3450 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:35:41,548-Speed 4928.75 samples/sec Loss 17.6421 Epoch: 0 Global Step: 3500 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:35:52,209-Speed 4802.53 samples/sec Loss 17.5504 Epoch: 0 Global Step: 3550 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:36:02,619-Speed 4918.82 samples/sec Loss 17.5236 Epoch: 0 Global Step: 3600 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:36:12,902-Speed 4979.37 samples/sec Loss 17.4163 Epoch: 0 Global Step: 3650 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:36:23,201-Speed 4971.41 samples/sec Loss 17.3597 Epoch: 0 Global Step: 3700 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:36:33,628-Speed 4910.80 samples/sec Loss 17.4335 Epoch: 0 Global Step: 3750 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:36:44,182-Speed 4851.62 samples/sec Loss 17.2464 Epoch: 0 Global Step: 3800 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:36:54,773-Speed 4834.51 samples/sec Loss 17.2081 Epoch: 0 Global Step: 3850 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:37:05,211-Speed 4905.31 samples/sec Loss 17.1451 Epoch: 0 Global Step: 3900 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:37:15,670-Speed 4895.54 samples/sec Loss 17.0580 Epoch: 0 Global Step: 3950 Fp16 Grad Scale: 16384 Required: 8 hours 
Training: 2021-03-16 21:37:26,152-Speed 4885.22 samples/sec Loss 16.9921 Epoch: 0 Global Step: 4000 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:37:49,634-[lfw][4000]XNorm: 21.755183 Training: 2021-03-16 21:37:49,634-[lfw][4000]Accuracy-Flip: 0.99050+-0.00522 Training: 2021-03-16 21:37:49,635-[lfw][4000]Accuracy-Highest: 0.99050 Training: 2021-03-16 21:38:16,700-[cfp_fp][4000]XNorm: 18.302679 Training: 2021-03-16 21:38:16,700-[cfp_fp][4000]Accuracy-Flip: 0.91686+-0.01202 Training: 2021-03-16 21:38:16,700-[cfp_fp][4000]Accuracy-Highest: 0.91686 Training: 2021-03-16 21:38:40,158-[agedb_30][4000]XNorm: 20.901162 Training: 2021-03-16 21:38:40,158-[agedb_30][4000]Accuracy-Flip: 0.93717+-0.01167 Training: 2021-03-16 21:38:40,158-[agedb_30][4000]Accuracy-Highest: 0.93717 Training: 2021-03-16 21:38:50,502-Speed 606.99 samples/sec Loss 17.0121 Epoch: 0 Global Step: 4050 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:39:01,361-Speed 4715.39 samples/sec Loss 17.0393 Epoch: 0 Global Step: 4100 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:39:11,978-Speed 4822.52 samples/sec Loss 16.9870 Epoch: 0 Global Step: 4150 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:39:22,448-Speed 4890.47 samples/sec Loss 16.9207 Epoch: 0 Global Step: 4200 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:39:32,836-Speed 4929.12 samples/sec Loss 17.0185 Epoch: 0 Global Step: 4250 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:39:43,243-Speed 4920.11 samples/sec Loss 16.8954 Epoch: 0 Global Step: 4300 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:39:53,642-Speed 4923.67 samples/sec Loss 16.7599 Epoch: 0 Global Step: 4350 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:40:04,175-Speed 4861.25 samples/sec Loss 16.7620 Epoch: 0 Global Step: 4400 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:40:14,456-Speed 4980.28 samples/sec Loss 16.7601 
Epoch: 0 Global Step: 4450 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:40:24,923-Speed 4891.77 samples/sec Loss 16.7172 Epoch: 0 Global Step: 4500 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:40:35,212-Speed 4976.24 samples/sec Loss 16.6593 Epoch: 0 Global Step: 4550 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:40:45,880-Speed 4799.84 samples/sec Loss 16.6363 Epoch: 0 Global Step: 4600 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:40:56,256-Speed 4934.86 samples/sec Loss 16.5620 Epoch: 0 Global Step: 4650 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:41:06,955-Speed 4785.70 samples/sec Loss 16.5974 Epoch: 0 Global Step: 4700 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:41:17,526-Speed 4843.85 samples/sec Loss 16.5480 Epoch: 0 Global Step: 4750 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:41:27,986-Speed 4895.06 samples/sec Loss 16.4908 Epoch: 0 Global Step: 4800 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:41:38,556-Speed 4844.48 samples/sec Loss 16.5321 Epoch: 0 Global Step: 4850 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:41:49,490-Speed 4682.89 samples/sec Loss 16.4953 Epoch: 0 Global Step: 4900 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:42:00,352-Speed 4713.61 samples/sec Loss 16.4102 Epoch: 0 Global Step: 4950 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:42:12,314-Speed 4280.67 samples/sec Loss 16.1093 Epoch: 1 Global Step: 5000 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:42:22,937-Speed 4819.71 samples/sec Loss 15.6786 Epoch: 1 Global Step: 5050 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:42:33,380-Speed 4903.42 samples/sec Loss 15.7333 Epoch: 1 Global Step: 5100 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:42:44,132-Speed 4761.99 samples/sec Loss 15.7926 Epoch: 1 Global Step: 
5150 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:42:54,757-Speed 4819.36 samples/sec Loss 15.7706 Epoch: 1 Global Step: 5200 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:43:05,481-Speed 4774.88 samples/sec Loss 15.9307 Epoch: 1 Global Step: 5250 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:43:15,922-Speed 4903.78 samples/sec Loss 15.9080 Epoch: 1 Global Step: 5300 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:43:26,170-Speed 4996.43 samples/sec Loss 15.9995 Epoch: 1 Global Step: 5350 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:43:36,548-Speed 4933.68 samples/sec Loss 15.9998 Epoch: 1 Global Step: 5400 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:43:47,056-Speed 4872.75 samples/sec Loss 16.0706 Epoch: 1 Global Step: 5450 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:43:57,805-Speed 4763.80 samples/sec Loss 15.9627 Epoch: 1 Global Step: 5500 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:44:08,274-Speed 4890.62 samples/sec Loss 16.0330 Epoch: 1 Global Step: 5550 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:44:18,839-Speed 4846.78 samples/sec Loss 16.0131 Epoch: 1 Global Step: 5600 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:44:29,454-Speed 4823.49 samples/sec Loss 15.9639 Epoch: 1 Global Step: 5650 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:44:39,857-Speed 4922.09 samples/sec Loss 15.9728 Epoch: 1 Global Step: 5700 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:44:50,335-Speed 4886.53 samples/sec Loss 15.9926 Epoch: 1 Global Step: 5750 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:45:00,880-Speed 4855.69 samples/sec Loss 15.9379 Epoch: 1 Global Step: 5800 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:45:11,360-Speed 4885.46 samples/sec Loss 15.8547 Epoch: 1 Global Step: 5850 Fp16 Grad Scale: 
16384 Required: 8 hours Training: 2021-03-16 21:45:21,843-Speed 4884.24 samples/sec Loss 15.9538 Epoch: 1 Global Step: 5900 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:45:32,287-Speed 4902.77 samples/sec Loss 15.8307 Epoch: 1 Global Step: 5950 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:45:42,660-Speed 4935.94 samples/sec Loss 15.9777 Epoch: 1 Global Step: 6000 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:46:05,997-[lfw][6000]XNorm: 23.031200 Training: 2021-03-16 21:46:05,997-[lfw][6000]Accuracy-Flip: 0.99133+-0.00379 Training: 2021-03-16 21:46:05,997-[lfw][6000]Accuracy-Highest: 0.99133 Training: 2021-03-16 21:46:32,979-[cfp_fp][6000]XNorm: 19.594706 Training: 2021-03-16 21:46:32,979-[cfp_fp][6000]Accuracy-Flip: 0.92729+-0.01288 Training: 2021-03-16 21:46:32,979-[cfp_fp][6000]Accuracy-Highest: 0.92729 Training: 2021-03-16 21:46:56,312-[agedb_30][6000]XNorm: 22.514729 Training: 2021-03-16 21:46:56,313-[agedb_30][6000]Accuracy-Flip: 0.93117+-0.01190 Training: 2021-03-16 21:46:56,313-[agedb_30][6000]Accuracy-Highest: 0.93717 Training: 2021-03-16 21:47:06,550-Speed 610.33 samples/sec Loss 15.8321 Epoch: 1 Global Step: 6050 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:47:16,854-Speed 4969.08 samples/sec Loss 15.9017 Epoch: 1 Global Step: 6100 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:47:27,712-Speed 4715.56 samples/sec Loss 15.8759 Epoch: 1 Global Step: 6150 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:47:37,991-Speed 4981.55 samples/sec Loss 15.8655 Epoch: 1 Global Step: 6200 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:47:48,597-Speed 4827.35 samples/sec Loss 15.7376 Epoch: 1 Global Step: 6250 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:47:59,076-Speed 4886.47 samples/sec Loss 15.7734 Epoch: 1 Global Step: 6300 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:48:09,508-Speed 4908.08 
samples/sec Loss 15.7927 Epoch: 1 Global Step: 6350 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:48:20,135-Speed 4817.96 samples/sec Loss 15.7359 Epoch: 1 Global Step: 6400 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:48:30,670-Speed 4860.57 samples/sec Loss 15.6945 Epoch: 1 Global Step: 6450 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:48:41,054-Speed 4930.92 samples/sec Loss 15.7353 Epoch: 1 Global Step: 6500 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:48:51,615-Speed 4848.14 samples/sec Loss 15.7358 Epoch: 1 Global Step: 6550 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:49:02,128-Speed 4870.20 samples/sec Loss 15.6717 Epoch: 1 Global Step: 6600 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:49:12,634-Speed 4873.67 samples/sec Loss 15.7343 Epoch: 1 Global Step: 6650 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:49:23,441-Speed 4737.90 samples/sec Loss 15.6679 Epoch: 1 Global Step: 6700 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:49:33,968-Speed 4864.16 samples/sec Loss 15.6678 Epoch: 1 Global Step: 6750 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:49:44,420-Speed 4898.74 samples/sec Loss 15.6745 Epoch: 1 Global Step: 6800 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:49:55,067-Speed 4808.90 samples/sec Loss 15.5179 Epoch: 1 Global Step: 6850 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:50:05,694-Speed 4818.59 samples/sec Loss 15.5038 Epoch: 1 Global Step: 6900 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:50:16,419-Speed 4773.84 samples/sec Loss 15.5447 Epoch: 1 Global Step: 6950 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:50:27,052-Speed 4815.87 samples/sec Loss 15.5773 Epoch: 1 Global Step: 7000 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:50:37,603-Speed 4852.66 samples/sec Loss 15.5632 
Epoch: 1 Global Step: 7050 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:50:48,447-Speed 4721.59 samples/sec Loss 15.5731 Epoch: 1 Global Step: 7100 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:50:58,827-Speed 4932.79 samples/sec Loss 15.4183 Epoch: 1 Global Step: 7150 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:51:09,239-Speed 4917.94 samples/sec Loss 15.5815 Epoch: 1 Global Step: 7200 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:51:19,661-Speed 4912.54 samples/sec Loss 15.5238 Epoch: 1 Global Step: 7250 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:51:30,096-Speed 4907.06 samples/sec Loss 15.4079 Epoch: 1 Global Step: 7300 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:51:40,582-Speed 4882.94 samples/sec Loss 15.5199 Epoch: 1 Global Step: 7350 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:51:51,265-Speed 4793.07 samples/sec Loss 15.3767 Epoch: 1 Global Step: 7400 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:52:01,859-Speed 4832.79 samples/sec Loss 15.3334 Epoch: 1 Global Step: 7450 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:52:12,432-Speed 4842.78 samples/sec Loss 15.3748 Epoch: 1 Global Step: 7500 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:52:22,939-Speed 4873.16 samples/sec Loss 15.4994 Epoch: 1 Global Step: 7550 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:52:33,614-Speed 4796.81 samples/sec Loss 15.2720 Epoch: 1 Global Step: 7600 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:52:44,162-Speed 4853.80 samples/sec Loss 15.2829 Epoch: 1 Global Step: 7650 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:52:54,537-Speed 4935.38 samples/sec Loss 15.3394 Epoch: 1 Global Step: 7700 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:53:05,069-Speed 4861.43 samples/sec Loss 15.3267 Epoch: 1 Global Step: 
7750 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:53:15,486-Speed 4915.50 samples/sec Loss 15.3645 Epoch: 1 Global Step: 7800 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:53:26,055-Speed 4844.63 samples/sec Loss 15.3644 Epoch: 1 Global Step: 7850 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:53:36,448-Speed 4926.36 samples/sec Loss 15.3994 Epoch: 1 Global Step: 7900 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:53:46,877-Speed 4909.85 samples/sec Loss 15.3585 Epoch: 1 Global Step: 7950 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:53:57,276-Speed 4923.53 samples/sec Loss 15.3964 Epoch: 1 Global Step: 8000 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:54:20,834-[lfw][8000]XNorm: 23.091563 Training: 2021-03-16 21:54:20,834-[lfw][8000]Accuracy-Flip: 0.99367+-0.00420 Training: 2021-03-16 21:54:20,834-[lfw][8000]Accuracy-Highest: 0.99367 Training: 2021-03-16 21:54:48,005-[cfp_fp][8000]XNorm: 19.333232 Training: 2021-03-16 21:54:48,006-[cfp_fp][8000]Accuracy-Flip: 0.92914+-0.01348 Training: 2021-03-16 21:54:48,006-[cfp_fp][8000]Accuracy-Highest: 0.92914 Training: 2021-03-16 21:55:11,254-[agedb_30][8000]XNorm: 21.994820 Training: 2021-03-16 21:55:11,255-[agedb_30][8000]Accuracy-Flip: 0.94767+-0.01446 Training: 2021-03-16 21:55:11,255-[agedb_30][8000]Accuracy-Highest: 0.94767 Training: 2021-03-16 21:55:21,599-Speed 607.19 samples/sec Loss 15.2142 Epoch: 1 Global Step: 8050 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:55:32,035-Speed 4906.30 samples/sec Loss 15.3125 Epoch: 1 Global Step: 8100 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:55:42,560-Speed 4864.99 samples/sec Loss 15.2154 Epoch: 1 Global Step: 8150 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:55:52,889-Speed 4957.17 samples/sec Loss 15.3027 Epoch: 1 Global Step: 8200 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 
21:56:03,652-Speed 4757.34 samples/sec Loss 15.2654 Epoch: 1 Global Step: 8250 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:56:14,177-Speed 4864.81 samples/sec Loss 15.2379 Epoch: 1 Global Step: 8300 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:56:24,597-Speed 4913.90 samples/sec Loss 15.2464 Epoch: 1 Global Step: 8350 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:56:35,403-Speed 4738.26 samples/sec Loss 15.0790 Epoch: 1 Global Step: 8400 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:56:45,824-Speed 4913.76 samples/sec Loss 15.2505 Epoch: 1 Global Step: 8450 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:56:56,333-Speed 4871.90 samples/sec Loss 15.2261 Epoch: 1 Global Step: 8500 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:57:06,781-Speed 4901.03 samples/sec Loss 15.1873 Epoch: 1 Global Step: 8550 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:57:17,237-Speed 4896.79 samples/sec Loss 15.1809 Epoch: 1 Global Step: 8600 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:57:27,856-Speed 4822.04 samples/sec Loss 15.2169 Epoch: 1 Global Step: 8650 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:57:38,519-Speed 4801.78 samples/sec Loss 15.1466 Epoch: 1 Global Step: 8700 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:57:49,316-Speed 4742.25 samples/sec Loss 15.1285 Epoch: 1 Global Step: 8750 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:58:00,027-Speed 4780.42 samples/sec Loss 15.0971 Epoch: 1 Global Step: 8800 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:58:10,807-Speed 4749.99 samples/sec Loss 15.1159 Epoch: 1 Global Step: 8850 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:58:21,395-Speed 4835.62 samples/sec Loss 15.1848 Epoch: 1 Global Step: 8900 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:58:31,876-Speed 
4885.48 samples/sec Loss 15.1349 Epoch: 1 Global Step: 8950 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:58:42,462-Speed 4836.79 samples/sec Loss 15.0941 Epoch: 1 Global Step: 9000 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:58:52,869-Speed 4919.76 samples/sec Loss 15.1109 Epoch: 1 Global Step: 9050 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:59:03,489-Speed 4821.47 samples/sec Loss 15.1138 Epoch: 1 Global Step: 9100 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:59:14,132-Speed 4810.86 samples/sec Loss 15.1227 Epoch: 1 Global Step: 9150 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:59:24,604-Speed 4889.63 samples/sec Loss 15.1367 Epoch: 1 Global Step: 9200 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:59:35,074-Speed 4889.97 samples/sec Loss 15.0233 Epoch: 1 Global Step: 9250 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:59:45,630-Speed 4850.86 samples/sec Loss 14.9473 Epoch: 1 Global Step: 9300 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 21:59:56,181-Speed 4852.78 samples/sec Loss 15.0346 Epoch: 1 Global Step: 9350 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:00:06,790-Speed 4826.37 samples/sec Loss 15.0575 Epoch: 1 Global Step: 9400 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:00:17,205-Speed 4916.21 samples/sec Loss 15.0121 Epoch: 1 Global Step: 9450 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:00:27,574-Speed 4938.22 samples/sec Loss 14.9745 Epoch: 1 Global Step: 9500 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:00:38,142-Speed 4844.90 samples/sec Loss 15.0939 Epoch: 1 Global Step: 9550 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:00:48,819-Speed 4795.73 samples/sec Loss 14.9666 Epoch: 1 Global Step: 9600 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:00:59,229-Speed 4918.17 samples/sec Loss 
14.9264 Epoch: 1 Global Step: 9650 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:01:09,620-Speed 4927.62 samples/sec Loss 15.0418 Epoch: 1 Global Step: 9700 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:01:20,135-Speed 4869.76 samples/sec Loss 14.9002 Epoch: 1 Global Step: 9750 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:01:30,563-Speed 4909.74 samples/sec Loss 14.9498 Epoch: 1 Global Step: 9800 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:01:40,908-Speed 4949.68 samples/sec Loss 14.9932 Epoch: 1 Global Step: 9850 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:01:51,389-Speed 4885.13 samples/sec Loss 14.9857 Epoch: 1 Global Step: 9900 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:02:01,954-Speed 4846.53 samples/sec Loss 14.9769 Epoch: 1 Global Step: 9950 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:02:15,468-Speed 3789.01 samples/sec Loss 14.3202 Epoch: 2 Global Step: 10000 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:02:39,029-[lfw][10000]XNorm: 22.383479 Training: 2021-03-16 22:02:39,029-[lfw][10000]Accuracy-Flip: 0.99500+-0.00316 Training: 2021-03-16 22:02:39,029-[lfw][10000]Accuracy-Highest: 0.99500 Training: 2021-03-16 22:03:06,011-[cfp_fp][10000]XNorm: 18.664479 Training: 2021-03-16 22:03:06,011-[cfp_fp][10000]Accuracy-Flip: 0.93500+-0.00926 Training: 2021-03-16 22:03:06,012-[cfp_fp][10000]Accuracy-Highest: 0.93500 Training: 2021-03-16 22:03:29,256-[agedb_30][10000]XNorm: 21.668898 Training: 2021-03-16 22:03:29,257-[agedb_30][10000]Accuracy-Flip: 0.95000+-0.01155 Training: 2021-03-16 22:03:29,257-[agedb_30][10000]Accuracy-Highest: 0.95000 Training: 2021-03-16 22:03:39,564-Speed 608.83 samples/sec Loss 14.2323 Epoch: 2 Global Step: 10050 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:03:50,351-Speed 4746.83 samples/sec Loss 14.3464 Epoch: 2 Global Step: 10100 Fp16 Grad Scale: 16384 
Required: 8 hours Training: 2021-03-16 22:04:00,707-Speed 4944.02 samples/sec Loss 14.4105 Epoch: 2 Global Step: 10150 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:04:11,290-Speed 4838.30 samples/sec Loss 14.5377 Epoch: 2 Global Step: 10200 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:04:22,023-Speed 4770.71 samples/sec Loss 14.6718 Epoch: 2 Global Step: 10250 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:04:32,632-Speed 4826.01 samples/sec Loss 14.6783 Epoch: 2 Global Step: 10300 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:04:43,036-Speed 4921.59 samples/sec Loss 14.7982 Epoch: 2 Global Step: 10350 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:04:53,203-Speed 5036.54 samples/sec Loss 14.7580 Epoch: 2 Global Step: 10400 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:05:03,638-Speed 4906.83 samples/sec Loss 14.8424 Epoch: 2 Global Step: 10450 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:05:14,215-Speed 4840.96 samples/sec Loss 14.7169 Epoch: 2 Global Step: 10500 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:05:24,580-Speed 4939.57 samples/sec Loss 14.8428 Epoch: 2 Global Step: 10550 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:05:34,882-Speed 4970.29 samples/sec Loss 14.8744 Epoch: 2 Global Step: 10600 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:05:45,474-Speed 4834.20 samples/sec Loss 14.7815 Epoch: 2 Global Step: 10650 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:05:55,988-Speed 4869.85 samples/sec Loss 14.8046 Epoch: 2 Global Step: 10700 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:06:06,818-Speed 4727.86 samples/sec Loss 14.8232 Epoch: 2 Global Step: 10750 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:06:17,869-Speed 4633.41 samples/sec Loss 14.7791 Epoch: 2 Global Step: 10800 Fp16 Grad Scale: 16384 Required: 8 
hours Training: 2021-03-16 22:06:28,317-Speed 4900.72 samples/sec Loss 14.8566 Epoch: 2 Global Step: 10850 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:06:39,072-Speed 4760.78 samples/sec Loss 14.8171 Epoch: 2 Global Step: 10900 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:06:49,399-Speed 4958.10 samples/sec Loss 14.7937 Epoch: 2 Global Step: 10950 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:06:59,980-Speed 4839.10 samples/sec Loss 14.7285 Epoch: 2 Global Step: 11000 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:07:10,843-Speed 4713.40 samples/sec Loss 14.7435 Epoch: 2 Global Step: 11050 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:07:21,525-Speed 4793.16 samples/sec Loss 14.7981 Epoch: 2 Global Step: 11100 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:07:32,041-Speed 4868.84 samples/sec Loss 14.8398 Epoch: 2 Global Step: 11150 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:07:42,157-Speed 5061.95 samples/sec Loss 14.7871 Epoch: 2 Global Step: 11200 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:07:52,777-Speed 4821.32 samples/sec Loss 14.6912 Epoch: 2 Global Step: 11250 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:08:03,353-Speed 4841.06 samples/sec Loss 14.6856 Epoch: 2 Global Step: 11300 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:08:13,963-Speed 4826.02 samples/sec Loss 14.8033 Epoch: 2 Global Step: 11350 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:08:24,441-Speed 4886.54 samples/sec Loss 14.7494 Epoch: 2 Global Step: 11400 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:08:34,844-Speed 4922.09 samples/sec Loss 14.7444 Epoch: 2 Global Step: 11450 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:08:45,491-Speed 4809.00 samples/sec Loss 14.9194 Epoch: 2 Global Step: 11500 Fp16 Grad Scale: 16384 Required: 8 hours 
Training: 2021-03-16 22:08:56,013-Speed 4865.93 samples/sec Loss 14.8246 Epoch: 2 Global Step: 11550 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:09:06,464-Speed 4899.46 samples/sec Loss 14.8333 Epoch: 2 Global Step: 11600 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:09:17,160-Speed 4787.05 samples/sec Loss 14.7897 Epoch: 2 Global Step: 11650 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:09:27,616-Speed 4896.71 samples/sec Loss 14.6949 Epoch: 2 Global Step: 11700 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:09:38,129-Speed 4870.64 samples/sec Loss 14.7691 Epoch: 2 Global Step: 11750 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:09:48,558-Speed 4909.35 samples/sec Loss 14.6173 Epoch: 2 Global Step: 11800 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:09:59,003-Speed 4902.21 samples/sec Loss 14.6729 Epoch: 2 Global Step: 11850 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:10:09,499-Speed 4878.05 samples/sec Loss 14.6350 Epoch: 2 Global Step: 11900 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:10:20,105-Speed 4828.03 samples/sec Loss 14.6954 Epoch: 2 Global Step: 11950 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:10:30,624-Speed 4867.26 samples/sec Loss 14.6861 Epoch: 2 Global Step: 12000 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:10:54,037-[lfw][12000]XNorm: 21.734310 Training: 2021-03-16 22:10:54,038-[lfw][12000]Accuracy-Flip: 0.99300+-0.00440 Training: 2021-03-16 22:10:54,038-[lfw][12000]Accuracy-Highest: 0.99500 Training: 2021-03-16 22:11:21,355-[cfp_fp][12000]XNorm: 18.251255 Training: 2021-03-16 22:11:21,355-[cfp_fp][12000]Accuracy-Flip: 0.93857+-0.01079 Training: 2021-03-16 22:11:21,355-[cfp_fp][12000]Accuracy-Highest: 0.93857 Training: 2021-03-16 22:11:44,618-[agedb_30][12000]XNorm: 20.905924 Training: 2021-03-16 22:11:44,618-[agedb_30][12000]Accuracy-Flip: 
0.94700+-0.01005 Training: 2021-03-16 22:11:44,618-[agedb_30][12000]Accuracy-Highest: 0.95000 Training: 2021-03-16 22:11:55,501-Speed 603.23 samples/sec Loss 14.7961 Epoch: 2 Global Step: 12050 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:12:05,876-Speed 4935.51 samples/sec Loss 14.7575 Epoch: 2 Global Step: 12100 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:12:16,388-Speed 4870.64 samples/sec Loss 14.6439 Epoch: 2 Global Step: 12150 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:12:26,780-Speed 4927.19 samples/sec Loss 14.6144 Epoch: 2 Global Step: 12200 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:12:37,285-Speed 4874.11 samples/sec Loss 14.8447 Epoch: 2 Global Step: 12250 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:12:47,796-Speed 4871.37 samples/sec Loss 14.6681 Epoch: 2 Global Step: 12300 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:12:58,323-Speed 4863.65 samples/sec Loss 14.6654 Epoch: 2 Global Step: 12350 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:13:09,004-Speed 4793.83 samples/sec Loss 14.6053 Epoch: 2 Global Step: 12400 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:13:19,312-Speed 4967.28 samples/sec Loss 14.6669 Epoch: 2 Global Step: 12450 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:13:29,668-Speed 4944.08 samples/sec Loss 14.6870 Epoch: 2 Global Step: 12500 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:13:40,072-Speed 4921.80 samples/sec Loss 14.6169 Epoch: 2 Global Step: 12550 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:13:50,550-Speed 4886.51 samples/sec Loss 14.5973 Epoch: 2 Global Step: 12600 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:14:01,283-Speed 4770.70 samples/sec Loss 14.6717 Epoch: 2 Global Step: 12650 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:14:11,712-Speed 4909.52 samples/sec 
Loss 14.6120 Epoch: 2 Global Step: 12700 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:14:22,267-Speed 4851.22 samples/sec Loss 14.6722 Epoch: 2 Global Step: 12750 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:14:32,759-Speed 4879.64 samples/sec Loss 14.5696 Epoch: 2 Global Step: 12800 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:14:43,790-Speed 4642.06 samples/sec Loss 14.5651 Epoch: 2 Global Step: 12850 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:14:54,107-Speed 4962.77 samples/sec Loss 14.5810 Epoch: 2 Global Step: 12900 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:15:05,285-Speed 4580.53 samples/sec Loss 14.6128 Epoch: 2 Global Step: 12950 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:15:15,755-Speed 4890.41 samples/sec Loss 14.5837 Epoch: 2 Global Step: 13000 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:15:26,452-Speed 4786.85 samples/sec Loss 14.6254 Epoch: 2 Global Step: 13050 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:15:36,975-Speed 4865.66 samples/sec Loss 14.6424 Epoch: 2 Global Step: 13100 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:15:47,548-Speed 4842.75 samples/sec Loss 14.6128 Epoch: 2 Global Step: 13150 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:15:57,816-Speed 4986.62 samples/sec Loss 14.6052 Epoch: 2 Global Step: 13200 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:16:08,292-Speed 4887.59 samples/sec Loss 14.5754 Epoch: 2 Global Step: 13250 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:16:18,960-Speed 4799.99 samples/sec Loss 14.5538 Epoch: 2 Global Step: 13300 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:16:29,610-Speed 4807.56 samples/sec Loss 14.5408 Epoch: 2 Global Step: 13350 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:16:40,224-Speed 4824.22 samples/sec Loss 
14.5759 Epoch: 2 Global Step: 13400 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:16:50,835-Speed 4825.26 samples/sec Loss 14.6168 Epoch: 2 Global Step: 13450 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:17:01,304-Speed 4891.03 samples/sec Loss 14.5632 Epoch: 2 Global Step: 13500 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:17:11,873-Speed 4844.45 samples/sec Loss 14.4659 Epoch: 2 Global Step: 13550 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:17:22,321-Speed 4900.60 samples/sec Loss 14.5423 Epoch: 2 Global Step: 13600 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:17:33,040-Speed 4776.78 samples/sec Loss 14.5351 Epoch: 2 Global Step: 13650 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:17:43,511-Speed 4889.94 samples/sec Loss 14.6047 Epoch: 2 Global Step: 13700 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:17:54,037-Speed 4864.35 samples/sec Loss 14.6168 Epoch: 2 Global Step: 13750 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:18:04,500-Speed 4893.52 samples/sec Loss 14.4659 Epoch: 2 Global Step: 13800 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:18:14,949-Speed 4900.43 samples/sec Loss 14.6191 Epoch: 2 Global Step: 13850 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:18:25,339-Speed 4927.81 samples/sec Loss 14.5844 Epoch: 2 Global Step: 13900 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:18:35,738-Speed 4924.16 samples/sec Loss 14.6096 Epoch: 2 Global Step: 13950 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:18:46,706-Speed 4668.43 samples/sec Loss 14.4329 Epoch: 2 Global Step: 14000 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:19:10,246-[lfw][14000]XNorm: 23.667068 Training: 2021-03-16 22:19:10,247-[lfw][14000]Accuracy-Flip: 0.99417+-0.00318 Training: 2021-03-16 22:19:10,247-[lfw][14000]Accuracy-Highest: 0.99500 
Training: 2021-03-16 22:19:38,339-[cfp_fp][14000]XNorm: 19.727501 Training: 2021-03-16 22:19:38,339-[cfp_fp][14000]Accuracy-Flip: 0.93329+-0.01539 Training: 2021-03-16 22:19:38,339-[cfp_fp][14000]Accuracy-Highest: 0.93857 Training: 2021-03-16 22:20:01,634-[agedb_30][14000]XNorm: 22.887219 Training: 2021-03-16 22:20:01,634-[agedb_30][14000]Accuracy-Flip: 0.94783+-0.01223 Training: 2021-03-16 22:20:01,634-[agedb_30][14000]Accuracy-Highest: 0.95000 Training: 2021-03-16 22:20:12,036-Speed 600.03 samples/sec Loss 14.5598 Epoch: 2 Global Step: 14050 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:20:22,407-Speed 4936.97 samples/sec Loss 14.5639 Epoch: 2 Global Step: 14100 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:20:32,702-Speed 4973.36 samples/sec Loss 14.5310 Epoch: 2 Global Step: 14150 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:20:43,149-Speed 4901.52 samples/sec Loss 14.6271 Epoch: 2 Global Step: 14200 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:20:53,565-Speed 4915.56 samples/sec Loss 14.5444 Epoch: 2 Global Step: 14250 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:21:04,194-Speed 4817.32 samples/sec Loss 14.4937 Epoch: 2 Global Step: 14300 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:21:14,649-Speed 4897.22 samples/sec Loss 14.5017 Epoch: 2 Global Step: 14350 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:21:25,190-Speed 4857.34 samples/sec Loss 14.4437 Epoch: 2 Global Step: 14400 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:21:35,862-Speed 4798.06 samples/sec Loss 14.3886 Epoch: 2 Global Step: 14450 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:21:46,313-Speed 4899.19 samples/sec Loss 14.4537 Epoch: 2 Global Step: 14500 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:21:56,721-Speed 4919.70 samples/sec Loss 14.6177 Epoch: 2 Global Step: 14550 Fp16 Grad Scale: 16384 
Required: 8 hours Training: 2021-03-16 22:22:07,079-Speed 4943.10 samples/sec Loss 14.5533 Epoch: 2 Global Step: 14600 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:22:17,428-Speed 4947.47 samples/sec Loss 14.5686 Epoch: 2 Global Step: 14650 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:22:27,910-Speed 4884.98 samples/sec Loss 14.4177 Epoch: 2 Global Step: 14700 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:22:38,376-Speed 4892.44 samples/sec Loss 14.4275 Epoch: 2 Global Step: 14750 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:22:48,863-Speed 4882.19 samples/sec Loss 14.4391 Epoch: 2 Global Step: 14800 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:22:59,695-Speed 4727.14 samples/sec Loss 14.4513 Epoch: 2 Global Step: 14850 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2021-03-16 22:23:10,537-Speed 4722.56 samples/sec Loss 14.5057 Epoch: 2 Global Step: 14900 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:23:24,825-Speed 3583.67 samples/sec Loss 14.4507 Epoch: 3 Global Step: 14950 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:23:35,793-Speed 4668.23 samples/sec Loss 13.6838 Epoch: 3 Global Step: 15000 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:23:46,654-Speed 4714.67 samples/sec Loss 13.7730 Epoch: 3 Global Step: 15050 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:23:57,485-Speed 4727.49 samples/sec Loss 13.9105 Epoch: 3 Global Step: 15100 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:24:08,010-Speed 4864.67 samples/sec Loss 14.0016 Epoch: 3 Global Step: 15150 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:24:18,586-Speed 4841.81 samples/sec Loss 14.0789 Epoch: 3 Global Step: 15200 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:24:29,228-Speed 4811.24 samples/sec Loss 14.1588 Epoch: 3 Global Step: 15250 Fp16 Grad Scale: 16384 Required: 7 
hours Training: 2021-03-16 22:24:39,780-Speed 4852.62 samples/sec Loss 14.1697 Epoch: 3 Global Step: 15300 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:24:50,241-Speed 4894.65 samples/sec Loss 14.1553 Epoch: 3 Global Step: 15350 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:25:00,739-Speed 4877.27 samples/sec Loss 14.2234 Epoch: 3 Global Step: 15400 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:25:11,386-Speed 4809.24 samples/sec Loss 14.2995 Epoch: 3 Global Step: 15450 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:25:21,963-Speed 4841.06 samples/sec Loss 14.4018 Epoch: 3 Global Step: 15500 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:25:32,901-Speed 4681.11 samples/sec Loss 14.3913 Epoch: 3 Global Step: 15550 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:25:43,496-Speed 4832.47 samples/sec Loss 14.3685 Epoch: 3 Global Step: 15600 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:25:54,295-Speed 4741.66 samples/sec Loss 14.3165 Epoch: 3 Global Step: 15650 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:26:04,858-Speed 4847.37 samples/sec Loss 14.4275 Epoch: 3 Global Step: 15700 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:26:15,364-Speed 4873.40 samples/sec Loss 14.3708 Epoch: 3 Global Step: 15750 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:26:25,897-Speed 4861.01 samples/sec Loss 14.3356 Epoch: 3 Global Step: 15800 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:26:36,560-Speed 4801.91 samples/sec Loss 14.3998 Epoch: 3 Global Step: 15850 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:26:47,195-Speed 4814.59 samples/sec Loss 14.2811 Epoch: 3 Global Step: 15900 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:26:58,190-Speed 4657.05 samples/sec Loss 14.2937 Epoch: 3 Global Step: 15950 Fp16 Grad Scale: 16384 Required: 7 hours 
Training: 2021-03-16 22:27:08,952-Speed 4757.59 samples/sec Loss 14.3019 Epoch: 3 Global Step: 16000 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:27:32,277-[lfw][16000]XNorm: 20.781095 Training: 2021-03-16 22:27:32,277-[lfw][16000]Accuracy-Flip: 0.99250+-0.00455 Training: 2021-03-16 22:27:32,277-[lfw][16000]Accuracy-Highest: 0.99500 Training: 2021-03-16 22:27:59,257-[cfp_fp][16000]XNorm: 17.395532 Training: 2021-03-16 22:27:59,257-[cfp_fp][16000]Accuracy-Flip: 0.93271+-0.01543 Training: 2021-03-16 22:27:59,257-[cfp_fp][16000]Accuracy-Highest: 0.93857 Training: 2021-03-16 22:28:22,664-[agedb_30][16000]XNorm: 20.202717 Training: 2021-03-16 22:28:22,664-[agedb_30][16000]Accuracy-Flip: 0.95333+-0.00726 Training: 2021-03-16 22:28:22,665-[agedb_30][16000]Accuracy-Highest: 0.95333 Training: 2021-03-16 22:28:32,924-Speed 609.73 samples/sec Loss 14.4177 Epoch: 3 Global Step: 16050 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:28:43,289-Speed 4940.27 samples/sec Loss 14.4025 Epoch: 3 Global Step: 16100 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:28:53,731-Speed 4903.51 samples/sec Loss 14.3345 Epoch: 3 Global Step: 16150 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:29:04,138-Speed 4919.86 samples/sec Loss 14.2928 Epoch: 3 Global Step: 16200 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:29:14,576-Speed 4905.59 samples/sec Loss 14.3805 Epoch: 3 Global Step: 16250 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:29:24,890-Speed 4964.26 samples/sec Loss 14.2573 Epoch: 3 Global Step: 16300 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:29:35,486-Speed 4832.09 samples/sec Loss 14.4137 Epoch: 3 Global Step: 16350 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:29:46,129-Speed 4810.85 samples/sec Loss 14.3653 Epoch: 3 Global Step: 16400 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:29:56,593-Speed 4893.45 samples/sec 
Loss 14.3515 Epoch: 3 Global Step: 16450 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:30:07,211-Speed 4822.25 samples/sec Loss 14.3889 Epoch: 3 Global Step: 16500 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:30:17,784-Speed 4842.53 samples/sec Loss 14.3792 Epoch: 3 Global Step: 16550 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:30:28,327-Speed 4856.62 samples/sec Loss 14.3634 Epoch: 3 Global Step: 16600 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:30:38,914-Speed 4836.15 samples/sec Loss 14.3134 Epoch: 3 Global Step: 16650 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:30:49,396-Speed 4884.87 samples/sec Loss 14.2993 Epoch: 3 Global Step: 16700 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:30:59,844-Speed 4900.90 samples/sec Loss 14.4162 Epoch: 3 Global Step: 16750 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:31:10,542-Speed 4785.94 samples/sec Loss 14.4150 Epoch: 3 Global Step: 16800 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:31:20,935-Speed 4926.52 samples/sec Loss 14.2517 Epoch: 3 Global Step: 16850 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:31:31,440-Speed 4874.51 samples/sec Loss 14.3343 Epoch: 3 Global Step: 16900 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:31:42,003-Speed 4847.15 samples/sec Loss 14.3360 Epoch: 3 Global Step: 16950 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:31:53,096-Speed 4615.75 samples/sec Loss 14.3190 Epoch: 3 Global Step: 17000 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:32:03,801-Speed 4783.16 samples/sec Loss 14.3406 Epoch: 3 Global Step: 17050 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:32:14,447-Speed 4809.19 samples/sec Loss 14.3082 Epoch: 3 Global Step: 17100 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:32:25,222-Speed 4752.01 samples/sec Loss 
14.2784 Epoch: 3 Global Step: 17150 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:32:35,729-Speed 4873.35 samples/sec Loss 14.2692 Epoch: 3 Global Step: 17200 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:32:46,114-Speed 4930.50 samples/sec Loss 14.2656 Epoch: 3 Global Step: 17250 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:32:56,530-Speed 4915.88 samples/sec Loss 14.2684 Epoch: 3 Global Step: 17300 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:33:06,922-Speed 4926.95 samples/sec Loss 14.2895 Epoch: 3 Global Step: 17350 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:33:17,470-Speed 4854.22 samples/sec Loss 14.3789 Epoch: 3 Global Step: 17400 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:33:27,832-Speed 4941.76 samples/sec Loss 14.2658 Epoch: 3 Global Step: 17450 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:33:38,363-Speed 4861.69 samples/sec Loss 14.2634 Epoch: 3 Global Step: 17500 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:33:49,002-Speed 4812.79 samples/sec Loss 14.2537 Epoch: 3 Global Step: 17550 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:33:59,657-Speed 4805.49 samples/sec Loss 14.2781 Epoch: 3 Global Step: 17600 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:34:10,252-Speed 4832.80 samples/sec Loss 14.3382 Epoch: 3 Global Step: 17650 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:34:20,870-Speed 4822.13 samples/sec Loss 14.3820 Epoch: 3 Global Step: 17700 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:34:31,396-Speed 4864.11 samples/sec Loss 14.3076 Epoch: 3 Global Step: 17750 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:34:41,980-Speed 4838.08 samples/sec Loss 14.4408 Epoch: 3 Global Step: 17800 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:34:52,673-Speed 4788.27 samples/sec Loss 14.2387 
Epoch: 3 Global Step: 17850 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:35:03,313-Speed 4812.13 samples/sec Loss 14.2352 Epoch: 3 Global Step: 17900 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:35:14,038-Speed 4774.26 samples/sec Loss 14.3678 Epoch: 3 Global Step: 17950 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:35:24,483-Speed 4901.75 samples/sec Loss 14.3234 Epoch: 3 Global Step: 18000 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:35:47,670-[lfw][18000]XNorm: 22.644514 Training: 2021-03-16 22:35:47,671-[lfw][18000]Accuracy-Flip: 0.99400+-0.00351 Training: 2021-03-16 22:35:47,671-[lfw][18000]Accuracy-Highest: 0.99500 Training: 2021-03-16 22:36:14,632-[cfp_fp][18000]XNorm: 18.785732 Training: 2021-03-16 22:36:14,632-[cfp_fp][18000]Accuracy-Flip: 0.93086+-0.01443 Training: 2021-03-16 22:36:14,632-[cfp_fp][18000]Accuracy-Highest: 0.93857 Training: 2021-03-16 22:36:38,187-[agedb_30][18000]XNorm: 21.940623 Training: 2021-03-16 22:36:38,187-[agedb_30][18000]Accuracy-Flip: 0.95033+-0.00983 Training: 2021-03-16 22:36:38,187-[agedb_30][18000]Accuracy-Highest: 0.95333 Training: 2021-03-16 22:36:48,389-Speed 610.21 samples/sec Loss 14.2818 Epoch: 3 Global Step: 18050 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:36:58,813-Speed 4911.79 samples/sec Loss 14.2734 Epoch: 3 Global Step: 18100 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:37:09,283-Speed 4890.60 samples/sec Loss 14.2269 Epoch: 3 Global Step: 18150 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:37:19,624-Speed 4951.57 samples/sec Loss 14.3869 Epoch: 3 Global Step: 18200 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:37:29,983-Speed 4942.74 samples/sec Loss 14.2772 Epoch: 3 Global Step: 18250 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:37:40,480-Speed 4877.57 samples/sec Loss 14.2337 Epoch: 3 Global Step: 18300 Fp16 Grad Scale: 16384 
Required: 7 hours Training: 2021-03-16 22:37:51,006-Speed 4864.29 samples/sec Loss 14.3284 Epoch: 3 Global Step: 18350 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:38:01,493-Speed 4882.81 samples/sec Loss 14.2584 Epoch: 3 Global Step: 18400 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:38:12,055-Speed 4847.58 samples/sec Loss 14.3325 Epoch: 3 Global Step: 18450 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:38:22,700-Speed 4810.24 samples/sec Loss 14.1673 Epoch: 3 Global Step: 18500 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:38:33,192-Speed 4879.82 samples/sec Loss 14.2670 Epoch: 3 Global Step: 18550 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:38:43,787-Speed 4832.93 samples/sec Loss 14.2904 Epoch: 3 Global Step: 18600 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:38:54,313-Speed 4864.54 samples/sec Loss 14.2950 Epoch: 3 Global Step: 18650 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:39:04,876-Speed 4846.99 samples/sec Loss 14.2278 Epoch: 3 Global Step: 18700 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:39:15,331-Speed 4897.40 samples/sec Loss 14.1930 Epoch: 3 Global Step: 18750 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:39:26,307-Speed 4665.23 samples/sec Loss 14.3256 Epoch: 3 Global Step: 18800 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:39:36,786-Speed 4886.22 samples/sec Loss 14.2608 Epoch: 3 Global Step: 18850 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:39:47,137-Speed 4946.31 samples/sec Loss 14.2875 Epoch: 3 Global Step: 18900 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:39:57,583-Speed 4901.63 samples/sec Loss 14.2787 Epoch: 3 Global Step: 18950 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:40:08,217-Speed 4814.91 samples/sec Loss 14.2763 Epoch: 3 Global Step: 19000 Fp16 Grad Scale: 16384 Required: 7 
hours Training: 2021-03-16 22:40:18,919-Speed 4784.48 samples/sec Loss 14.2288 Epoch: 3 Global Step: 19050 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:40:29,450-Speed 4861.87 samples/sec Loss 14.1560 Epoch: 3 Global Step: 19100 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:40:40,403-Speed 4674.71 samples/sec Loss 14.1783 Epoch: 3 Global Step: 19150 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:40:51,374-Speed 4667.11 samples/sec Loss 14.2322 Epoch: 3 Global Step: 19200 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:41:01,803-Speed 4909.65 samples/sec Loss 14.1904 Epoch: 3 Global Step: 19250 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:41:12,548-Speed 4765.12 samples/sec Loss 14.2553 Epoch: 3 Global Step: 19300 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:41:22,945-Speed 4924.88 samples/sec Loss 14.3420 Epoch: 3 Global Step: 19350 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:41:33,560-Speed 4823.63 samples/sec Loss 14.2023 Epoch: 3 Global Step: 19400 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:41:44,163-Speed 4828.81 samples/sec Loss 14.2155 Epoch: 3 Global Step: 19450 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:41:54,839-Speed 4796.03 samples/sec Loss 14.2143 Epoch: 3 Global Step: 19500 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:42:05,300-Speed 4894.76 samples/sec Loss 14.2961 Epoch: 3 Global Step: 19550 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:42:15,615-Speed 4963.93 samples/sec Loss 14.2263 Epoch: 3 Global Step: 19600 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:42:26,148-Speed 4861.28 samples/sec Loss 14.2461 Epoch: 3 Global Step: 19650 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:42:36,652-Speed 4874.44 samples/sec Loss 14.1958 Epoch: 3 Global Step: 19700 Fp16 Grad Scale: 16384 Required: 7 hours 
Training: 2021-03-16 22:42:46,971-Speed 4962.39 samples/sec Loss 14.1607 Epoch: 3 Global Step: 19750 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:42:57,549-Speed 4840.44 samples/sec Loss 14.1811 Epoch: 3 Global Step: 19800 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:43:07,929-Speed 4932.72 samples/sec Loss 14.1469 Epoch: 3 Global Step: 19850 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:43:18,610-Speed 4793.54 samples/sec Loss 14.2246 Epoch: 3 Global Step: 19900 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:43:32,278-Speed 3746.12 samples/sec Loss 13.8934 Epoch: 4 Global Step: 19950 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:43:42,952-Speed 4797.20 samples/sec Loss 13.5432 Epoch: 4 Global Step: 20000 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:44:06,529-[lfw][20000]XNorm: 24.546374 Training: 2021-03-16 22:44:06,529-[lfw][20000]Accuracy-Flip: 0.99400+-0.00374 Training: 2021-03-16 22:44:06,529-[lfw][20000]Accuracy-Highest: 0.99500 Training: 2021-03-16 22:44:33,579-[cfp_fp][20000]XNorm: 20.426681 Training: 2021-03-16 22:44:33,579-[cfp_fp][20000]Accuracy-Flip: 0.94071+-0.00992 Training: 2021-03-16 22:44:33,579-[cfp_fp][20000]Accuracy-Highest: 0.94071 Training: 2021-03-16 22:44:56,687-[agedb_30][20000]XNorm: 23.227693 Training: 2021-03-16 22:44:56,687-[agedb_30][20000]Accuracy-Flip: 0.95333+-0.01019 Training: 2021-03-16 22:44:56,687-[agedb_30][20000]Accuracy-Highest: 0.95333 Training: 2021-03-16 22:45:07,084-Speed 608.57 samples/sec Loss 13.5747 Epoch: 4 Global Step: 20050 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:45:17,627-Speed 4856.60 samples/sec Loss 13.6720 Epoch: 4 Global Step: 20100 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:45:28,059-Speed 4908.57 samples/sec Loss 13.8633 Epoch: 4 Global Step: 20150 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:45:38,584-Speed 4864.75 samples/sec 
Loss 13.9285 Epoch: 4 Global Step: 20200 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:45:49,136-Speed 4852.19 samples/sec Loss 13.8882 Epoch: 4 Global Step: 20250 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:45:59,519-Speed 4931.35 samples/sec Loss 13.9316 Epoch: 4 Global Step: 20300 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:46:10,150-Speed 4816.59 samples/sec Loss 13.9972 Epoch: 4 Global Step: 20350 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:46:20,476-Speed 4958.46 samples/sec Loss 14.0352 Epoch: 4 Global Step: 20400 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:46:31,029-Speed 4851.89 samples/sec Loss 14.0322 Epoch: 4 Global Step: 20450 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:46:41,396-Speed 4939.37 samples/sec Loss 14.1083 Epoch: 4 Global Step: 20500 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:46:52,037-Speed 4811.46 samples/sec Loss 14.1550 Epoch: 4 Global Step: 20550 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:47:02,592-Speed 4851.20 samples/sec Loss 14.1680 Epoch: 4 Global Step: 20600 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:47:13,215-Speed 4820.11 samples/sec Loss 14.2618 Epoch: 4 Global Step: 20650 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:47:23,788-Speed 4842.66 samples/sec Loss 14.0743 Epoch: 4 Global Step: 20700 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:47:34,413-Speed 4818.79 samples/sec Loss 14.0961 Epoch: 4 Global Step: 20750 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:47:45,226-Speed 4735.41 samples/sec Loss 14.0835 Epoch: 4 Global Step: 20800 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:47:55,538-Speed 4965.32 samples/sec Loss 14.1993 Epoch: 4 Global Step: 20850 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:48:05,921-Speed 4931.57 samples/sec Loss 
14.0503 Epoch: 4 Global Step: 20900 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:48:16,304-Speed 4930.98 samples/sec Loss 14.0605 Epoch: 4 Global Step: 20950 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:48:26,858-Speed 4851.47 samples/sec Loss 14.1416 Epoch: 4 Global Step: 21000 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:48:37,485-Speed 4818.47 samples/sec Loss 14.0837 Epoch: 4 Global Step: 21050 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:48:48,105-Speed 4821.17 samples/sec Loss 14.1339 Epoch: 4 Global Step: 21100 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:48:58,610-Speed 4874.24 samples/sec Loss 14.1684 Epoch: 4 Global Step: 21150 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:49:09,155-Speed 4855.55 samples/sec Loss 14.2348 Epoch: 4 Global Step: 21200 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:49:19,414-Speed 4990.60 samples/sec Loss 14.0956 Epoch: 4 Global Step: 21250 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:49:31,023-Speed 4410.71 samples/sec Loss 14.0351 Epoch: 4 Global Step: 21300 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:49:41,570-Speed 4854.56 samples/sec Loss 14.1315 Epoch: 4 Global Step: 21350 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:49:51,993-Speed 4912.74 samples/sec Loss 14.0904 Epoch: 4 Global Step: 21400 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:50:02,391-Speed 4924.08 samples/sec Loss 14.1678 Epoch: 4 Global Step: 21450 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:50:12,952-Speed 4848.45 samples/sec Loss 14.0415 Epoch: 4 Global Step: 21500 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:50:23,535-Speed 4837.95 samples/sec Loss 14.1399 Epoch: 4 Global Step: 21550 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:50:33,993-Speed 4896.42 samples/sec Loss 14.1682 
Epoch: 4 Global Step: 21600 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:50:44,633-Speed 4812.21 samples/sec Loss 14.1286 Epoch: 4 Global Step: 21650 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:50:55,111-Speed 4887.03 samples/sec Loss 14.1687 Epoch: 4 Global Step: 21700 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:51:05,378-Speed 4986.91 samples/sec Loss 14.1234 Epoch: 4 Global Step: 21750 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:51:15,848-Speed 4890.43 samples/sec Loss 14.0511 Epoch: 4 Global Step: 21800 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:51:26,269-Speed 4913.46 samples/sec Loss 14.0375 Epoch: 4 Global Step: 21850 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:51:37,083-Speed 4734.73 samples/sec Loss 14.0651 Epoch: 4 Global Step: 21900 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:51:47,409-Speed 4958.69 samples/sec Loss 14.1670 Epoch: 4 Global Step: 21950 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:51:58,029-Speed 4821.53 samples/sec Loss 14.1888 Epoch: 4 Global Step: 22000 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:52:21,164-[lfw][22000]XNorm: 22.839619 Training: 2021-03-16 22:52:21,165-[lfw][22000]Accuracy-Flip: 0.99450+-0.00342 Training: 2021-03-16 22:52:21,165-[lfw][22000]Accuracy-Highest: 0.99500 Training: 2021-03-16 22:52:47,874-[cfp_fp][22000]XNorm: 18.943188 Training: 2021-03-16 22:52:47,875-[cfp_fp][22000]Accuracy-Flip: 0.93300+-0.01216 Training: 2021-03-16 22:52:47,875-[cfp_fp][22000]Accuracy-Highest: 0.94071 Training: 2021-03-16 22:53:11,410-[agedb_30][22000]XNorm: 21.728453 Training: 2021-03-16 22:53:11,410-[agedb_30][22000]Accuracy-Flip: 0.95517+-0.01305 Training: 2021-03-16 22:53:11,410-[agedb_30][22000]Accuracy-Highest: 0.95517 Training: 2021-03-16 22:53:21,845-Speed 610.87 samples/sec Loss 14.1197 Epoch: 4 Global Step: 22050 Fp16 Grad Scale: 16384 
Required: 7 hours Training: 2021-03-16 22:53:32,211-Speed 4939.12 samples/sec Loss 14.0802 Epoch: 4 Global Step: 22100 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:53:42,417-Speed 5017.27 samples/sec Loss 14.0254 Epoch: 4 Global Step: 22150 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:53:52,705-Speed 4976.85 samples/sec Loss 14.0446 Epoch: 4 Global Step: 22200 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:54:03,067-Speed 4940.96 samples/sec Loss 14.0559 Epoch: 4 Global Step: 22250 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:54:13,464-Speed 4924.88 samples/sec Loss 14.0944 Epoch: 4 Global Step: 22300 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:54:23,870-Speed 4920.73 samples/sec Loss 14.0682 Epoch: 4 Global Step: 22350 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:54:34,282-Speed 4917.62 samples/sec Loss 14.1055 Epoch: 4 Global Step: 22400 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:54:44,929-Speed 4808.76 samples/sec Loss 14.1334 Epoch: 4 Global Step: 22450 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:54:55,369-Speed 4904.64 samples/sec Loss 14.0797 Epoch: 4 Global Step: 22500 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:55:06,060-Speed 4788.98 samples/sec Loss 14.1239 Epoch: 4 Global Step: 22550 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:55:16,461-Speed 4923.07 samples/sec Loss 14.0890 Epoch: 4 Global Step: 22600 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:55:26,898-Speed 4905.67 samples/sec Loss 14.1204 Epoch: 4 Global Step: 22650 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:55:37,199-Speed 4970.68 samples/sec Loss 14.1762 Epoch: 4 Global Step: 22700 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:55:47,671-Speed 4889.58 samples/sec Loss 14.1626 Epoch: 4 Global Step: 22750 Fp16 Grad Scale: 16384 Required: 7 
hours Training: 2021-03-16 22:55:58,301-Speed 4816.60 samples/sec Loss 14.0539 Epoch: 4 Global Step: 22800 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:56:08,685-Speed 4930.69 samples/sec Loss 14.0507 Epoch: 4 Global Step: 22850 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:56:19,080-Speed 4925.84 samples/sec Loss 14.0011 Epoch: 4 Global Step: 22900 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:56:29,686-Speed 4827.73 samples/sec Loss 14.0657 Epoch: 4 Global Step: 22950 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:56:40,299-Speed 4824.50 samples/sec Loss 14.1377 Epoch: 4 Global Step: 23000 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:56:50,916-Speed 4822.39 samples/sec Loss 14.1018 Epoch: 4 Global Step: 23050 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:57:01,429-Speed 4870.53 samples/sec Loss 14.0361 Epoch: 4 Global Step: 23100 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:57:11,921-Speed 4880.29 samples/sec Loss 14.1611 Epoch: 4 Global Step: 23150 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:57:22,369-Speed 4901.22 samples/sec Loss 14.0956 Epoch: 4 Global Step: 23200 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:57:32,929-Speed 4848.36 samples/sec Loss 13.9928 Epoch: 4 Global Step: 23250 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:57:43,424-Speed 4879.01 samples/sec Loss 14.1861 Epoch: 4 Global Step: 23300 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:57:54,159-Speed 4769.54 samples/sec Loss 14.0859 Epoch: 4 Global Step: 23350 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:58:04,775-Speed 4823.19 samples/sec Loss 14.0695 Epoch: 4 Global Step: 23400 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:58:15,559-Speed 4748.09 samples/sec Loss 14.1096 Epoch: 4 Global Step: 23450 Fp16 Grad Scale: 16384 Required: 7 hours 
Training: 2021-03-16 22:58:26,521-Speed 4671.13 samples/sec Loss 14.0256 Epoch: 4 Global Step: 23500 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:58:37,004-Speed 4884.37 samples/sec Loss 14.0377 Epoch: 4 Global Step: 23550 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:58:47,791-Speed 4746.62 samples/sec Loss 14.0076 Epoch: 4 Global Step: 23600 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:58:58,174-Speed 4931.67 samples/sec Loss 14.1354 Epoch: 4 Global Step: 23650 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:59:08,967-Speed 4743.89 samples/sec Loss 14.1400 Epoch: 4 Global Step: 23700 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:59:19,280-Speed 4965.08 samples/sec Loss 13.9692 Epoch: 4 Global Step: 23750 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:59:29,867-Speed 4836.11 samples/sec Loss 14.0376 Epoch: 4 Global Step: 23800 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:59:40,434-Speed 4845.34 samples/sec Loss 14.0553 Epoch: 4 Global Step: 23850 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 22:59:50,996-Speed 4847.96 samples/sec Loss 14.0657 Epoch: 4 Global Step: 23900 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:00:01,806-Speed 4736.43 samples/sec Loss 14.1017 Epoch: 4 Global Step: 23950 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:00:12,216-Speed 4918.68 samples/sec Loss 14.0509 Epoch: 4 Global Step: 24000 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:00:35,272-[lfw][24000]XNorm: 21.497580 Training: 2021-03-16 23:00:35,272-[lfw][24000]Accuracy-Flip: 0.99467+-0.00414 Training: 2021-03-16 23:00:35,272-[lfw][24000]Accuracy-Highest: 0.99500 Training: 2021-03-16 23:01:02,171-[cfp_fp][24000]XNorm: 18.198844 Training: 2021-03-16 23:01:02,172-[cfp_fp][24000]Accuracy-Flip: 0.93429+-0.01505 Training: 2021-03-16 23:01:02,172-[cfp_fp][24000]Accuracy-Highest: 0.94071 
Training: 2021-03-16 23:01:25,576-[agedb_30][24000]XNorm: 21.078681 Training: 2021-03-16 23:01:25,576-[agedb_30][24000]Accuracy-Flip: 0.95050+-0.01088 Training: 2021-03-16 23:01:25,576-[agedb_30][24000]Accuracy-Highest: 0.95517 Training: 2021-03-16 23:01:35,977-Speed 611.27 samples/sec Loss 14.0461 Epoch: 4 Global Step: 24050 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:01:46,436-Speed 4895.50 samples/sec Loss 14.0749 Epoch: 4 Global Step: 24100 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:01:56,868-Speed 4908.25 samples/sec Loss 14.0427 Epoch: 4 Global Step: 24150 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:02:07,011-Speed 5048.07 samples/sec Loss 14.0702 Epoch: 4 Global Step: 24200 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:02:17,741-Speed 4771.83 samples/sec Loss 14.0937 Epoch: 4 Global Step: 24250 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:02:28,323-Speed 4838.92 samples/sec Loss 14.0729 Epoch: 4 Global Step: 24300 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:02:38,769-Speed 4901.56 samples/sec Loss 14.0158 Epoch: 4 Global Step: 24350 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:02:49,457-Speed 4790.45 samples/sec Loss 14.0757 Epoch: 4 Global Step: 24400 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:02:59,855-Speed 4924.31 samples/sec Loss 13.9674 Epoch: 4 Global Step: 24450 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:03:10,240-Speed 4930.81 samples/sec Loss 13.9785 Epoch: 4 Global Step: 24500 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:03:20,470-Speed 5004.93 samples/sec Loss 14.0003 Epoch: 4 Global Step: 24550 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:03:30,701-Speed 5005.03 samples/sec Loss 13.9888 Epoch: 4 Global Step: 24600 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:03:40,995-Speed 4973.66 samples/sec Loss 
14.0327 Epoch: 4 Global Step: 24650 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:03:51,391-Speed 4925.54 samples/sec Loss 14.0422 Epoch: 4 Global Step: 24700 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:04:02,206-Speed 4734.53 samples/sec Loss 14.1241 Epoch: 4 Global Step: 24750 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:04:12,692-Speed 4882.84 samples/sec Loss 14.0212 Epoch: 4 Global Step: 24800 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:04:23,351-Speed 4803.87 samples/sec Loss 14.0203 Epoch: 4 Global Step: 24850 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:04:33,762-Speed 4917.73 samples/sec Loss 14.0760 Epoch: 4 Global Step: 24900 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:04:47,283-Speed 3786.90 samples/sec Loss 13.3939 Epoch: 5 Global Step: 24950 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:04:57,916-Speed 4815.58 samples/sec Loss 13.2891 Epoch: 5 Global Step: 25000 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:05:08,705-Speed 4746.19 samples/sec Loss 13.4764 Epoch: 5 Global Step: 25050 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:05:19,128-Speed 4912.27 samples/sec Loss 13.5059 Epoch: 5 Global Step: 25100 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:05:29,667-Speed 4858.74 samples/sec Loss 13.6779 Epoch: 5 Global Step: 25150 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:05:40,276-Speed 4826.19 samples/sec Loss 13.7899 Epoch: 5 Global Step: 25200 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:05:50,776-Speed 4876.86 samples/sec Loss 13.7721 Epoch: 5 Global Step: 25250 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:06:01,334-Speed 4849.53 samples/sec Loss 13.8740 Epoch: 5 Global Step: 25300 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:06:11,753-Speed 4914.37 samples/sec Loss 13.8764 
Epoch: 5 Global Step: 25350 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:06:22,369-Speed 4823.38 samples/sec Loss 13.9472 Epoch: 5 Global Step: 25400 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:06:32,722-Speed 4945.83 samples/sec Loss 13.9649 Epoch: 5 Global Step: 25450 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:06:43,239-Speed 4868.59 samples/sec Loss 13.9408 Epoch: 5 Global Step: 25500 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:06:54,204-Speed 4669.45 samples/sec Loss 13.9596 Epoch: 5 Global Step: 25550 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:07:05,208-Speed 4653.27 samples/sec Loss 13.9541 Epoch: 5 Global Step: 25600 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:07:15,705-Speed 4877.59 samples/sec Loss 13.9140 Epoch: 5 Global Step: 25650 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:07:26,073-Speed 4938.55 samples/sec Loss 13.9791 Epoch: 5 Global Step: 25700 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:07:36,608-Speed 4860.43 samples/sec Loss 13.9877 Epoch: 5 Global Step: 25750 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:07:47,041-Speed 4907.88 samples/sec Loss 14.0107 Epoch: 5 Global Step: 25800 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:07:58,052-Speed 4649.89 samples/sec Loss 13.9985 Epoch: 5 Global Step: 25850 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:08:08,684-Speed 4816.09 samples/sec Loss 13.9564 Epoch: 5 Global Step: 25900 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:08:19,455-Speed 4753.37 samples/sec Loss 14.0184 Epoch: 5 Global Step: 25950 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:08:29,705-Speed 4995.44 samples/sec Loss 13.9776 Epoch: 5 Global Step: 26000 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:08:52,844-[lfw][26000]XNorm: 20.616386 Training: 2021-03-16 
23:08:52,844-[lfw][26000]Accuracy-Flip: 0.99417+-0.00367 Training: 2021-03-16 23:08:52,844-[lfw][26000]Accuracy-Highest: 0.99500 Training: 2021-03-16 23:09:19,650-[cfp_fp][26000]XNorm: 16.958663 Training: 2021-03-16 23:09:19,651-[cfp_fp][26000]Accuracy-Flip: 0.93586+-0.01313 Training: 2021-03-16 23:09:19,651-[cfp_fp][26000]Accuracy-Highest: 0.94071 Training: 2021-03-16 23:09:42,883-[agedb_30][26000]XNorm: 20.063526 Training: 2021-03-16 23:09:42,883-[agedb_30][26000]Accuracy-Flip: 0.95350+-0.01004 Training: 2021-03-16 23:09:42,883-[agedb_30][26000]Accuracy-Highest: 0.95517 Training: 2021-03-16 23:09:53,130-Speed 613.73 samples/sec Loss 13.9392 Epoch: 5 Global Step: 26050 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:10:03,492-Speed 4941.36 samples/sec Loss 13.9926 Epoch: 5 Global Step: 26100 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:10:13,903-Speed 4917.88 samples/sec Loss 13.9539 Epoch: 5 Global Step: 26150 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:10:24,306-Speed 4922.21 samples/sec Loss 13.9779 Epoch: 5 Global Step: 26200 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:10:34,813-Speed 4873.19 samples/sec Loss 13.9806 Epoch: 5 Global Step: 26250 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:10:45,129-Speed 4963.24 samples/sec Loss 13.9714 Epoch: 5 Global Step: 26300 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:10:55,547-Speed 4914.89 samples/sec Loss 14.0565 Epoch: 5 Global Step: 26350 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:11:05,761-Speed 5013.11 samples/sec Loss 13.9996 Epoch: 5 Global Step: 26400 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:11:16,199-Speed 4905.59 samples/sec Loss 13.9193 Epoch: 5 Global Step: 26450 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:11:26,686-Speed 4882.22 samples/sec Loss 13.9300 Epoch: 5 Global Step: 26500 Fp16 Grad Scale: 16384 Required: 7 hours 
Training: 2021-03-16 23:11:37,118-Speed 4908.40 samples/sec Loss 13.9922 Epoch: 5 Global Step: 26550 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:11:47,556-Speed 4905.57 samples/sec Loss 13.9576 Epoch: 5 Global Step: 26600 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:11:58,387-Speed 4727.15 samples/sec Loss 14.0180 Epoch: 5 Global Step: 26650 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:12:08,740-Speed 4945.93 samples/sec Loss 14.0885 Epoch: 5 Global Step: 26700 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:12:19,334-Speed 4832.98 samples/sec Loss 13.9577 Epoch: 5 Global Step: 26750 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:12:29,808-Speed 4888.66 samples/sec Loss 13.9998 Epoch: 5 Global Step: 26800 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:12:40,320-Speed 4870.80 samples/sec Loss 13.9060 Epoch: 5 Global Step: 26850 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:12:50,723-Speed 4921.81 samples/sec Loss 13.9792 Epoch: 5 Global Step: 26900 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:13:01,091-Speed 4938.36 samples/sec Loss 13.9037 Epoch: 5 Global Step: 26950 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:13:11,616-Speed 4864.86 samples/sec Loss 13.9920 Epoch: 5 Global Step: 27000 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:13:22,039-Speed 4912.45 samples/sec Loss 13.9767 Epoch: 5 Global Step: 27050 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:13:32,467-Speed 4910.14 samples/sec Loss 13.9537 Epoch: 5 Global Step: 27100 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:13:42,768-Speed 4970.71 samples/sec Loss 14.0285 Epoch: 5 Global Step: 27150 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:13:53,256-Speed 4881.96 samples/sec Loss 13.9996 Epoch: 5 Global Step: 27200 Fp16 Grad Scale: 16384 Required: 7 hours Training: 
2021-03-16 23:14:03,684-Speed 4909.89 samples/sec Loss 13.9839 Epoch: 5 Global Step: 27250 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:14:14,294-Speed 4825.69 samples/sec Loss 13.9880 Epoch: 5 Global Step: 27300 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:14:24,870-Speed 4841.56 samples/sec Loss 13.9607 Epoch: 5 Global Step: 27350 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:14:35,517-Speed 4808.90 samples/sec Loss 13.8549 Epoch: 5 Global Step: 27400 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:14:46,229-Speed 4780.01 samples/sec Loss 13.9655 Epoch: 5 Global Step: 27450 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:14:56,562-Speed 4955.38 samples/sec Loss 13.9464 Epoch: 5 Global Step: 27500 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:15:07,026-Speed 4892.83 samples/sec Loss 13.9778 Epoch: 5 Global Step: 27550 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:15:17,504-Speed 4886.88 samples/sec Loss 14.0060 Epoch: 5 Global Step: 27600 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:15:27,816-Speed 4965.43 samples/sec Loss 13.8816 Epoch: 5 Global Step: 27650 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:15:38,566-Speed 4762.66 samples/sec Loss 13.9850 Epoch: 5 Global Step: 27700 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:15:49,478-Speed 4692.39 samples/sec Loss 14.0741 Epoch: 5 Global Step: 27750 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:16:00,231-Speed 4761.70 samples/sec Loss 13.9627 Epoch: 5 Global Step: 27800 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:16:10,844-Speed 4824.34 samples/sec Loss 13.9979 Epoch: 5 Global Step: 27850 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:16:21,347-Speed 4875.28 samples/sec Loss 13.8585 Epoch: 5 Global Step: 27900 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 
23:16:32,046-Speed 4785.49 samples/sec Loss 13.9579 Epoch: 5 Global Step: 27950 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:16:42,931-Speed 4704.31 samples/sec Loss 13.9054 Epoch: 5 Global Step: 28000 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:17:06,183-[lfw][28000]XNorm: 20.816131 Training: 2021-03-16 23:17:06,184-[lfw][28000]Accuracy-Flip: 0.99383+-0.00342 Training: 2021-03-16 23:17:06,184-[lfw][28000]Accuracy-Highest: 0.99500 Training: 2021-03-16 23:17:33,096-[cfp_fp][28000]XNorm: 17.458893 Training: 2021-03-16 23:17:33,096-[cfp_fp][28000]Accuracy-Flip: 0.93900+-0.01325 Training: 2021-03-16 23:17:33,096-[cfp_fp][28000]Accuracy-Highest: 0.94071 Training: 2021-03-16 23:17:56,283-[agedb_30][28000]XNorm: 20.279525 Training: 2021-03-16 23:17:56,284-[agedb_30][28000]Accuracy-Flip: 0.95000+-0.00738 Training: 2021-03-16 23:17:56,284-[agedb_30][28000]Accuracy-Highest: 0.95517 Training: 2021-03-16 23:18:06,581-Speed 612.07 samples/sec Loss 13.9240 Epoch: 5 Global Step: 28050 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:18:17,098-Speed 4868.66 samples/sec Loss 13.9794 Epoch: 5 Global Step: 28100 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:18:27,458-Speed 4942.32 samples/sec Loss 13.9293 Epoch: 5 Global Step: 28150 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:18:37,979-Speed 4866.72 samples/sec Loss 14.0013 Epoch: 5 Global Step: 28200 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:18:48,452-Speed 4889.08 samples/sec Loss 13.9810 Epoch: 5 Global Step: 28250 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:18:58,951-Speed 4877.30 samples/sec Loss 13.9259 Epoch: 5 Global Step: 28300 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:19:09,342-Speed 4927.76 samples/sec Loss 13.8714 Epoch: 5 Global Step: 28350 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:19:19,864-Speed 4866.28 samples/sec Loss 13.9559 Epoch: 
5 Global Step: 28400 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:19:30,595-Speed 4771.36 samples/sec Loss 13.9863 Epoch: 5 Global Step: 28450 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:19:41,131-Speed 4859.64 samples/sec Loss 13.8667 Epoch: 5 Global Step: 28500 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:19:51,417-Speed 4978.31 samples/sec Loss 14.0191 Epoch: 5 Global Step: 28550 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:20:01,894-Speed 4886.82 samples/sec Loss 13.8402 Epoch: 5 Global Step: 28600 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:20:12,562-Speed 4799.72 samples/sec Loss 13.8698 Epoch: 5 Global Step: 28650 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:20:22,933-Speed 4937.02 samples/sec Loss 13.9633 Epoch: 5 Global Step: 28700 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:20:33,491-Speed 4849.63 samples/sec Loss 13.9866 Epoch: 5 Global Step: 28750 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:20:43,984-Speed 4879.56 samples/sec Loss 13.9402 Epoch: 5 Global Step: 28800 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:20:54,519-Speed 4860.23 samples/sec Loss 13.9589 Epoch: 5 Global Step: 28850 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:21:05,084-Speed 4846.36 samples/sec Loss 13.9418 Epoch: 5 Global Step: 28900 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:21:15,490-Speed 4920.42 samples/sec Loss 13.8689 Epoch: 5 Global Step: 28950 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:21:26,050-Speed 4848.64 samples/sec Loss 14.0439 Epoch: 5 Global Step: 29000 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:21:36,658-Speed 4826.81 samples/sec Loss 13.8758 Epoch: 5 Global Step: 29050 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:21:47,142-Speed 4883.84 samples/sec Loss 13.9206 Epoch: 5 Global 
Step: 29100 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:21:57,787-Speed 4810.33 samples/sec Loss 13.8456 Epoch: 5 Global Step: 29150 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:22:08,136-Speed 4947.75 samples/sec Loss 13.8958 Epoch: 5 Global Step: 29200 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:22:18,770-Speed 4814.86 samples/sec Loss 13.9155 Epoch: 5 Global Step: 29250 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:22:29,339-Speed 4844.58 samples/sec Loss 13.8801 Epoch: 5 Global Step: 29300 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:22:40,236-Speed 4698.65 samples/sec Loss 13.9812 Epoch: 5 Global Step: 29350 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:22:50,726-Speed 4881.15 samples/sec Loss 13.9266 Epoch: 5 Global Step: 29400 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:23:01,256-Speed 4862.33 samples/sec Loss 13.9667 Epoch: 5 Global Step: 29450 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:23:11,900-Speed 4810.57 samples/sec Loss 13.8917 Epoch: 5 Global Step: 29500 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:23:22,292-Speed 4927.04 samples/sec Loss 13.8795 Epoch: 5 Global Step: 29550 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:23:32,825-Speed 4861.09 samples/sec Loss 13.8143 Epoch: 5 Global Step: 29600 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:23:43,140-Speed 4963.92 samples/sec Loss 13.8767 Epoch: 5 Global Step: 29650 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:23:53,541-Speed 4922.68 samples/sec Loss 13.8503 Epoch: 5 Global Step: 29700 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:24:03,967-Speed 4911.09 samples/sec Loss 13.9587 Epoch: 5 Global Step: 29750 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:24:14,795-Speed 4728.86 samples/sec Loss 13.9564 Epoch: 5 Global Step: 29800 
Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:24:25,694-Speed 4697.84 samples/sec Loss 13.8842 Epoch: 5 Global Step: 29850 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:24:39,810-Speed 3627.25 samples/sec Loss 13.8351 Epoch: 6 Global Step: 29900 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:24:50,847-Speed 4639.47 samples/sec Loss 13.1503 Epoch: 6 Global Step: 29950 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:25:01,731-Speed 4704.52 samples/sec Loss 13.2814 Epoch: 6 Global Step: 30000 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:25:25,257-[lfw][30000]XNorm: 21.640472 Training: 2021-03-16 23:25:25,257-[lfw][30000]Accuracy-Flip: 0.99517+-0.00217 Training: 2021-03-16 23:25:25,257-[lfw][30000]Accuracy-Highest: 0.99517 Training: 2021-03-16 23:25:52,021-[cfp_fp][30000]XNorm: 17.863831 Training: 2021-03-16 23:25:52,021-[cfp_fp][30000]Accuracy-Flip: 0.94143+-0.01369 Training: 2021-03-16 23:25:52,021-[cfp_fp][30000]Accuracy-Highest: 0.94143 Training: 2021-03-16 23:26:15,174-[agedb_30][30000]XNorm: 20.625243 Training: 2021-03-16 23:26:15,174-[agedb_30][30000]Accuracy-Flip: 0.95083+-0.00990 Training: 2021-03-16 23:26:15,175-[agedb_30][30000]Accuracy-Highest: 0.95517 Training: 2021-03-16 23:26:25,712-Speed 609.66 samples/sec Loss 13.3940 Epoch: 6 Global Step: 30050 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:26:36,196-Speed 4884.29 samples/sec Loss 13.5413 Epoch: 6 Global Step: 30100 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:26:47,034-Speed 4724.31 samples/sec Loss 13.5805 Epoch: 6 Global Step: 30150 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:26:57,614-Speed 4839.43 samples/sec Loss 13.6177 Epoch: 6 Global Step: 30200 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2021-03-16 23:27:08,098-Speed 4883.92 samples/sec Loss 13.7462 Epoch: 6 Global Step: 30250 Fp16 Grad Scale: 16384 Required: 6 hours Training: 
2021-03-16 23:27:18,510-Speed 4917.77 samples/sec Loss 13.7271 Epoch: 6 Global Step: 30300 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:27:29,147-Speed 4813.54 samples/sec Loss 13.7680 Epoch: 6 Global Step: 30350 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:27:39,493-Speed 4948.94 samples/sec Loss 13.8611 Epoch: 6 Global Step: 30400 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:27:50,042-Speed 4853.61 samples/sec Loss 13.8453 Epoch: 6 Global Step: 30450 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:28:00,469-Speed 4910.76 samples/sec Loss 13.8510 Epoch: 6 Global Step: 30500 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:28:11,001-Speed 4861.56 samples/sec Loss 13.8817 Epoch: 6 Global Step: 30550 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:28:21,642-Speed 4811.72 samples/sec Loss 13.8765 Epoch: 6 Global Step: 30600 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:28:31,999-Speed 4943.63 samples/sec Loss 13.8908 Epoch: 6 Global Step: 30650 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:28:42,530-Speed 4862.10 samples/sec Loss 13.8815 Epoch: 6 Global Step: 30700 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:28:53,015-Speed 4883.51 samples/sec Loss 13.9263 Epoch: 6 Global Step: 30750 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:29:03,847-Speed 4726.77 samples/sec Loss 13.8831 Epoch: 6 Global Step: 30800 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:29:14,193-Speed 4949.43 samples/sec Loss 13.8543 Epoch: 6 Global Step: 30850 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:29:24,737-Speed 4856.09 samples/sec Loss 13.8725 Epoch: 6 Global Step: 30900 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:29:35,235-Speed 4877.45 samples/sec Loss 13.8771 Epoch: 6 Global Step: 30950 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 
23:29:45,706-Speed 4889.90 samples/sec Loss 13.8827 Epoch: 6 Global Step: 31000 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:29:56,283-Speed 4841.12 samples/sec Loss 13.8239 Epoch: 6 Global Step: 31050 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:30:06,813-Speed 4862.22 samples/sec Loss 13.8697 Epoch: 6 Global Step: 31100 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:30:17,447-Speed 4815.14 samples/sec Loss 13.8625 Epoch: 6 Global Step: 31150 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:30:27,848-Speed 4922.82 samples/sec Loss 13.7610 Epoch: 6 Global Step: 31200 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:30:38,483-Speed 4814.46 samples/sec Loss 13.9427 Epoch: 6 Global Step: 31250 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:30:48,994-Speed 4871.36 samples/sec Loss 13.9043 Epoch: 6 Global Step: 31300 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:30:59,320-Speed 4958.83 samples/sec Loss 13.9066 Epoch: 6 Global Step: 31350 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:31:09,919-Speed 4830.62 samples/sec Loss 13.8534 Epoch: 6 Global Step: 31400 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:31:20,356-Speed 4905.92 samples/sec Loss 13.8913 Epoch: 6 Global Step: 31450 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:31:30,863-Speed 4873.28 samples/sec Loss 13.8563 Epoch: 6 Global Step: 31500 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:31:41,288-Speed 4911.63 samples/sec Loss 13.9804 Epoch: 6 Global Step: 31550 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:31:51,905-Speed 4822.72 samples/sec Loss 13.8625 Epoch: 6 Global Step: 31600 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:32:02,528-Speed 4819.86 samples/sec Loss 13.8476 Epoch: 6 Global Step: 31650 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 
23:32:12,947-Speed 4914.70 samples/sec Loss 13.8159 Epoch: 6 Global Step: 31700 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:32:23,427-Speed 4885.69 samples/sec Loss 13.7533 Epoch: 6 Global Step: 31750 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:32:34,062-Speed 4814.42 samples/sec Loss 13.9295 Epoch: 6 Global Step: 31800 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:32:44,539-Speed 4886.92 samples/sec Loss 13.8135 Epoch: 6 Global Step: 31850 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:32:55,147-Speed 4826.88 samples/sec Loss 13.7737 Epoch: 6 Global Step: 31900 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:33:05,637-Speed 4881.28 samples/sec Loss 13.8119 Epoch: 6 Global Step: 31950 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:33:16,428-Speed 4744.64 samples/sec Loss 13.9154 Epoch: 6 Global Step: 32000 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:33:39,699-[lfw][32000]XNorm: 21.702865 Training: 2021-03-16 23:33:39,699-[lfw][32000]Accuracy-Flip: 0.99433+-0.00291 Training: 2021-03-16 23:33:39,699-[lfw][32000]Accuracy-Highest: 0.99517 Training: 2021-03-16 23:34:06,704-[cfp_fp][32000]XNorm: 18.492826 Training: 2021-03-16 23:34:06,704-[cfp_fp][32000]Accuracy-Flip: 0.93771+-0.01098 Training: 2021-03-16 23:34:06,705-[cfp_fp][32000]Accuracy-Highest: 0.94143 Training: 2021-03-16 23:34:29,841-[agedb_30][32000]XNorm: 20.921462 Training: 2021-03-16 23:34:29,841-[agedb_30][32000]Accuracy-Flip: 0.95117+-0.00997 Training: 2021-03-16 23:34:29,842-[agedb_30][32000]Accuracy-Highest: 0.95517 Training: 2021-03-16 23:34:40,714-Speed 607.46 samples/sec Loss 14.0075 Epoch: 6 Global Step: 32050 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:34:50,998-Speed 4978.67 samples/sec Loss 13.8801 Epoch: 6 Global Step: 32100 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:35:01,330-Speed 4955.83 samples/sec Loss 13.9355 Epoch: 
6 Global Step: 32150 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:35:11,947-Speed 4822.63 samples/sec Loss 13.8448 Epoch: 6 Global Step: 32200 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:35:22,747-Speed 4741.45 samples/sec Loss 13.8966 Epoch: 6 Global Step: 32250 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:35:33,327-Speed 4839.11 samples/sec Loss 13.9029 Epoch: 6 Global Step: 32300 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:35:43,758-Speed 4908.97 samples/sec Loss 13.9464 Epoch: 6 Global Step: 32350 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:35:54,215-Speed 4896.33 samples/sec Loss 13.8219 Epoch: 6 Global Step: 32400 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:36:04,736-Speed 4866.73 samples/sec Loss 13.8787 Epoch: 6 Global Step: 32450 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:36:15,146-Speed 4918.50 samples/sec Loss 13.8854 Epoch: 6 Global Step: 32500 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:36:25,555-Speed 4919.17 samples/sec Loss 13.9458 Epoch: 6 Global Step: 32550 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:36:36,175-Speed 4821.42 samples/sec Loss 13.7476 Epoch: 6 Global Step: 32600 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:36:46,823-Speed 4808.65 samples/sec Loss 13.7812 Epoch: 6 Global Step: 32650 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:36:57,509-Speed 4791.22 samples/sec Loss 13.8446 Epoch: 6 Global Step: 32700 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:37:08,061-Speed 4852.82 samples/sec Loss 13.8161 Epoch: 6 Global Step: 32750 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:37:18,411-Speed 4946.77 samples/sec Loss 13.9458 Epoch: 6 Global Step: 32800 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:37:28,785-Speed 4936.46 samples/sec Loss 13.8785 Epoch: 6 Global 
Step: 32850 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:37:39,345-Speed 4848.75 samples/sec Loss 13.9544 Epoch: 6 Global Step: 32900 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:37:49,993-Speed 4808.49 samples/sec Loss 13.8660 Epoch: 6 Global Step: 32950 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:38:00,571-Speed 4840.61 samples/sec Loss 13.8051 Epoch: 6 Global Step: 33000 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:38:10,866-Speed 4973.54 samples/sec Loss 13.8439 Epoch: 6 Global Step: 33050 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:38:21,299-Speed 4907.67 samples/sec Loss 13.8332 Epoch: 6 Global Step: 33100 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:38:31,771-Speed 4889.60 samples/sec Loss 13.8402 Epoch: 6 Global Step: 33150 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:38:42,185-Speed 4916.68 samples/sec Loss 13.8450 Epoch: 6 Global Step: 33200 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:38:52,713-Speed 4863.29 samples/sec Loss 13.9197 Epoch: 6 Global Step: 33250 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:39:03,203-Speed 4881.34 samples/sec Loss 13.8440 Epoch: 6 Global Step: 33300 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:39:13,749-Speed 4854.86 samples/sec Loss 13.9309 Epoch: 6 Global Step: 33350 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:39:24,284-Speed 4860.21 samples/sec Loss 13.7592 Epoch: 6 Global Step: 33400 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:39:34,869-Speed 4837.42 samples/sec Loss 13.8686 Epoch: 6 Global Step: 33450 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:39:45,267-Speed 4924.23 samples/sec Loss 13.8447 Epoch: 6 Global Step: 33500 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:39:55,845-Speed 4840.34 samples/sec Loss 13.8162 Epoch: 6 Global Step: 33550 
Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:40:06,384-Speed 4858.31 samples/sec Loss 13.7910 Epoch: 6 Global Step: 33600 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:40:16,866-Speed 4884.84 samples/sec Loss 13.8178 Epoch: 6 Global Step: 33650 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:40:27,427-Speed 4848.24 samples/sec Loss 13.9365 Epoch: 6 Global Step: 33700 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:40:37,866-Speed 4905.05 samples/sec Loss 13.7485 Epoch: 6 Global Step: 33750 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:40:48,573-Speed 4782.13 samples/sec Loss 13.8377 Epoch: 6 Global Step: 33800 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:40:59,056-Speed 4884.37 samples/sec Loss 13.8297 Epoch: 6 Global Step: 33850 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:41:09,635-Speed 4839.98 samples/sec Loss 13.8323 Epoch: 6 Global Step: 33900 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:41:20,054-Speed 4914.39 samples/sec Loss 13.8759 Epoch: 6 Global Step: 33950 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:41:30,899-Speed 4721.55 samples/sec Loss 13.8643 Epoch: 6 Global Step: 34000 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:41:54,184-[lfw][34000]XNorm: 21.492495 Training: 2021-03-16 23:41:54,184-[lfw][34000]Accuracy-Flip: 0.99367+-0.00400 Training: 2021-03-16 23:41:54,184-[lfw][34000]Accuracy-Highest: 0.99517 Training: 2021-03-16 23:42:21,000-[cfp_fp][34000]XNorm: 17.741572 Training: 2021-03-16 23:42:21,000-[cfp_fp][34000]Accuracy-Flip: 0.94000+-0.01423 Training: 2021-03-16 23:42:21,000-[cfp_fp][34000]Accuracy-Highest: 0.94143 Training: 2021-03-16 23:42:44,289-[agedb_30][34000]XNorm: 20.865430 Training: 2021-03-16 23:42:44,289-[agedb_30][34000]Accuracy-Flip: 0.95100+-0.00870 Training: 2021-03-16 23:42:44,289-[agedb_30][34000]Accuracy-Highest: 0.95517 Training: 
2021-03-16 23:42:54,628-Speed 611.50 samples/sec Loss 13.8258 Epoch: 6 Global Step: 34050 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:43:05,489-Speed 4714.28 samples/sec Loss 13.8331 Epoch: 6 Global Step: 34100 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:43:16,361-Speed 4709.34 samples/sec Loss 13.9422 Epoch: 6 Global Step: 34150 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:43:26,843-Speed 4885.02 samples/sec Loss 13.9220 Epoch: 6 Global Step: 34200 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:43:37,453-Speed 4825.69 samples/sec Loss 13.8203 Epoch: 6 Global Step: 34250 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:43:47,938-Speed 4883.29 samples/sec Loss 13.8662 Epoch: 6 Global Step: 34300 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:43:58,475-Speed 4859.29 samples/sec Loss 13.8223 Epoch: 6 Global Step: 34350 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:44:09,240-Speed 4756.55 samples/sec Loss 13.7994 Epoch: 6 Global Step: 34400 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:44:19,683-Speed 4902.85 samples/sec Loss 13.7707 Epoch: 6 Global Step: 34450 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:44:30,385-Speed 4784.47 samples/sec Loss 13.7891 Epoch: 6 Global Step: 34500 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:44:40,801-Speed 4915.99 samples/sec Loss 13.8371 Epoch: 6 Global Step: 34550 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:44:51,257-Speed 4896.76 samples/sec Loss 13.8540 Epoch: 6 Global Step: 34600 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:45:01,947-Speed 4789.63 samples/sec Loss 13.7982 Epoch: 6 Global Step: 34650 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:45:12,521-Speed 4842.66 samples/sec Loss 13.7980 Epoch: 6 Global Step: 34700 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 
23:45:22,865-Speed 4949.72 samples/sec Loss 13.8206 Epoch: 6 Global Step: 34750 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:45:33,370-Speed 4874.03 samples/sec Loss 13.8544 Epoch: 6 Global Step: 34800 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:45:43,783-Speed 4917.26 samples/sec Loss 13.8354 Epoch: 6 Global Step: 34850 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:45:57,389-Speed 3763.14 samples/sec Loss 13.3979 Epoch: 7 Global Step: 34900 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:46:07,874-Speed 4883.78 samples/sec Loss 13.1890 Epoch: 7 Global Step: 34950 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:46:18,457-Speed 4838.69 samples/sec Loss 13.2396 Epoch: 7 Global Step: 35000 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:46:28,999-Speed 4857.10 samples/sec Loss 13.3862 Epoch: 7 Global Step: 35050 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:46:39,920-Speed 4688.42 samples/sec Loss 13.4162 Epoch: 7 Global Step: 35100 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:46:50,510-Speed 4835.19 samples/sec Loss 13.5306 Epoch: 7 Global Step: 35150 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:47:01,015-Speed 4873.68 samples/sec Loss 13.6108 Epoch: 7 Global Step: 35200 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:47:11,509-Speed 4879.60 samples/sec Loss 13.6772 Epoch: 7 Global Step: 35250 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:47:22,067-Speed 4849.63 samples/sec Loss 13.7316 Epoch: 7 Global Step: 35300 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:47:32,438-Speed 4937.08 samples/sec Loss 13.6682 Epoch: 7 Global Step: 35350 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:47:42,710-Speed 4984.65 samples/sec Loss 13.6659 Epoch: 7 Global Step: 35400 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 
23:47:53,239-Speed 4863.28 samples/sec Loss 13.7455 Epoch: 7 Global Step: 35450 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:48:03,657-Speed 4914.68 samples/sec Loss 13.6885 Epoch: 7 Global Step: 35500 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:48:14,280-Speed 4819.80 samples/sec Loss 13.7993 Epoch: 7 Global Step: 35550 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:48:24,898-Speed 4822.23 samples/sec Loss 13.7446 Epoch: 7 Global Step: 35600 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:48:35,391-Speed 4879.75 samples/sec Loss 13.6826 Epoch: 7 Global Step: 35650 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:48:45,733-Speed 4951.17 samples/sec Loss 13.8178 Epoch: 7 Global Step: 35700 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:48:56,321-Speed 4835.92 samples/sec Loss 13.8360 Epoch: 7 Global Step: 35750 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:49:06,853-Speed 4861.62 samples/sec Loss 13.8262 Epoch: 7 Global Step: 35800 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:49:17,205-Speed 4946.06 samples/sec Loss 13.8592 Epoch: 7 Global Step: 35850 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:49:28,022-Speed 4733.72 samples/sec Loss 13.8507 Epoch: 7 Global Step: 35900 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:49:38,517-Speed 4878.70 samples/sec Loss 13.8816 Epoch: 7 Global Step: 35950 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:49:49,097-Speed 4839.60 samples/sec Loss 13.9393 Epoch: 7 Global Step: 36000 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:50:12,184-[lfw][36000]XNorm: 20.455110 Training: 2021-03-16 23:50:12,184-[lfw][36000]Accuracy-Flip: 0.99450+-0.00380 Training: 2021-03-16 23:50:12,184-[lfw][36000]Accuracy-Highest: 0.99517 Training: 2021-03-16 23:50:38,827-[cfp_fp][36000]XNorm: 17.363822 Training: 2021-03-16 
23:50:38,827-[cfp_fp][36000]Accuracy-Flip: 0.93714+-0.01524 Training: 2021-03-16 23:50:38,827-[cfp_fp][36000]Accuracy-Highest: 0.94143 Training: 2021-03-16 23:51:02,050-[agedb_30][36000]XNorm: 19.811337 Training: 2021-03-16 23:51:02,051-[agedb_30][36000]Accuracy-Flip: 0.94433+-0.01225 Training: 2021-03-16 23:51:02,051-[agedb_30][36000]Accuracy-Highest: 0.95517 Training: 2021-03-16 23:51:12,666-Speed 612.67 samples/sec Loss 13.8291 Epoch: 7 Global Step: 36050 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:51:23,077-Speed 4918.21 samples/sec Loss 13.9688 Epoch: 7 Global Step: 36100 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:51:33,387-Speed 4965.99 samples/sec Loss 13.7862 Epoch: 7 Global Step: 36150 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:51:43,731-Speed 4950.08 samples/sec Loss 13.6787 Epoch: 7 Global Step: 36200 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:51:54,132-Speed 4922.66 samples/sec Loss 13.7745 Epoch: 7 Global Step: 36250 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:52:05,018-Speed 4703.42 samples/sec Loss 13.7984 Epoch: 7 Global Step: 36300 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:52:15,762-Speed 4765.99 samples/sec Loss 13.8657 Epoch: 7 Global Step: 36350 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:52:26,197-Speed 4906.50 samples/sec Loss 13.7842 Epoch: 7 Global Step: 36400 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:52:36,712-Speed 4869.80 samples/sec Loss 13.8118 Epoch: 7 Global Step: 36450 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:52:47,113-Speed 4922.82 samples/sec Loss 13.7998 Epoch: 7 Global Step: 36500 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:52:57,496-Speed 4931.42 samples/sec Loss 13.7853 Epoch: 7 Global Step: 36550 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:53:08,158-Speed 4802.17 samples/sec Loss 13.8439 
Epoch: 7 Global Step: 36600 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:53:19,478-Speed 4523.41 samples/sec Loss 13.7256 Epoch: 7 Global Step: 36650 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:53:30,156-Speed 4795.10 samples/sec Loss 13.8528 Epoch: 7 Global Step: 36700 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:53:40,688-Speed 4861.44 samples/sec Loss 13.7599 Epoch: 7 Global Step: 36750 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:53:51,268-Speed 4839.54 samples/sec Loss 13.7919 Epoch: 7 Global Step: 36800 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:54:01,776-Speed 4872.89 samples/sec Loss 13.7914 Epoch: 7 Global Step: 36850 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:54:12,537-Speed 4757.80 samples/sec Loss 13.8234 Epoch: 7 Global Step: 36900 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:54:23,015-Speed 4887.05 samples/sec Loss 13.6616 Epoch: 7 Global Step: 36950 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:54:33,426-Speed 4917.97 samples/sec Loss 13.7386 Epoch: 7 Global Step: 37000 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:54:44,039-Speed 4824.56 samples/sec Loss 13.7572 Epoch: 7 Global Step: 37050 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:54:54,490-Speed 4899.02 samples/sec Loss 13.8302 Epoch: 7 Global Step: 37100 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:55:04,903-Speed 4917.37 samples/sec Loss 13.8517 Epoch: 7 Global Step: 37150 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:55:15,115-Speed 5013.90 samples/sec Loss 13.8441 Epoch: 7 Global Step: 37200 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:55:25,470-Speed 4944.82 samples/sec Loss 13.7915 Epoch: 7 Global Step: 37250 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:55:35,815-Speed 4949.10 samples/sec Loss 13.7780 Epoch: 7 
Global Step: 37300 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:55:46,379-Speed 4846.87 samples/sec Loss 13.7930 Epoch: 7 Global Step: 37350 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:55:56,601-Speed 5009.52 samples/sec Loss 13.7465 Epoch: 7 Global Step: 37400 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:56:06,893-Speed 4974.81 samples/sec Loss 13.7318 Epoch: 7 Global Step: 37450 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:56:17,469-Speed 4841.34 samples/sec Loss 13.8104 Epoch: 7 Global Step: 37500 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:56:27,934-Speed 4893.08 samples/sec Loss 13.8037 Epoch: 7 Global Step: 37550 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:56:38,475-Speed 4857.56 samples/sec Loss 13.7668 Epoch: 7 Global Step: 37600 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:56:49,056-Speed 4838.99 samples/sec Loss 13.7429 Epoch: 7 Global Step: 37650 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:56:59,680-Speed 4819.78 samples/sec Loss 13.7628 Epoch: 7 Global Step: 37700 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:57:10,421-Speed 4766.98 samples/sec Loss 13.7952 Epoch: 7 Global Step: 37750 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:57:20,776-Speed 4944.71 samples/sec Loss 13.7241 Epoch: 7 Global Step: 37800 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:57:31,212-Speed 4906.45 samples/sec Loss 13.8298 Epoch: 7 Global Step: 37850 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:57:41,818-Speed 4827.45 samples/sec Loss 13.8090 Epoch: 7 Global Step: 37900 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:57:52,476-Speed 4804.23 samples/sec Loss 13.7151 Epoch: 7 Global Step: 37950 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:58:03,248-Speed 4753.14 samples/sec Loss 13.7804 Epoch: 7 Global 
Step: 38000 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:58:26,462-[lfw][38000]XNorm: 22.548745 Training: 2021-03-16 23:58:26,463-[lfw][38000]Accuracy-Flip: 0.99583+-0.00335 Training: 2021-03-16 23:58:26,463-[lfw][38000]Accuracy-Highest: 0.99583 Training: 2021-03-16 23:58:53,213-[cfp_fp][38000]XNorm: 18.829639 Training: 2021-03-16 23:58:53,213-[cfp_fp][38000]Accuracy-Flip: 0.93600+-0.01458 Training: 2021-03-16 23:58:53,213-[cfp_fp][38000]Accuracy-Highest: 0.94143 Training: 2021-03-16 23:59:16,415-[agedb_30][38000]XNorm: 21.455922 Training: 2021-03-16 23:59:16,415-[agedb_30][38000]Accuracy-Flip: 0.94583+-0.01270 Training: 2021-03-16 23:59:16,415-[agedb_30][38000]Accuracy-Highest: 0.95517 Training: 2021-03-16 23:59:26,764-Speed 613.06 samples/sec Loss 13.9003 Epoch: 7 Global Step: 38050 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:59:37,150-Speed 4929.85 samples/sec Loss 13.8260 Epoch: 7 Global Step: 38100 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:59:47,480-Speed 4956.45 samples/sec Loss 13.7948 Epoch: 7 Global Step: 38150 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-16 23:59:57,892-Speed 4917.56 samples/sec Loss 13.6968 Epoch: 7 Global Step: 38200 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:00:08,446-Speed 4851.70 samples/sec Loss 13.8675 Epoch: 7 Global Step: 38250 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:00:18,873-Speed 4910.33 samples/sec Loss 13.7569 Epoch: 7 Global Step: 38300 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:00:29,302-Speed 4909.75 samples/sec Loss 13.7852 Epoch: 7 Global Step: 38350 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:00:39,976-Speed 4796.94 samples/sec Loss 13.7568 Epoch: 7 Global Step: 38400 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:00:50,521-Speed 4855.61 samples/sec Loss 13.7542 Epoch: 7 Global Step: 38450 Fp16 Grad Scale: 16384 Required: 6 hours 
Training: 2021-03-17 00:01:00,960-Speed 4905.28 samples/sec Loss 13.7066 Epoch: 7 Global Step: 38500 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:01:11,699-Speed 4767.53 samples/sec Loss 13.7910 Epoch: 7 Global Step: 38550 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:01:22,134-Speed 4907.01 samples/sec Loss 13.7530 Epoch: 7 Global Step: 38600 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:01:32,838-Speed 4783.39 samples/sec Loss 13.7887 Epoch: 7 Global Step: 38650 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:01:43,189-Speed 4946.53 samples/sec Loss 13.7535 Epoch: 7 Global Step: 38700 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:01:54,171-Speed 4662.46 samples/sec Loss 13.8296 Epoch: 7 Global Step: 38750 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:02:04,562-Speed 4927.36 samples/sec Loss 13.7401 Epoch: 7 Global Step: 38800 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:02:15,207-Speed 4810.25 samples/sec Loss 13.8290 Epoch: 7 Global Step: 38850 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:02:25,511-Speed 4969.30 samples/sec Loss 13.7566 Epoch: 7 Global Step: 38900 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:02:36,212-Speed 4784.78 samples/sec Loss 13.7258 Epoch: 7 Global Step: 38950 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:02:46,843-Speed 4816.35 samples/sec Loss 13.6742 Epoch: 7 Global Step: 39000 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:02:57,591-Speed 4764.03 samples/sec Loss 13.7516 Epoch: 7 Global Step: 39050 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:03:08,037-Speed 4901.80 samples/sec Loss 13.8228 Epoch: 7 Global Step: 39100 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:03:18,519-Speed 4884.51 samples/sec Loss 13.6779 Epoch: 7 Global Step: 39150 Fp16 Grad Scale: 16384 Required: 6 hours Training: 
2021-03-17 00:03:28,697-Speed 5031.15 samples/sec Loss 13.8249 Epoch: 7 Global Step: 39200 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:03:39,135-Speed 4905.12 samples/sec Loss 13.7730 Epoch: 7 Global Step: 39250 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:03:49,518-Speed 4931.69 samples/sec Loss 13.7789 Epoch: 7 Global Step: 39300 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:04:00,006-Speed 4882.01 samples/sec Loss 13.6460 Epoch: 7 Global Step: 39350 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:04:10,454-Speed 4900.86 samples/sec Loss 13.7619 Epoch: 7 Global Step: 39400 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:04:20,999-Speed 4855.36 samples/sec Loss 13.8266 Epoch: 7 Global Step: 39450 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:04:31,508-Speed 4872.45 samples/sec Loss 13.7405 Epoch: 7 Global Step: 39500 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:04:42,262-Speed 4761.15 samples/sec Loss 13.7599 Epoch: 7 Global Step: 39550 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:04:52,707-Speed 4901.92 samples/sec Loss 13.7714 Epoch: 7 Global Step: 39600 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:05:03,253-Speed 4855.06 samples/sec Loss 13.8021 Epoch: 7 Global Step: 39650 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:05:13,817-Speed 4846.86 samples/sec Loss 13.7826 Epoch: 7 Global Step: 39700 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:05:24,249-Speed 4908.25 samples/sec Loss 13.7227 Epoch: 7 Global Step: 39750 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:05:34,914-Speed 4801.05 samples/sec Loss 13.7216 Epoch: 7 Global Step: 39800 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:05:45,489-Speed 4841.91 samples/sec Loss 13.8095 Epoch: 7 Global Step: 39850 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 
00:05:58,735-Speed 3865.39 samples/sec Loss 13.1333 Epoch: 8 Global Step: 39900 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:06:09,337-Speed 4829.79 samples/sec Loss 13.0855 Epoch: 8 Global Step: 39950 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:06:19,567-Speed 5005.15 samples/sec Loss 13.2770 Epoch: 8 Global Step: 40000 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:06:43,163-[lfw][40000]XNorm: 22.293801 Training: 2021-03-17 00:06:43,163-[lfw][40000]Accuracy-Flip: 0.99433+-0.00382 Training: 2021-03-17 00:06:43,163-[lfw][40000]Accuracy-Highest: 0.99583 Training: 2021-03-17 00:07:09,901-[cfp_fp][40000]XNorm: 18.627544 Training: 2021-03-17 00:07:09,902-[cfp_fp][40000]Accuracy-Flip: 0.93257+-0.01253 Training: 2021-03-17 00:07:09,902-[cfp_fp][40000]Accuracy-Highest: 0.94143 Training: 2021-03-17 00:07:32,983-[agedb_30][40000]XNorm: 21.799453 Training: 2021-03-17 00:07:32,984-[agedb_30][40000]Accuracy-Flip: 0.94850+-0.00920 Training: 2021-03-17 00:07:32,984-[agedb_30][40000]Accuracy-Highest: 0.95517 Training: 2021-03-17 00:07:43,404-Speed 610.71 samples/sec Loss 13.3271 Epoch: 8 Global Step: 40050 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:07:54,037-Speed 4815.56 samples/sec Loss 13.4446 Epoch: 8 Global Step: 40100 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:08:04,535-Speed 4877.46 samples/sec Loss 13.4583 Epoch: 8 Global Step: 40150 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:08:15,087-Speed 4852.34 samples/sec Loss 13.5796 Epoch: 8 Global Step: 40200 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:08:25,526-Speed 4904.92 samples/sec Loss 13.6720 Epoch: 8 Global Step: 40250 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:08:35,949-Speed 4912.53 samples/sec Loss 13.5623 Epoch: 8 Global Step: 40300 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:08:46,447-Speed 4877.22 samples/sec Loss 13.5414 Epoch: 
8 Global Step: 40350 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:08:56,621-Speed 5032.77 samples/sec Loss 13.6941 Epoch: 8 Global Step: 40400 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:09:07,099-Speed 4887.10 samples/sec Loss 13.6579 Epoch: 8 Global Step: 40450 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:09:17,516-Speed 4915.05 samples/sec Loss 13.7436 Epoch: 8 Global Step: 40500 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:09:27,928-Speed 4917.69 samples/sec Loss 13.7163 Epoch: 8 Global Step: 40550 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:09:38,526-Speed 4831.16 samples/sec Loss 13.7225 Epoch: 8 Global Step: 40600 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:09:49,546-Speed 4646.65 samples/sec Loss 13.7467 Epoch: 8 Global Step: 40650 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:10:00,417-Speed 4709.94 samples/sec Loss 13.7325 Epoch: 8 Global Step: 40700 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:10:10,833-Speed 4915.53 samples/sec Loss 13.7392 Epoch: 8 Global Step: 40750 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:10:21,825-Speed 4658.04 samples/sec Loss 13.8253 Epoch: 8 Global Step: 40800 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:10:32,691-Speed 4712.44 samples/sec Loss 13.7829 Epoch: 8 Global Step: 40850 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:10:43,008-Speed 4962.82 samples/sec Loss 13.7534 Epoch: 8 Global Step: 40900 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:10:53,721-Speed 4779.31 samples/sec Loss 13.7244 Epoch: 8 Global Step: 40950 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:11:04,614-Speed 4700.77 samples/sec Loss 13.6704 Epoch: 8 Global Step: 41000 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:11:15,048-Speed 4907.25 samples/sec Loss 13.7350 Epoch: 8 Global 
Step: 41050 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:11:25,456-Speed 4919.73 samples/sec Loss 13.7811 Epoch: 8 Global Step: 41100 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:11:35,894-Speed 4905.29 samples/sec Loss 13.7354 Epoch: 8 Global Step: 41150 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:11:46,477-Speed 4838.36 samples/sec Loss 13.7521 Epoch: 8 Global Step: 41200 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:11:57,090-Speed 4824.57 samples/sec Loss 13.7891 Epoch: 8 Global Step: 41250 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:12:07,566-Speed 4887.52 samples/sec Loss 13.8180 Epoch: 8 Global Step: 41300 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:12:18,172-Speed 4827.80 samples/sec Loss 13.8068 Epoch: 8 Global Step: 41350 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:12:28,609-Speed 4905.44 samples/sec Loss 13.7652 Epoch: 8 Global Step: 41400 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:12:39,055-Speed 4901.80 samples/sec Loss 13.7947 Epoch: 8 Global Step: 41450 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:12:49,854-Speed 4741.45 samples/sec Loss 13.7144 Epoch: 8 Global Step: 41500 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:13:00,564-Speed 4780.84 samples/sec Loss 13.8308 Epoch: 8 Global Step: 41550 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:13:11,099-Speed 4860.01 samples/sec Loss 13.7260 Epoch: 8 Global Step: 41600 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:13:21,621-Speed 4866.15 samples/sec Loss 13.7062 Epoch: 8 Global Step: 41650 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:13:32,196-Speed 4841.89 samples/sec Loss 13.7135 Epoch: 8 Global Step: 41700 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:13:42,536-Speed 4951.61 samples/sec Loss 13.7533 Epoch: 8 Global Step: 41750 
Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:13:53,349-Speed 4735.50 samples/sec Loss 13.7089 Epoch: 8 Global Step: 41800 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:14:03,958-Speed 4826.29 samples/sec Loss 13.7327 Epoch: 8 Global Step: 41850 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:14:14,446-Speed 4882.09 samples/sec Loss 13.7298 Epoch: 8 Global Step: 41900 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:14:24,749-Speed 4969.55 samples/sec Loss 13.7028 Epoch: 8 Global Step: 41950 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:14:35,149-Speed 4923.39 samples/sec Loss 13.7022 Epoch: 8 Global Step: 42000 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:14:58,324-[lfw][42000]XNorm: 20.614288 Training: 2021-03-17 00:14:58,324-[lfw][42000]Accuracy-Flip: 0.99517+-0.00302 Training: 2021-03-17 00:14:58,325-[lfw][42000]Accuracy-Highest: 0.99583 Training: 2021-03-17 00:15:25,059-[cfp_fp][42000]XNorm: 17.182367 Training: 2021-03-17 00:15:25,060-[cfp_fp][42000]Accuracy-Flip: 0.93314+-0.01407 Training: 2021-03-17 00:15:25,060-[cfp_fp][42000]Accuracy-Highest: 0.94143 Training: 2021-03-17 00:15:48,173-[agedb_30][42000]XNorm: 20.227297 Training: 2021-03-17 00:15:48,174-[agedb_30][42000]Accuracy-Flip: 0.95167+-0.00695 Training: 2021-03-17 00:15:48,174-[agedb_30][42000]Accuracy-Highest: 0.95517 Training: 2021-03-17 00:15:58,538-Speed 613.99 samples/sec Loss 13.7165 Epoch: 8 Global Step: 42050 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:16:09,161-Speed 4820.10 samples/sec Loss 13.7550 Epoch: 8 Global Step: 42100 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:16:19,732-Speed 4843.64 samples/sec Loss 13.7756 Epoch: 8 Global Step: 42150 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:16:30,160-Speed 4910.01 samples/sec Loss 13.7880 Epoch: 8 Global Step: 42200 Fp16 Grad Scale: 16384 Required: 6 hours Training: 
2021-03-17 00:16:40,812-Speed 4806.86 samples/sec Loss 13.7025 Epoch: 8 Global Step: 42250 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:16:51,123-Speed 4965.69 samples/sec Loss 13.7764 Epoch: 8 Global Step: 42300 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:17:01,597-Speed 4888.44 samples/sec Loss 13.7693 Epoch: 8 Global Step: 42350 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:17:11,998-Speed 4922.94 samples/sec Loss 13.8127 Epoch: 8 Global Step: 42400 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:17:22,550-Speed 4852.55 samples/sec Loss 13.7218 Epoch: 8 Global Step: 42450 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:17:33,156-Speed 4827.64 samples/sec Loss 13.7143 Epoch: 8 Global Step: 42500 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:17:43,549-Speed 4926.58 samples/sec Loss 13.7393 Epoch: 8 Global Step: 42550 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:17:54,364-Speed 4734.61 samples/sec Loss 13.7107 Epoch: 8 Global Step: 42600 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:18:04,885-Speed 4866.34 samples/sec Loss 13.6384 Epoch: 8 Global Step: 42650 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:18:15,469-Speed 4837.63 samples/sec Loss 13.7450 Epoch: 8 Global Step: 42700 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:18:26,037-Speed 4845.41 samples/sec Loss 13.8192 Epoch: 8 Global Step: 42750 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:18:36,745-Speed 4781.66 samples/sec Loss 13.6323 Epoch: 8 Global Step: 42800 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:18:47,205-Speed 4894.70 samples/sec Loss 13.7484 Epoch: 8 Global Step: 42850 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:18:58,032-Speed 4729.21 samples/sec Loss 13.8322 Epoch: 8 Global Step: 42900 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 
00:19:08,838-Speed 4738.53 samples/sec Loss 13.6940 Epoch: 8 Global Step: 42950 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:19:19,585-Speed 4764.30 samples/sec Loss 13.6215 Epoch: 8 Global Step: 43000 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:19:30,267-Speed 4793.06 samples/sec Loss 13.7243 Epoch: 8 Global Step: 43050 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:19:40,950-Speed 4792.98 samples/sec Loss 13.6490 Epoch: 8 Global Step: 43100 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:19:51,526-Speed 4841.19 samples/sec Loss 13.6992 Epoch: 8 Global Step: 43150 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:20:02,369-Speed 4722.54 samples/sec Loss 13.6998 Epoch: 8 Global Step: 43200 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:20:12,977-Speed 4826.79 samples/sec Loss 13.7108 Epoch: 8 Global Step: 43250 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:20:23,447-Speed 4890.14 samples/sec Loss 13.6896 Epoch: 8 Global Step: 43300 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:20:34,062-Speed 4823.61 samples/sec Loss 13.6553 Epoch: 8 Global Step: 43350 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:20:44,689-Speed 4818.04 samples/sec Loss 13.6690 Epoch: 8 Global Step: 43400 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:20:55,094-Speed 4921.16 samples/sec Loss 13.7474 Epoch: 8 Global Step: 43450 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:21:05,701-Speed 4827.17 samples/sec Loss 13.6435 Epoch: 8 Global Step: 43500 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:21:16,056-Speed 4944.45 samples/sec Loss 13.7271 Epoch: 8 Global Step: 43550 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:21:26,684-Speed 4818.08 samples/sec Loss 13.6888 Epoch: 8 Global Step: 43600 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 
00:21:37,279-Speed 4832.80 samples/sec Loss 13.7420 Epoch: 8 Global Step: 43650 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:21:47,855-Speed 4841.23 samples/sec Loss 13.7189 Epoch: 8 Global Step: 43700 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:21:58,269-Speed 4916.51 samples/sec Loss 13.7262 Epoch: 8 Global Step: 43750 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:22:08,753-Speed 4884.28 samples/sec Loss 13.7592 Epoch: 8 Global Step: 43800 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:22:19,426-Speed 4796.97 samples/sec Loss 13.6527 Epoch: 8 Global Step: 43850 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:22:29,903-Speed 4887.15 samples/sec Loss 13.7033 Epoch: 8 Global Step: 43900 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:22:40,208-Speed 4969.03 samples/sec Loss 13.7272 Epoch: 8 Global Step: 43950 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:22:50,644-Speed 4906.14 samples/sec Loss 13.7225 Epoch: 8 Global Step: 44000 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:23:13,728-[lfw][44000]XNorm: 21.893403 Training: 2021-03-17 00:23:13,728-[lfw][44000]Accuracy-Flip: 0.99417+-0.00300 Training: 2021-03-17 00:23:13,728-[lfw][44000]Accuracy-Highest: 0.99583 Training: 2021-03-17 00:23:40,570-[cfp_fp][44000]XNorm: 18.256022 Training: 2021-03-17 00:23:40,571-[cfp_fp][44000]Accuracy-Flip: 0.94471+-0.01614 Training: 2021-03-17 00:23:40,571-[cfp_fp][44000]Accuracy-Highest: 0.94471 Training: 2021-03-17 00:24:03,743-[agedb_30][44000]XNorm: 21.304959 Training: 2021-03-17 00:24:03,743-[agedb_30][44000]Accuracy-Flip: 0.95450+-0.00495 Training: 2021-03-17 00:24:03,744-[agedb_30][44000]Accuracy-Highest: 0.95517 Training: 2021-03-17 00:24:14,325-Speed 611.85 samples/sec Loss 13.7135 Epoch: 8 Global Step: 44050 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:24:24,696-Speed 4937.42 samples/sec Loss 13.7146 Epoch: 
8 Global Step: 44100 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:24:35,456-Speed 4758.57 samples/sec Loss 13.7378 Epoch: 8 Global Step: 44150 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:24:46,095-Speed 4812.62 samples/sec Loss 13.7076 Epoch: 8 Global Step: 44200 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:24:56,475-Speed 4933.07 samples/sec Loss 13.6976 Epoch: 8 Global Step: 44250 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:25:07,254-Speed 4749.87 samples/sec Loss 13.6287 Epoch: 8 Global Step: 44300 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:25:17,904-Speed 4807.86 samples/sec Loss 13.6029 Epoch: 8 Global Step: 44350 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:25:28,403-Speed 4877.15 samples/sec Loss 13.7316 Epoch: 8 Global Step: 44400 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:25:38,802-Speed 4923.64 samples/sec Loss 13.7232 Epoch: 8 Global Step: 44450 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:25:49,298-Speed 4878.47 samples/sec Loss 13.6831 Epoch: 8 Global Step: 44500 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:25:59,867-Speed 4844.47 samples/sec Loss 13.7662 Epoch: 8 Global Step: 44550 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:26:10,286-Speed 4914.10 samples/sec Loss 13.6904 Epoch: 8 Global Step: 44600 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2021-03-17 00:26:20,640-Speed 4945.37 samples/sec Loss 13.7504 Epoch: 8 Global Step: 44650 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:26:30,965-Speed 4959.22 samples/sec Loss 13.7460 Epoch: 8 Global Step: 44700 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:26:41,732-Speed 4755.27 samples/sec Loss 13.7285 Epoch: 8 Global Step: 44750 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:26:52,326-Speed 4833.17 samples/sec Loss 13.6657 Epoch: 8 Global 
Step: 44800 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:27:05,581-Speed 3862.81 samples/sec Loss 13.5734 Epoch: 9 Global Step: 44850 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:27:16,090-Speed 4872.53 samples/sec Loss 12.9447 Epoch: 9 Global Step: 44900 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:27:26,694-Speed 4828.67 samples/sec Loss 13.0716 Epoch: 9 Global Step: 44950 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:27:37,201-Speed 4873.29 samples/sec Loss 13.1915 Epoch: 9 Global Step: 45000 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:27:47,845-Speed 4810.51 samples/sec Loss 13.3254 Epoch: 9 Global Step: 45050 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:27:58,315-Speed 4890.38 samples/sec Loss 13.3477 Epoch: 9 Global Step: 45100 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:28:09,188-Speed 4709.22 samples/sec Loss 13.4437 Epoch: 9 Global Step: 45150 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:28:19,774-Speed 4836.96 samples/sec Loss 13.6122 Epoch: 9 Global Step: 45200 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:28:30,244-Speed 4890.60 samples/sec Loss 13.5297 Epoch: 9 Global Step: 45250 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:28:40,712-Speed 4891.64 samples/sec Loss 13.5463 Epoch: 9 Global Step: 45300 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:28:51,384-Speed 4797.84 samples/sec Loss 13.5864 Epoch: 9 Global Step: 45350 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:29:02,175-Speed 4745.06 samples/sec Loss 13.7206 Epoch: 9 Global Step: 45400 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:29:12,689-Speed 4869.72 samples/sec Loss 13.6625 Epoch: 9 Global Step: 45450 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:29:23,440-Speed 4762.72 samples/sec Loss 13.6172 Epoch: 9 Global Step: 45500 
Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:29:33,999-Speed 4849.11 samples/sec Loss 13.6849 Epoch: 9 Global Step: 45550 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:29:44,518-Speed 4867.85 samples/sec Loss 13.6740 Epoch: 9 Global Step: 45600 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:29:54,862-Speed 4949.92 samples/sec Loss 13.6324 Epoch: 9 Global Step: 45650 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:30:05,388-Speed 4864.44 samples/sec Loss 13.7112 Epoch: 9 Global Step: 45700 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:30:15,770-Speed 4931.85 samples/sec Loss 13.6239 Epoch: 9 Global Step: 45750 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:30:26,236-Speed 4892.26 samples/sec Loss 13.7212 Epoch: 9 Global Step: 45800 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:30:36,949-Speed 4779.21 samples/sec Loss 13.6815 Epoch: 9 Global Step: 45850 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:30:47,671-Speed 4775.43 samples/sec Loss 13.6419 Epoch: 9 Global Step: 45900 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:30:58,234-Speed 4847.72 samples/sec Loss 13.6498 Epoch: 9 Global Step: 45950 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:31:08,729-Speed 4879.05 samples/sec Loss 13.6311 Epoch: 9 Global Step: 46000 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:31:31,891-[lfw][46000]XNorm: 21.622065 Training: 2021-03-17 00:31:31,892-[lfw][46000]Accuracy-Flip: 0.99517+-0.00229 Training: 2021-03-17 00:31:31,892-[lfw][46000]Accuracy-Highest: 0.99583 Training: 2021-03-17 00:31:58,694-[cfp_fp][46000]XNorm: 17.781809 Training: 2021-03-17 00:31:58,695-[cfp_fp][46000]Accuracy-Flip: 0.93757+-0.01114 Training: 2021-03-17 00:31:58,695-[cfp_fp][46000]Accuracy-Highest: 0.94471 Training: 2021-03-17 00:32:21,838-[agedb_30][46000]XNorm: 20.765332 Training: 2021-03-17 
00:32:21,838-[agedb_30][46000]Accuracy-Flip: 0.94983+-0.00917 Training: 2021-03-17 00:32:21,838-[agedb_30][46000]Accuracy-Highest: 0.95517 Training: 2021-03-17 00:32:32,021-Speed 614.71 samples/sec Loss 13.5460 Epoch: 9 Global Step: 46050 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:32:42,250-Speed 5005.60 samples/sec Loss 13.6838 Epoch: 9 Global Step: 46100 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:32:52,813-Speed 4847.49 samples/sec Loss 13.6412 Epoch: 9 Global Step: 46150 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:33:03,342-Speed 4863.29 samples/sec Loss 13.7084 Epoch: 9 Global Step: 46200 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:33:13,624-Speed 4979.72 samples/sec Loss 13.6684 Epoch: 9 Global Step: 46250 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:33:23,921-Speed 4972.76 samples/sec Loss 13.6774 Epoch: 9 Global Step: 46300 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:33:34,217-Speed 4972.94 samples/sec Loss 13.7111 Epoch: 9 Global Step: 46350 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:33:44,443-Speed 5007.37 samples/sec Loss 13.7317 Epoch: 9 Global Step: 46400 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:33:54,769-Speed 4958.77 samples/sec Loss 13.7342 Epoch: 9 Global Step: 46450 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:34:05,060-Speed 4975.19 samples/sec Loss 13.7491 Epoch: 9 Global Step: 46500 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:34:15,526-Speed 4892.37 samples/sec Loss 13.6995 Epoch: 9 Global Step: 46550 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:34:25,852-Speed 4958.73 samples/sec Loss 13.7601 Epoch: 9 Global Step: 46600 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:34:36,028-Speed 5031.82 samples/sec Loss 13.7436 Epoch: 9 Global Step: 46650 Fp16 Grad Scale: 16384 Required: 5 hours Training: 
2021-03-17 00:34:46,165-Speed 5051.07 samples/sec Loss 13.6332 Epoch: 9 Global Step: 46700 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:34:56,618-Speed 4898.52 samples/sec Loss 13.7794 Epoch: 9 Global Step: 46750 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:35:07,359-Speed 4767.08 samples/sec Loss 13.7283 Epoch: 9 Global Step: 46800 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:35:17,711-Speed 4946.19 samples/sec Loss 13.7386 Epoch: 9 Global Step: 46850 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:35:28,117-Speed 4920.82 samples/sec Loss 13.6455 Epoch: 9 Global Step: 46900 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:35:38,597-Speed 4885.79 samples/sec Loss 13.6602 Epoch: 9 Global Step: 46950 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:35:49,208-Speed 4825.56 samples/sec Loss 13.6071 Epoch: 9 Global Step: 47000 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:35:59,709-Speed 4875.68 samples/sec Loss 13.7901 Epoch: 9 Global Step: 47050 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:36:10,137-Speed 4910.28 samples/sec Loss 13.7351 Epoch: 9 Global Step: 47100 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:36:20,607-Speed 4890.43 samples/sec Loss 13.6640 Epoch: 9 Global Step: 47150 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:36:30,950-Speed 4950.63 samples/sec Loss 13.6795 Epoch: 9 Global Step: 47200 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:36:41,905-Speed 4673.54 samples/sec Loss 13.5868 Epoch: 9 Global Step: 47250 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:36:52,233-Speed 4958.19 samples/sec Loss 13.6852 Epoch: 9 Global Step: 47300 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:37:03,057-Speed 4730.35 samples/sec Loss 13.5470 Epoch: 9 Global Step: 47350 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 
00:37:13,582-Speed 4864.85 samples/sec Loss 13.6655 Epoch: 9 Global Step: 47400 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:37:24,095-Speed 4870.77 samples/sec Loss 13.6542 Epoch: 9 Global Step: 47450 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:37:34,419-Speed 4959.56 samples/sec Loss 13.6881 Epoch: 9 Global Step: 47500 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:37:45,387-Speed 4668.30 samples/sec Loss 13.7472 Epoch: 9 Global Step: 47550 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:37:55,815-Speed 4909.92 samples/sec Loss 13.6770 Epoch: 9 Global Step: 47600 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:38:06,397-Speed 4838.45 samples/sec Loss 13.6387 Epoch: 9 Global Step: 47650 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:38:16,880-Speed 4884.49 samples/sec Loss 13.7526 Epoch: 9 Global Step: 47700 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:38:27,294-Speed 4916.92 samples/sec Loss 13.7474 Epoch: 9 Global Step: 47750 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:38:37,652-Speed 4943.02 samples/sec Loss 13.5881 Epoch: 9 Global Step: 47800 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:38:48,326-Speed 4796.89 samples/sec Loss 13.7250 Epoch: 9 Global Step: 47850 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:38:58,796-Speed 4890.26 samples/sec Loss 13.6995 Epoch: 9 Global Step: 47900 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:39:09,412-Speed 4823.37 samples/sec Loss 13.6132 Epoch: 9 Global Step: 47950 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:39:20,180-Speed 4754.96 samples/sec Loss 13.7210 Epoch: 9 Global Step: 48000 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:39:43,259-[lfw][48000]XNorm: 21.931476 Training: 2021-03-17 00:39:43,260-[lfw][48000]Accuracy-Flip: 0.99400+-0.00309 Training: 2021-03-17 
00:39:43,260-[lfw][48000]Accuracy-Highest: 0.99583 Training: 2021-03-17 00:40:10,043-[cfp_fp][48000]XNorm: 18.063676 Training: 2021-03-17 00:40:10,043-[cfp_fp][48000]Accuracy-Flip: 0.92686+-0.01318 Training: 2021-03-17 00:40:10,043-[cfp_fp][48000]Accuracy-Highest: 0.94471 Training: 2021-03-17 00:40:33,187-[agedb_30][48000]XNorm: 21.018747 Training: 2021-03-17 00:40:33,187-[agedb_30][48000]Accuracy-Flip: 0.95683+-0.00685 Training: 2021-03-17 00:40:33,187-[agedb_30][48000]Accuracy-Highest: 0.95683 Training: 2021-03-17 00:40:43,468-Speed 614.73 samples/sec Loss 13.6179 Epoch: 9 Global Step: 48050 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:40:53,812-Speed 4950.22 samples/sec Loss 13.8019 Epoch: 9 Global Step: 48100 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:41:04,367-Speed 4850.87 samples/sec Loss 13.5910 Epoch: 9 Global Step: 48150 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:41:14,882-Speed 4869.27 samples/sec Loss 13.6683 Epoch: 9 Global Step: 48200 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:41:25,372-Speed 4881.45 samples/sec Loss 13.6800 Epoch: 9 Global Step: 48250 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:41:36,126-Speed 4761.08 samples/sec Loss 13.7890 Epoch: 9 Global Step: 48300 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:41:46,519-Speed 4926.56 samples/sec Loss 13.6690 Epoch: 9 Global Step: 48350 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:41:56,834-Speed 4963.97 samples/sec Loss 13.6690 Epoch: 9 Global Step: 48400 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:42:07,179-Speed 4949.56 samples/sec Loss 13.7466 Epoch: 9 Global Step: 48450 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:42:17,799-Speed 4821.40 samples/sec Loss 13.7278 Epoch: 9 Global Step: 48500 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:42:28,288-Speed 4881.54 samples/sec Loss 13.7501 Epoch: 
9 Global Step: 48550 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:42:38,889-Speed 4830.19 samples/sec Loss 13.6369 Epoch: 9 Global Step: 48600 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:42:49,483-Speed 4832.96 samples/sec Loss 13.7334 Epoch: 9 Global Step: 48650 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:43:00,046-Speed 4847.60 samples/sec Loss 13.6560 Epoch: 9 Global Step: 48700 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:43:10,651-Speed 4828.05 samples/sec Loss 13.8041 Epoch: 9 Global Step: 48750 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:43:21,403-Speed 4762.14 samples/sec Loss 13.6366 Epoch: 9 Global Step: 48800 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:43:32,208-Speed 4738.57 samples/sec Loss 13.6503 Epoch: 9 Global Step: 48850 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:43:42,565-Speed 4943.91 samples/sec Loss 13.7061 Epoch: 9 Global Step: 48900 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:43:53,333-Speed 4755.17 samples/sec Loss 13.5785 Epoch: 9 Global Step: 48950 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:44:03,865-Speed 4861.33 samples/sec Loss 13.6642 Epoch: 9 Global Step: 49000 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:44:14,247-Speed 4932.08 samples/sec Loss 13.7233 Epoch: 9 Global Step: 49050 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:44:24,631-Speed 4930.89 samples/sec Loss 13.6477 Epoch: 9 Global Step: 49100 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:44:35,161-Speed 4862.69 samples/sec Loss 13.7417 Epoch: 9 Global Step: 49150 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:44:45,846-Speed 4791.81 samples/sec Loss 13.6823 Epoch: 9 Global Step: 49200 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:44:56,329-Speed 4884.69 samples/sec Loss 13.6307 Epoch: 9 Global 
Step: 49250 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:45:06,885-Speed 4850.37 samples/sec Loss 13.6316 Epoch: 9 Global Step: 49300 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:45:17,143-Speed 4991.50 samples/sec Loss 13.7131 Epoch: 9 Global Step: 49350 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:45:27,753-Speed 4825.97 samples/sec Loss 13.6627 Epoch: 9 Global Step: 49400 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:45:38,161-Speed 4919.67 samples/sec Loss 13.5865 Epoch: 9 Global Step: 49450 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:45:48,717-Speed 4850.73 samples/sec Loss 13.6352 Epoch: 9 Global Step: 49500 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:45:59,196-Speed 4886.25 samples/sec Loss 13.7158 Epoch: 9 Global Step: 49550 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:46:10,062-Speed 4711.79 samples/sec Loss 13.6544 Epoch: 9 Global Step: 49600 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:46:21,062-Speed 4654.90 samples/sec Loss 13.6950 Epoch: 9 Global Step: 49650 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:46:31,477-Speed 4916.20 samples/sec Loss 13.7161 Epoch: 9 Global Step: 49700 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:46:42,049-Speed 4843.40 samples/sec Loss 13.5767 Epoch: 9 Global Step: 49750 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:46:52,632-Speed 4838.02 samples/sec Loss 13.7448 Epoch: 9 Global Step: 49800 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:47:06,057-Speed 3814.03 samples/sec Loss 12.6860 Epoch: 10 Global Step: 49850 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:47:16,904-Speed 4720.34 samples/sec Loss 11.1343 Epoch: 10 Global Step: 49900 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:47:27,412-Speed 4872.74 samples/sec Loss 10.6723 Epoch: 10 Global Step: 
49950 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:47:38,482-Speed 4625.61 samples/sec Loss 10.3687 Epoch: 10 Global Step: 50000 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:48:01,740-[lfw][50000]XNorm: 23.236707 Training: 2021-03-17 00:48:01,740-[lfw][50000]Accuracy-Flip: 0.99550+-0.00248 Training: 2021-03-17 00:48:01,740-[lfw][50000]Accuracy-Highest: 0.99583 Training: 2021-03-17 00:48:28,468-[cfp_fp][50000]XNorm: 19.491862 Training: 2021-03-17 00:48:28,468-[cfp_fp][50000]Accuracy-Flip: 0.96171+-0.00908 Training: 2021-03-17 00:48:28,468-[cfp_fp][50000]Accuracy-Highest: 0.96171 Training: 2021-03-17 00:48:51,578-[agedb_30][50000]XNorm: 22.543903 Training: 2021-03-17 00:48:51,579-[agedb_30][50000]Accuracy-Flip: 0.96583+-0.00688 Training: 2021-03-17 00:48:51,579-[agedb_30][50000]Accuracy-Highest: 0.96583 Training: 2021-03-17 00:49:02,299-Speed 610.86 samples/sec Loss 10.0520 Epoch: 10 Global Step: 50050 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:49:12,849-Speed 4853.24 samples/sec Loss 9.7825 Epoch: 10 Global Step: 50100 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:49:23,453-Speed 4828.93 samples/sec Loss 9.5599 Epoch: 10 Global Step: 50150 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:49:33,958-Speed 4873.94 samples/sec Loss 9.2907 Epoch: 10 Global Step: 50200 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:49:44,568-Speed 4825.79 samples/sec Loss 9.1055 Epoch: 10 Global Step: 50250 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:49:55,148-Speed 4839.82 samples/sec Loss 8.8848 Epoch: 10 Global Step: 50300 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:50:05,435-Speed 4977.32 samples/sec Loss 8.8283 Epoch: 10 Global Step: 50350 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:50:16,170-Speed 4769.63 samples/sec Loss 8.6385 Epoch: 10 Global Step: 50400 Fp16 Grad Scale: 16384 Required: 5 hours Training: 
2021-03-17 00:50:26,754-Speed 4837.75 samples/sec Loss 8.5342 Epoch: 10 Global Step: 50450 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:50:37,179-Speed 4911.30 samples/sec Loss 8.3554 Epoch: 10 Global Step: 50500 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:50:47,610-Speed 4908.86 samples/sec Loss 8.3151 Epoch: 10 Global Step: 50550 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:50:58,092-Speed 4884.44 samples/sec Loss 8.1841 Epoch: 10 Global Step: 50600 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:51:08,497-Speed 4921.25 samples/sec Loss 8.0816 Epoch: 10 Global Step: 50650 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:51:19,054-Speed 4850.13 samples/sec Loss 7.8481 Epoch: 10 Global Step: 50700 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:51:29,525-Speed 4889.71 samples/sec Loss 7.7242 Epoch: 10 Global Step: 50750 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:51:40,096-Speed 4843.50 samples/sec Loss 7.6577 Epoch: 10 Global Step: 50800 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:51:50,696-Speed 4830.63 samples/sec Loss 7.6323 Epoch: 10 Global Step: 50850 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:52:01,344-Speed 4808.75 samples/sec Loss 7.5007 Epoch: 10 Global Step: 50900 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:52:11,786-Speed 4903.25 samples/sec Loss 7.3695 Epoch: 10 Global Step: 50950 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:52:22,240-Speed 4898.09 samples/sec Loss 7.3373 Epoch: 10 Global Step: 51000 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:52:32,654-Speed 4916.48 samples/sec Loss 7.2875 Epoch: 10 Global Step: 51050 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:52:43,240-Speed 4836.82 samples/sec Loss 7.1737 Epoch: 10 Global Step: 51100 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 
00:52:53,786-Speed 4855.48 samples/sec Loss 7.0532 Epoch: 10 Global Step: 51150 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:53:04,175-Speed 4928.40 samples/sec Loss 7.0698 Epoch: 10 Global Step: 51200 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:53:14,645-Speed 4890.13 samples/sec Loss 6.8964 Epoch: 10 Global Step: 51250 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:53:25,153-Speed 4872.99 samples/sec Loss 6.9863 Epoch: 10 Global Step: 51300 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:53:35,725-Speed 4842.86 samples/sec Loss 6.7637 Epoch: 10 Global Step: 51350 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:53:46,163-Speed 4905.70 samples/sec Loss 6.7283 Epoch: 10 Global Step: 51400 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:53:56,518-Speed 4944.65 samples/sec Loss 6.7533 Epoch: 10 Global Step: 51450 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:54:07,074-Speed 4850.55 samples/sec Loss 6.6352 Epoch: 10 Global Step: 51500 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:54:17,485-Speed 4918.03 samples/sec Loss 6.6125 Epoch: 10 Global Step: 51550 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:54:27,951-Speed 4892.38 samples/sec Loss 6.5475 Epoch: 10 Global Step: 51600 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:54:38,669-Speed 4777.13 samples/sec Loss 6.4586 Epoch: 10 Global Step: 51650 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:54:49,435-Speed 4756.14 samples/sec Loss 6.4341 Epoch: 10 Global Step: 51700 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:55:00,073-Speed 4812.99 samples/sec Loss 6.4294 Epoch: 10 Global Step: 51750 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:55:10,547-Speed 4888.91 samples/sec Loss 6.3252 Epoch: 10 Global Step: 51800 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 
00:55:21,553-Speed 4652.30 samples/sec Loss 6.2964 Epoch: 10 Global Step: 51850 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:55:32,284-Speed 4771.16 samples/sec Loss 6.2400 Epoch: 10 Global Step: 51900 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:55:42,731-Speed 4901.28 samples/sec Loss 6.2573 Epoch: 10 Global Step: 51950 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:55:53,091-Speed 4942.55 samples/sec Loss 6.1980 Epoch: 10 Global Step: 52000 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:56:16,226-[lfw][52000]XNorm: 22.046721 Training: 2021-03-17 00:56:16,226-[lfw][52000]Accuracy-Flip: 0.99650+-0.00283 Training: 2021-03-17 00:56:16,226-[lfw][52000]Accuracy-Highest: 0.99650 Training: 2021-03-17 00:56:42,911-[cfp_fp][52000]XNorm: 19.070578 Training: 2021-03-17 00:56:42,912-[cfp_fp][52000]Accuracy-Flip: 0.97271+-0.00914 Training: 2021-03-17 00:56:42,912-[cfp_fp][52000]Accuracy-Highest: 0.97271 Training: 2021-03-17 00:57:06,020-[agedb_30][52000]XNorm: 21.567935 Training: 2021-03-17 00:57:06,020-[agedb_30][52000]Accuracy-Flip: 0.97200+-0.00618 Training: 2021-03-17 00:57:06,020-[agedb_30][52000]Accuracy-Highest: 0.97200 Training: 2021-03-17 00:57:16,412-Speed 614.49 samples/sec Loss 6.1690 Epoch: 10 Global Step: 52050 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:57:26,980-Speed 4845.23 samples/sec Loss 6.1374 Epoch: 10 Global Step: 52100 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:57:37,352-Speed 4936.36 samples/sec Loss 6.0536 Epoch: 10 Global Step: 52150 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:57:47,739-Speed 4929.41 samples/sec Loss 5.9938 Epoch: 10 Global Step: 52200 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:57:58,365-Speed 4818.79 samples/sec Loss 6.0648 Epoch: 10 Global Step: 52250 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:58:08,671-Speed 4968.56 samples/sec Loss 6.0777 Epoch: 
10 Global Step: 52300 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:58:19,238-Speed 4845.12 samples/sec Loss 5.9732 Epoch: 10 Global Step: 52350 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:58:30,039-Speed 4740.91 samples/sec Loss 5.9313 Epoch: 10 Global Step: 52400 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:58:40,346-Speed 4967.38 samples/sec Loss 5.9895 Epoch: 10 Global Step: 52450 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:58:50,840-Speed 4879.34 samples/sec Loss 5.8502 Epoch: 10 Global Step: 52500 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:59:01,185-Speed 4949.70 samples/sec Loss 5.8918 Epoch: 10 Global Step: 52550 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:59:11,523-Speed 4952.93 samples/sec Loss 5.8483 Epoch: 10 Global Step: 52600 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:59:21,934-Speed 4918.03 samples/sec Loss 5.7745 Epoch: 10 Global Step: 52650 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:59:32,225-Speed 4975.43 samples/sec Loss 5.8063 Epoch: 10 Global Step: 52700 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:59:42,785-Speed 4849.11 samples/sec Loss 5.8458 Epoch: 10 Global Step: 52750 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 00:59:53,272-Speed 4882.35 samples/sec Loss 5.7618 Epoch: 10 Global Step: 52800 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:00:03,945-Speed 4797.34 samples/sec Loss 5.7349 Epoch: 10 Global Step: 52850 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:00:14,726-Speed 4749.41 samples/sec Loss 5.7665 Epoch: 10 Global Step: 52900 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:00:25,528-Speed 4739.93 samples/sec Loss 5.7121 Epoch: 10 Global Step: 52950 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:00:36,021-Speed 4879.43 samples/sec Loss 5.7133 Epoch: 10 Global 
Step: 53000 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:00:46,457-Speed 4906.41 samples/sec Loss 5.7003 Epoch: 10 Global Step: 53050 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:00:56,940-Speed 4884.17 samples/sec Loss 5.6981 Epoch: 10 Global Step: 53100 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:01:07,645-Speed 4783.32 samples/sec Loss 5.6890 Epoch: 10 Global Step: 53150 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:01:18,297-Speed 4806.47 samples/sec Loss 5.6400 Epoch: 10 Global Step: 53200 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:01:28,781-Speed 4883.93 samples/sec Loss 5.6059 Epoch: 10 Global Step: 53250 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:01:39,336-Speed 4851.04 samples/sec Loss 5.6230 Epoch: 10 Global Step: 53300 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:01:49,756-Speed 4913.71 samples/sec Loss 5.5758 Epoch: 10 Global Step: 53350 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:02:00,120-Speed 4940.61 samples/sec Loss 5.6323 Epoch: 10 Global Step: 53400 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:02:10,524-Speed 4921.26 samples/sec Loss 5.5946 Epoch: 10 Global Step: 53450 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:02:20,993-Speed 4891.02 samples/sec Loss 5.5912 Epoch: 10 Global Step: 53500 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:02:31,246-Speed 4993.89 samples/sec Loss 5.6191 Epoch: 10 Global Step: 53550 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:02:41,740-Speed 4879.39 samples/sec Loss 5.5699 Epoch: 10 Global Step: 53600 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:02:52,145-Speed 4921.00 samples/sec Loss 5.5877 Epoch: 10 Global Step: 53650 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:03:02,817-Speed 4797.72 samples/sec Loss 5.5519 Epoch: 10 Global Step: 53700 
Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:03:13,246-Speed 4909.84 samples/sec Loss 5.5231 Epoch: 10 Global Step: 53750 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:03:23,626-Speed 4932.93 samples/sec Loss 5.5449 Epoch: 10 Global Step: 53800 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:03:34,043-Speed 4915.14 samples/sec Loss 5.5415 Epoch: 10 Global Step: 53850 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:03:44,928-Speed 4703.96 samples/sec Loss 5.5767 Epoch: 10 Global Step: 53900 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:03:55,577-Speed 4808.17 samples/sec Loss 5.5779 Epoch: 10 Global Step: 53950 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:04:05,960-Speed 4931.13 samples/sec Loss 5.5441 Epoch: 10 Global Step: 54000 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:04:29,057-[lfw][54000]XNorm: 22.941786 Training: 2021-03-17 01:04:29,057-[lfw][54000]Accuracy-Flip: 0.99683+-0.00241 Training: 2021-03-17 01:04:29,057-[lfw][54000]Accuracy-Highest: 0.99683 Training: 2021-03-17 01:04:55,633-[cfp_fp][54000]XNorm: 19.671405 Training: 2021-03-17 01:04:55,633-[cfp_fp][54000]Accuracy-Flip: 0.97543+-0.00469 Training: 2021-03-17 01:04:55,633-[cfp_fp][54000]Accuracy-Highest: 0.97543 Training: 2021-03-17 01:05:18,590-[agedb_30][54000]XNorm: 22.344280 Training: 2021-03-17 01:05:18,590-[agedb_30][54000]Accuracy-Flip: 0.97483+-0.00589 Training: 2021-03-17 01:05:18,590-[agedb_30][54000]Accuracy-Highest: 0.97483 Training: 2021-03-17 01:05:29,260-Speed 614.66 samples/sec Loss 5.5605 Epoch: 10 Global Step: 54050 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:05:39,951-Speed 4789.36 samples/sec Loss 5.5178 Epoch: 10 Global Step: 54100 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:05:50,672-Speed 4775.84 samples/sec Loss 5.5112 Epoch: 10 Global Step: 54150 Fp16 Grad Scale: 16384 Required: 5 hours Training: 
2021-03-17 01:06:01,357-Speed 4791.98 samples/sec Loss 5.5218 Epoch: 10 Global Step: 54200 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:06:11,794-Speed 4906.12 samples/sec Loss 5.4840 Epoch: 10 Global Step: 54250 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:06:22,365-Speed 4843.43 samples/sec Loss 5.5097 Epoch: 10 Global Step: 54300 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:06:32,827-Speed 4894.44 samples/sec Loss 5.4828 Epoch: 10 Global Step: 54350 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:06:43,519-Speed 4788.68 samples/sec Loss 5.5642 Epoch: 10 Global Step: 54400 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:06:53,978-Speed 4895.46 samples/sec Loss 5.4769 Epoch: 10 Global Step: 54450 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:07:04,549-Speed 4843.62 samples/sec Loss 5.5115 Epoch: 10 Global Step: 54500 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:07:15,009-Speed 4895.13 samples/sec Loss 5.5158 Epoch: 10 Global Step: 54550 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:07:25,401-Speed 4926.88 samples/sec Loss 5.4865 Epoch: 10 Global Step: 54600 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:07:35,828-Speed 4910.79 samples/sec Loss 5.5083 Epoch: 10 Global Step: 54650 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:07:46,279-Speed 4899.54 samples/sec Loss 5.5216 Epoch: 10 Global Step: 54700 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:07:56,661-Speed 4931.49 samples/sec Loss 5.4210 Epoch: 10 Global Step: 54750 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:08:07,371-Speed 4780.72 samples/sec Loss 5.5087 Epoch: 10 Global Step: 54800 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:08:20,626-Speed 3863.11 samples/sec Loss 4.6568 Epoch: 11 Global Step: 54850 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 
01:08:31,138-Speed 4870.95 samples/sec Loss 4.6689 Epoch: 11 Global Step: 54900 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:08:41,928-Speed 4745.34 samples/sec Loss 4.7120 Epoch: 11 Global Step: 54950 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:08:52,492-Speed 4846.69 samples/sec Loss 4.7214 Epoch: 11 Global Step: 55000 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:09:03,113-Speed 4821.11 samples/sec Loss 4.7662 Epoch: 11 Global Step: 55050 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:09:13,684-Speed 4843.89 samples/sec Loss 4.7652 Epoch: 11 Global Step: 55100 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:09:24,283-Speed 4831.05 samples/sec Loss 4.7994 Epoch: 11 Global Step: 55150 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:09:34,888-Speed 4828.20 samples/sec Loss 4.8677 Epoch: 11 Global Step: 55200 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:09:45,546-Speed 4803.94 samples/sec Loss 4.9227 Epoch: 11 Global Step: 55250 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:09:55,903-Speed 4943.78 samples/sec Loss 4.9616 Epoch: 11 Global Step: 55300 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:10:06,445-Speed 4857.04 samples/sec Loss 4.9483 Epoch: 11 Global Step: 55350 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:10:17,012-Speed 4845.79 samples/sec Loss 4.9949 Epoch: 11 Global Step: 55400 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:10:27,660-Speed 4808.38 samples/sec Loss 5.0141 Epoch: 11 Global Step: 55450 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:10:38,154-Speed 4879.32 samples/sec Loss 4.9939 Epoch: 11 Global Step: 55500 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:10:48,796-Speed 4811.39 samples/sec Loss 4.9816 Epoch: 11 Global Step: 55550 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 
01:10:59,421-Speed 4818.88 samples/sec Loss 5.0859 Epoch: 11 Global Step: 55600 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:11:09,897-Speed 4887.36 samples/sec Loss 5.0944 Epoch: 11 Global Step: 55650 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:11:20,293-Speed 4925.57 samples/sec Loss 5.0728 Epoch: 11 Global Step: 55700 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:11:30,911-Speed 4822.04 samples/sec Loss 5.1492 Epoch: 11 Global Step: 55750 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:11:41,274-Speed 4941.10 samples/sec Loss 5.0931 Epoch: 11 Global Step: 55800 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:11:51,923-Speed 4808.24 samples/sec Loss 5.1569 Epoch: 11 Global Step: 55850 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:12:02,493-Speed 4843.94 samples/sec Loss 5.1877 Epoch: 11 Global Step: 55900 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:12:13,148-Speed 4805.31 samples/sec Loss 5.2353 Epoch: 11 Global Step: 55950 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:12:23,898-Speed 4763.04 samples/sec Loss 5.1758 Epoch: 11 Global Step: 56000 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:12:46,887-[lfw][56000]XNorm: 23.156473 Training: 2021-03-17 01:12:46,887-[lfw][56000]Accuracy-Flip: 0.99633+-0.00348 Training: 2021-03-17 01:12:46,887-[lfw][56000]Accuracy-Highest: 0.99683 Training: 2021-03-17 01:13:13,641-[cfp_fp][56000]XNorm: 20.137545 Training: 2021-03-17 01:13:13,641-[cfp_fp][56000]Accuracy-Flip: 0.97714+-0.00433 Training: 2021-03-17 01:13:13,641-[cfp_fp][56000]Accuracy-Highest: 0.97714 Training: 2021-03-17 01:13:36,600-[agedb_30][56000]XNorm: 22.730009 Training: 2021-03-17 01:13:36,600-[agedb_30][56000]Accuracy-Flip: 0.97383+-0.00687 Training: 2021-03-17 01:13:36,600-[agedb_30][56000]Accuracy-Highest: 0.97483 Training: 2021-03-17 01:13:47,341-Speed 613.60 samples/sec Loss 5.2729 Epoch: 
11 Global Step: 56050 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:13:57,862-Speed 4866.61 samples/sec Loss 5.1808 Epoch: 11 Global Step: 56100 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:14:08,874-Speed 4649.50 samples/sec Loss 5.2596 Epoch: 11 Global Step: 56150 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:14:19,487-Speed 4824.58 samples/sec Loss 5.3040 Epoch: 11 Global Step: 56200 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:14:30,794-Speed 4528.12 samples/sec Loss 5.2902 Epoch: 11 Global Step: 56250 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:14:41,399-Speed 4828.17 samples/sec Loss 5.2610 Epoch: 11 Global Step: 56300 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:14:52,012-Speed 4824.65 samples/sec Loss 5.2974 Epoch: 11 Global Step: 56350 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:15:02,788-Speed 4751.31 samples/sec Loss 5.3720 Epoch: 11 Global Step: 56400 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:15:13,757-Speed 4668.00 samples/sec Loss 5.3900 Epoch: 11 Global Step: 56450 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:15:24,771-Speed 4648.80 samples/sec Loss 5.3879 Epoch: 11 Global Step: 56500 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:15:35,452-Speed 4793.67 samples/sec Loss 5.3569 Epoch: 11 Global Step: 56550 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:15:46,268-Speed 4734.14 samples/sec Loss 5.3547 Epoch: 11 Global Step: 56600 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:15:57,030-Speed 4757.72 samples/sec Loss 5.3792 Epoch: 11 Global Step: 56650 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:16:07,773-Speed 4766.15 samples/sec Loss 5.3827 Epoch: 11 Global Step: 56700 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:16:18,314-Speed 4857.56 samples/sec Loss 5.3642 Epoch: 11 Global 
Step: 56750 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:16:29,082-Speed 4754.96 samples/sec Loss 5.4370 Epoch: 11 Global Step: 56800 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:16:39,941-Speed 4715.07 samples/sec Loss 5.4153 Epoch: 11 Global Step: 56850 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:16:50,413-Speed 4889.26 samples/sec Loss 5.3537 Epoch: 11 Global Step: 56900 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:17:00,918-Speed 4874.33 samples/sec Loss 5.4736 Epoch: 11 Global Step: 56950 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:17:11,448-Speed 4862.78 samples/sec Loss 5.4532 Epoch: 11 Global Step: 57000 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:17:22,408-Speed 4671.40 samples/sec Loss 5.4650 Epoch: 11 Global Step: 57050 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:17:32,897-Speed 4881.82 samples/sec Loss 5.4869 Epoch: 11 Global Step: 57100 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:17:43,667-Speed 4754.15 samples/sec Loss 5.4327 Epoch: 11 Global Step: 57150 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:17:53,974-Speed 4967.49 samples/sec Loss 5.5153 Epoch: 11 Global Step: 57200 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:18:04,700-Speed 4773.66 samples/sec Loss 5.5434 Epoch: 11 Global Step: 57250 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:18:15,187-Speed 4882.55 samples/sec Loss 5.4526 Epoch: 11 Global Step: 57300 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:18:25,621-Speed 4907.09 samples/sec Loss 5.5107 Epoch: 11 Global Step: 57350 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:18:36,329-Speed 4781.58 samples/sec Loss 5.4971 Epoch: 11 Global Step: 57400 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:18:47,035-Speed 4782.64 samples/sec Loss 5.5728 Epoch: 11 Global Step: 57450 
Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:18:57,703-Speed 4799.49 samples/sec Loss 5.5161 Epoch: 11 Global Step: 57500 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:19:08,203-Speed 4876.44 samples/sec Loss 5.5227 Epoch: 11 Global Step: 57550 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:19:18,793-Speed 4834.85 samples/sec Loss 5.5683 Epoch: 11 Global Step: 57600 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:19:29,364-Speed 4843.95 samples/sec Loss 5.5992 Epoch: 11 Global Step: 57650 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:19:40,147-Speed 4748.17 samples/sec Loss 5.5905 Epoch: 11 Global Step: 57700 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:19:50,757-Speed 4826.10 samples/sec Loss 5.5633 Epoch: 11 Global Step: 57750 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:20:01,505-Speed 4763.87 samples/sec Loss 5.5752 Epoch: 11 Global Step: 57800 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:20:12,118-Speed 4824.42 samples/sec Loss 5.5736 Epoch: 11 Global Step: 57850 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:20:22,677-Speed 4849.14 samples/sec Loss 5.5467 Epoch: 11 Global Step: 57900 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:20:33,222-Speed 4855.74 samples/sec Loss 5.5782 Epoch: 11 Global Step: 57950 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:20:43,843-Speed 4820.68 samples/sec Loss 5.6055 Epoch: 11 Global Step: 58000 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:21:07,113-[lfw][58000]XNorm: 23.193505 Training: 2021-03-17 01:21:07,114-[lfw][58000]Accuracy-Flip: 0.99667+-0.00269 Training: 2021-03-17 01:21:07,114-[lfw][58000]Accuracy-Highest: 0.99683 Training: 2021-03-17 01:21:33,896-[cfp_fp][58000]XNorm: 20.144858 Training: 2021-03-17 01:21:33,896-[cfp_fp][58000]Accuracy-Flip: 0.97543+-0.00638 Training: 2021-03-17 
01:21:33,896-[cfp_fp][58000]Accuracy-Highest: 0.97714 Training: 2021-03-17 01:21:57,077-[agedb_30][58000]XNorm: 22.614838 Training: 2021-03-17 01:21:57,077-[agedb_30][58000]Accuracy-Flip: 0.97550+-0.00715 Training: 2021-03-17 01:21:57,077-[agedb_30][58000]Accuracy-Highest: 0.97550 Training: 2021-03-17 01:22:07,465-Speed 612.29 samples/sec Loss 5.6365 Epoch: 11 Global Step: 58050 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:22:17,981-Speed 4868.64 samples/sec Loss 5.6460 Epoch: 11 Global Step: 58100 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:22:28,575-Speed 4833.17 samples/sec Loss 5.6043 Epoch: 11 Global Step: 58150 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:22:39,493-Speed 4689.61 samples/sec Loss 5.6125 Epoch: 11 Global Step: 58200 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:22:50,347-Speed 4717.63 samples/sec Loss 5.6562 Epoch: 11 Global Step: 58250 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:23:00,947-Speed 4830.26 samples/sec Loss 5.6118 Epoch: 11 Global Step: 58300 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:23:11,804-Speed 4716.29 samples/sec Loss 5.6167 Epoch: 11 Global Step: 58350 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:23:22,352-Speed 4853.82 samples/sec Loss 5.6634 Epoch: 11 Global Step: 58400 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:23:33,444-Speed 4616.36 samples/sec Loss 5.6318 Epoch: 11 Global Step: 58450 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:23:43,905-Speed 4894.40 samples/sec Loss 5.7192 Epoch: 11 Global Step: 58500 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:23:54,577-Speed 4797.82 samples/sec Loss 5.6391 Epoch: 11 Global Step: 58550 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:24:05,453-Speed 4707.70 samples/sec Loss 5.6532 Epoch: 11 Global Step: 58600 Fp16 Grad Scale: 16384 Required: 5 hours Training: 
2021-03-17 01:24:16,281-Speed 4728.65 samples/sec Loss 5.6627 Epoch: 11 Global Step: 58650 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:24:27,040-Speed 4759.37 samples/sec Loss 5.6823 Epoch: 11 Global Step: 58700 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:24:37,839-Speed 4741.23 samples/sec Loss 5.7046 Epoch: 11 Global Step: 58750 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:24:48,305-Speed 4892.41 samples/sec Loss 5.6321 Epoch: 11 Global Step: 58800 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:24:58,845-Speed 4857.70 samples/sec Loss 5.6800 Epoch: 11 Global Step: 58850 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:25:09,400-Speed 4850.82 samples/sec Loss 5.6352 Epoch: 11 Global Step: 58900 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:25:19,957-Speed 4850.45 samples/sec Loss 5.7105 Epoch: 11 Global Step: 58950 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:25:30,502-Speed 4855.29 samples/sec Loss 5.7530 Epoch: 11 Global Step: 59000 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:25:41,196-Speed 4788.16 samples/sec Loss 5.7416 Epoch: 11 Global Step: 59050 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2021-03-17 01:25:51,714-Speed 4867.88 samples/sec Loss 5.6317 Epoch: 11 Global Step: 59100 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:26:02,405-Speed 4789.30 samples/sec Loss 5.7116 Epoch: 11 Global Step: 59150 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:26:13,253-Speed 4720.15 samples/sec Loss 5.8271 Epoch: 11 Global Step: 59200 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:26:23,891-Speed 4813.16 samples/sec Loss 5.7355 Epoch: 11 Global Step: 59250 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:26:34,663-Speed 4753.29 samples/sec Loss 5.7692 Epoch: 11 Global Step: 59300 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 
01:26:45,598-Speed 4682.23 samples/sec Loss 5.7373 Epoch: 11 Global Step: 59350 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:26:56,240-Speed 4811.46 samples/sec Loss 5.7445 Epoch: 11 Global Step: 59400 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:27:07,032-Speed 4744.19 samples/sec Loss 5.7415 Epoch: 11 Global Step: 59450 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:27:17,674-Speed 4811.45 samples/sec Loss 5.7325 Epoch: 11 Global Step: 59500 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:27:28,562-Speed 4702.49 samples/sec Loss 5.7087 Epoch: 11 Global Step: 59550 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:27:39,248-Speed 4791.55 samples/sec Loss 5.7567 Epoch: 11 Global Step: 59600 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:27:49,856-Speed 4826.74 samples/sec Loss 5.7549 Epoch: 11 Global Step: 59650 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:28:00,432-Speed 4841.44 samples/sec Loss 5.7953 Epoch: 11 Global Step: 59700 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:28:11,051-Speed 4822.01 samples/sec Loss 5.7881 Epoch: 11 Global Step: 59750 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:28:24,389-Speed 3838.63 samples/sec Loss 5.4505 Epoch: 12 Global Step: 59800 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:28:34,967-Speed 4840.76 samples/sec Loss 4.8852 Epoch: 12 Global Step: 59850 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:28:45,516-Speed 4853.68 samples/sec Loss 4.9792 Epoch: 12 Global Step: 59900 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:28:56,030-Speed 4869.94 samples/sec Loss 4.9935 Epoch: 12 Global Step: 59950 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:29:07,048-Speed 4647.33 samples/sec Loss 5.0331 Epoch: 12 Global Step: 60000 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 
01:29:30,429-[lfw][60000]XNorm: 23.142008 Training: 2021-03-17 01:29:30,430-[lfw][60000]Accuracy-Flip: 0.99650+-0.00376 Training: 2021-03-17 01:29:30,430-[lfw][60000]Accuracy-Highest: 0.99683 Training: 2021-03-17 01:29:57,263-[cfp_fp][60000]XNorm: 20.019797 Training: 2021-03-17 01:29:57,264-[cfp_fp][60000]Accuracy-Flip: 0.97414+-0.00659 Training: 2021-03-17 01:29:57,264-[cfp_fp][60000]Accuracy-Highest: 0.97714 Training: 2021-03-17 01:30:20,399-[agedb_30][60000]XNorm: 22.575904 Training: 2021-03-17 01:30:20,400-[agedb_30][60000]Accuracy-Flip: 0.97500+-0.00650 Training: 2021-03-17 01:30:20,400-[agedb_30][60000]Accuracy-Highest: 0.97550 Training: 2021-03-17 01:30:30,982-Speed 610.01 samples/sec Loss 5.1295 Epoch: 12 Global Step: 60050 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:30:41,431-Speed 4900.27 samples/sec Loss 5.1241 Epoch: 12 Global Step: 60100 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:30:52,079-Speed 4808.58 samples/sec Loss 5.1545 Epoch: 12 Global Step: 60150 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:31:02,987-Speed 4694.11 samples/sec Loss 5.2178 Epoch: 12 Global Step: 60200 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:31:13,731-Speed 4765.34 samples/sec Loss 5.2219 Epoch: 12 Global Step: 60250 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:31:24,251-Speed 4867.36 samples/sec Loss 5.1844 Epoch: 12 Global Step: 60300 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:31:35,072-Speed 4731.62 samples/sec Loss 5.3008 Epoch: 12 Global Step: 60350 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:31:45,953-Speed 4705.59 samples/sec Loss 5.2767 Epoch: 12 Global Step: 60400 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:31:56,859-Speed 4694.90 samples/sec Loss 5.2785 Epoch: 12 Global Step: 60450 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:32:07,517-Speed 4804.17 samples/sec Loss 5.2983 Epoch: 
12 Global Step: 60500 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:32:18,888-Speed 4503.01 samples/sec Loss 5.3726 Epoch: 12 Global Step: 60550 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:32:29,447-Speed 4849.19 samples/sec Loss 5.4465 Epoch: 12 Global Step: 60600 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:32:41,072-Speed 4404.19 samples/sec Loss 5.4169 Epoch: 12 Global Step: 60650 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:32:52,372-Speed 4531.30 samples/sec Loss 5.4262 Epoch: 12 Global Step: 60700 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:33:03,103-Speed 4771.42 samples/sec Loss 5.4452 Epoch: 12 Global Step: 60750 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:33:13,979-Speed 4707.99 samples/sec Loss 5.4818 Epoch: 12 Global Step: 60800 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:33:24,597-Speed 4821.99 samples/sec Loss 5.4657 Epoch: 12 Global Step: 60850 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:33:35,149-Speed 4852.58 samples/sec Loss 5.5351 Epoch: 12 Global Step: 60900 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:33:45,732-Speed 4838.20 samples/sec Loss 5.4819 Epoch: 12 Global Step: 60950 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:33:56,154-Speed 4912.87 samples/sec Loss 5.6004 Epoch: 12 Global Step: 61000 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:34:06,711-Speed 4849.88 samples/sec Loss 5.5464 Epoch: 12 Global Step: 61050 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:34:17,431-Speed 4776.34 samples/sec Loss 5.5807 Epoch: 12 Global Step: 61100 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:34:27,814-Speed 4931.23 samples/sec Loss 5.5660 Epoch: 12 Global Step: 61150 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:34:38,521-Speed 4782.39 samples/sec Loss 5.5552 Epoch: 12 Global 
Step: 61200 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:34:49,195-Speed 4796.60 samples/sec Loss 5.6380 Epoch: 12 Global Step: 61250 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:34:59,991-Speed 4743.13 samples/sec Loss 5.6482 Epoch: 12 Global Step: 61300 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:35:10,622-Speed 4816.16 samples/sec Loss 5.6246 Epoch: 12 Global Step: 61350 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:35:21,250-Speed 4817.60 samples/sec Loss 5.6278 Epoch: 12 Global Step: 61400 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:35:32,162-Speed 4692.20 samples/sec Loss 5.6820 Epoch: 12 Global Step: 61450 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:35:42,694-Speed 4861.38 samples/sec Loss 5.6547 Epoch: 12 Global Step: 61500 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:35:53,261-Speed 4845.87 samples/sec Loss 5.6955 Epoch: 12 Global Step: 61550 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:36:03,907-Speed 4809.41 samples/sec Loss 5.6297 Epoch: 12 Global Step: 61600 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:36:14,756-Speed 4719.65 samples/sec Loss 5.6813 Epoch: 12 Global Step: 61650 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:36:25,440-Speed 4792.40 samples/sec Loss 5.6274 Epoch: 12 Global Step: 61700 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:36:36,147-Speed 4782.06 samples/sec Loss 5.6757 Epoch: 12 Global Step: 61750 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:36:46,755-Speed 4826.40 samples/sec Loss 5.6863 Epoch: 12 Global Step: 61800 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:36:57,527-Speed 4753.30 samples/sec Loss 5.6345 Epoch: 12 Global Step: 61850 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:37:08,139-Speed 4825.14 samples/sec Loss 5.6843 Epoch: 12 Global Step: 61900 
Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:37:18,891-Speed 4761.84 samples/sec Loss 5.7031 Epoch: 12 Global Step: 61950 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:37:29,586-Speed 4787.84 samples/sec Loss 5.7043 Epoch: 12 Global Step: 62000 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:37:52,745-[lfw][62000]XNorm: 22.346645 Training: 2021-03-17 01:37:52,745-[lfw][62000]Accuracy-Flip: 0.99650+-0.00353 Training: 2021-03-17 01:37:52,745-[lfw][62000]Accuracy-Highest: 0.99683 Training: 2021-03-17 01:38:19,575-[cfp_fp][62000]XNorm: 19.654976 Training: 2021-03-17 01:38:19,575-[cfp_fp][62000]Accuracy-Flip: 0.97257+-0.00645 Training: 2021-03-17 01:38:19,575-[cfp_fp][62000]Accuracy-Highest: 0.97714 Training: 2021-03-17 01:38:42,705-[agedb_30][62000]XNorm: 21.781502 Training: 2021-03-17 01:38:42,705-[agedb_30][62000]Accuracy-Flip: 0.97283+-0.00582 Training: 2021-03-17 01:38:42,705-[agedb_30][62000]Accuracy-Highest: 0.97550 Training: 2021-03-17 01:38:53,268-Speed 611.84 samples/sec Loss 5.7278 Epoch: 12 Global Step: 62050 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:39:04,015-Speed 4764.41 samples/sec Loss 5.7304 Epoch: 12 Global Step: 62100 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:39:14,585-Speed 4844.55 samples/sec Loss 5.7896 Epoch: 12 Global Step: 62150 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:39:24,986-Speed 4922.39 samples/sec Loss 5.7698 Epoch: 12 Global Step: 62200 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:39:35,548-Speed 4847.93 samples/sec Loss 5.8187 Epoch: 12 Global Step: 62250 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:39:46,258-Speed 4780.72 samples/sec Loss 5.7996 Epoch: 12 Global Step: 62300 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:39:56,875-Speed 4822.56 samples/sec Loss 5.8153 Epoch: 12 Global Step: 62350 Fp16 Grad Scale: 16384 Required: 4 hours Training: 
2021-03-17 01:40:07,707-Speed 4727.14 samples/sec Loss 5.7879 Epoch: 12 Global Step: 62400 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:40:18,469-Speed 4757.80 samples/sec Loss 5.7518 Epoch: 12 Global Step: 62450 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:40:29,022-Speed 4851.48 samples/sec Loss 5.7117 Epoch: 12 Global Step: 62500 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:40:40,038-Speed 4648.20 samples/sec Loss 5.7773 Epoch: 12 Global Step: 62550 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:40:50,775-Speed 4768.78 samples/sec Loss 5.8081 Epoch: 12 Global Step: 62600 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:41:01,436-Speed 4802.90 samples/sec Loss 5.8006 Epoch: 12 Global Step: 62650 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:41:12,570-Speed 4598.38 samples/sec Loss 5.7836 Epoch: 12 Global Step: 62700 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:41:23,208-Speed 4813.37 samples/sec Loss 5.8094 Epoch: 12 Global Step: 62750 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:41:33,993-Speed 4747.28 samples/sec Loss 5.8326 Epoch: 12 Global Step: 62800 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:41:45,268-Speed 4541.64 samples/sec Loss 5.8677 Epoch: 12 Global Step: 62850 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:41:55,881-Speed 4824.11 samples/sec Loss 5.7923 Epoch: 12 Global Step: 62900 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:42:06,700-Speed 4732.63 samples/sec Loss 5.8208 Epoch: 12 Global Step: 62950 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:42:17,652-Speed 4675.33 samples/sec Loss 5.7479 Epoch: 12 Global Step: 63000 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:42:28,337-Speed 4792.00 samples/sec Loss 5.8034 Epoch: 12 Global Step: 63050 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 
01:42:39,233-Speed 4699.09 samples/sec Loss 5.8487 Epoch: 12 Global Step: 63100 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:42:49,976-Speed 4766.05 samples/sec Loss 5.7863 Epoch: 12 Global Step: 63150 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:43:00,879-Speed 4696.31 samples/sec Loss 5.7947 Epoch: 12 Global Step: 63200 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:43:11,850-Speed 4666.93 samples/sec Loss 5.7665 Epoch: 12 Global Step: 63250 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:43:22,400-Speed 4853.23 samples/sec Loss 5.8025 Epoch: 12 Global Step: 63300 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:43:33,079-Speed 4794.93 samples/sec Loss 5.8706 Epoch: 12 Global Step: 63350 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:43:43,797-Speed 4777.21 samples/sec Loss 5.8180 Epoch: 12 Global Step: 63400 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:43:54,300-Speed 4874.99 samples/sec Loss 5.8196 Epoch: 12 Global Step: 63450 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:44:05,168-Speed 4711.15 samples/sec Loss 5.8212 Epoch: 12 Global Step: 63500 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:44:16,078-Speed 4693.27 samples/sec Loss 5.8400 Epoch: 12 Global Step: 63550 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:44:26,748-Speed 4798.69 samples/sec Loss 5.7969 Epoch: 12 Global Step: 63600 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:44:37,837-Speed 4617.52 samples/sec Loss 5.8716 Epoch: 12 Global Step: 63650 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:44:48,752-Speed 4690.81 samples/sec Loss 5.9346 Epoch: 12 Global Step: 63700 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:44:59,403-Speed 4807.29 samples/sec Loss 5.8091 Epoch: 12 Global Step: 63750 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 
01:45:10,233-Speed 4727.86 samples/sec Loss 5.8687 Epoch: 12 Global Step: 63800 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:45:20,950-Speed 4777.39 samples/sec Loss 5.8123 Epoch: 12 Global Step: 63850 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:45:31,753-Speed 4739.79 samples/sec Loss 5.8652 Epoch: 12 Global Step: 63900 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:45:42,361-Speed 4826.72 samples/sec Loss 5.8033 Epoch: 12 Global Step: 63950 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:45:53,130-Speed 4754.52 samples/sec Loss 5.8402 Epoch: 12 Global Step: 64000 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:46:16,239-[lfw][64000]XNorm: 23.036892 Training: 2021-03-17 01:46:16,240-[lfw][64000]Accuracy-Flip: 0.99700+-0.00306 Training: 2021-03-17 01:46:16,240-[lfw][64000]Accuracy-Highest: 0.99700 Training: 2021-03-17 01:46:43,032-[cfp_fp][64000]XNorm: 19.942577 Training: 2021-03-17 01:46:43,032-[cfp_fp][64000]Accuracy-Flip: 0.97786+-0.00674 Training: 2021-03-17 01:46:43,032-[cfp_fp][64000]Accuracy-Highest: 0.97786 Training: 2021-03-17 01:47:06,211-[agedb_30][64000]XNorm: 22.383557 Training: 2021-03-17 01:47:06,211-[agedb_30][64000]Accuracy-Flip: 0.97633+-0.00702 Training: 2021-03-17 01:47:06,211-[agedb_30][64000]Accuracy-Highest: 0.97633 Training: 2021-03-17 01:47:16,869-Speed 611.43 samples/sec Loss 5.8135 Epoch: 12 Global Step: 64050 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:47:27,558-Speed 4790.26 samples/sec Loss 5.8005 Epoch: 12 Global Step: 64100 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:47:38,405-Speed 4720.38 samples/sec Loss 5.8232 Epoch: 12 Global Step: 64150 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:47:49,051-Speed 4809.35 samples/sec Loss 5.8807 Epoch: 12 Global Step: 64200 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:47:59,919-Speed 4711.13 samples/sec Loss 5.8772 Epoch: 
12 Global Step: 64250 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:48:10,814-Speed 4699.56 samples/sec Loss 5.7983 Epoch: 12 Global Step: 64300 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:48:21,540-Speed 4773.89 samples/sec Loss 5.8639 Epoch: 12 Global Step: 64350 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:48:32,044-Speed 4874.51 samples/sec Loss 5.8702 Epoch: 12 Global Step: 64400 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:48:42,949-Speed 4695.17 samples/sec Loss 5.8792 Epoch: 12 Global Step: 64450 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:48:53,576-Speed 4818.21 samples/sec Loss 5.8563 Epoch: 12 Global Step: 64500 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:49:04,414-Speed 4724.41 samples/sec Loss 5.8634 Epoch: 12 Global Step: 64550 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:49:15,152-Speed 4768.24 samples/sec Loss 5.8416 Epoch: 12 Global Step: 64600 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:49:25,735-Speed 4838.01 samples/sec Loss 5.8663 Epoch: 12 Global Step: 64650 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:49:36,381-Speed 4809.63 samples/sec Loss 5.8732 Epoch: 12 Global Step: 64700 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:49:46,999-Speed 4822.16 samples/sec Loss 5.8620 Epoch: 12 Global Step: 64750 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:50:01,005-Speed 3655.79 samples/sec Loss 5.3301 Epoch: 13 Global Step: 64800 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:50:11,967-Speed 4671.17 samples/sec Loss 5.0237 Epoch: 13 Global Step: 64850 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:50:23,094-Speed 4601.76 samples/sec Loss 5.0809 Epoch: 13 Global Step: 64900 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:50:33,647-Speed 4851.54 samples/sec Loss 5.1057 Epoch: 13 Global 
Step: 64950 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:50:44,601-Speed 4674.30 samples/sec Loss 5.1415 Epoch: 13 Global Step: 65000 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:50:55,998-Speed 4492.85 samples/sec Loss 5.1958 Epoch: 13 Global Step: 65050 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:51:06,663-Speed 4800.73 samples/sec Loss 5.2128 Epoch: 13 Global Step: 65100 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:51:17,405-Speed 4766.76 samples/sec Loss 5.2605 Epoch: 13 Global Step: 65150 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:51:28,034-Speed 4817.09 samples/sec Loss 5.3036 Epoch: 13 Global Step: 65200 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:51:38,536-Speed 4875.55 samples/sec Loss 5.3372 Epoch: 13 Global Step: 65250 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:51:49,161-Speed 4818.87 samples/sec Loss 5.3623 Epoch: 13 Global Step: 65300 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:51:59,893-Speed 4770.97 samples/sec Loss 5.3932 Epoch: 13 Global Step: 65350 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:52:10,859-Speed 4669.47 samples/sec Loss 5.4476 Epoch: 13 Global Step: 65400 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:52:21,601-Speed 4766.45 samples/sec Loss 5.4481 Epoch: 13 Global Step: 65450 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:52:32,537-Speed 4681.76 samples/sec Loss 5.4902 Epoch: 13 Global Step: 65500 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:52:43,567-Speed 4642.15 samples/sec Loss 5.5077 Epoch: 13 Global Step: 65550 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:52:54,124-Speed 4850.30 samples/sec Loss 5.5299 Epoch: 13 Global Step: 65600 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:53:04,785-Speed 4802.48 samples/sec Loss 5.5087 Epoch: 13 Global Step: 65650 
Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:53:15,277-Speed 4880.17 samples/sec Loss 5.5423 Epoch: 13 Global Step: 65700 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:53:25,867-Speed 4834.87 samples/sec Loss 5.5406 Epoch: 13 Global Step: 65750 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:53:36,399-Speed 4861.79 samples/sec Loss 5.5915 Epoch: 13 Global Step: 65800 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:53:47,135-Speed 4769.19 samples/sec Loss 5.5843 Epoch: 13 Global Step: 65850 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:53:57,900-Speed 4756.50 samples/sec Loss 5.5556 Epoch: 13 Global Step: 65900 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:54:08,673-Speed 4752.93 samples/sec Loss 5.6207 Epoch: 13 Global Step: 65950 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:54:19,288-Speed 4823.25 samples/sec Loss 5.6269 Epoch: 13 Global Step: 66000 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:54:42,692-[lfw][66000]XNorm: 23.059340 Training: 2021-03-17 01:54:42,692-[lfw][66000]Accuracy-Flip: 0.99633+-0.00314 Training: 2021-03-17 01:54:42,692-[lfw][66000]Accuracy-Highest: 0.99700 Training: 2021-03-17 01:55:09,387-[cfp_fp][66000]XNorm: 20.089524 Training: 2021-03-17 01:55:09,388-[cfp_fp][66000]Accuracy-Flip: 0.97743+-0.00745 Training: 2021-03-17 01:55:09,388-[cfp_fp][66000]Accuracy-Highest: 0.97786 Training: 2021-03-17 01:55:32,403-[agedb_30][66000]XNorm: 22.410278 Training: 2021-03-17 01:55:32,403-[agedb_30][66000]Accuracy-Flip: 0.97500+-0.00553 Training: 2021-03-17 01:55:32,403-[agedb_30][66000]Accuracy-Highest: 0.97633 Training: 2021-03-17 01:55:43,048-Speed 611.27 samples/sec Loss 5.6236 Epoch: 13 Global Step: 66050 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:55:53,829-Speed 4749.66 samples/sec Loss 5.6422 Epoch: 13 Global Step: 66100 Fp16 Grad Scale: 16384 Required: 4 hours Training: 
2021-03-17 01:56:04,348-Speed 4867.47 samples/sec Loss 5.6875 Epoch: 13 Global Step: 66150 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:56:15,229-Speed 4705.32 samples/sec Loss 5.6807 Epoch: 13 Global Step: 66200 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:56:25,978-Speed 4763.67 samples/sec Loss 5.7095 Epoch: 13 Global Step: 66250 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:56:36,587-Speed 4826.32 samples/sec Loss 5.6174 Epoch: 13 Global Step: 66300 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:56:47,373-Speed 4746.89 samples/sec Loss 5.6785 Epoch: 13 Global Step: 66350 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:56:57,876-Speed 4875.33 samples/sec Loss 5.7326 Epoch: 13 Global Step: 66400 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:57:08,800-Speed 4687.08 samples/sec Loss 5.7124 Epoch: 13 Global Step: 66450 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:57:19,355-Speed 4850.65 samples/sec Loss 5.7410 Epoch: 13 Global Step: 66500 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:57:30,261-Speed 4694.94 samples/sec Loss 5.6761 Epoch: 13 Global Step: 66550 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:57:40,949-Speed 4790.78 samples/sec Loss 5.7474 Epoch: 13 Global Step: 66600 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:57:51,786-Speed 4724.84 samples/sec Loss 5.7916 Epoch: 13 Global Step: 66650 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:58:02,647-Speed 4714.26 samples/sec Loss 5.7321 Epoch: 13 Global Step: 66700 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:58:13,551-Speed 4695.65 samples/sec Loss 5.7246 Epoch: 13 Global Step: 66750 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:58:24,065-Speed 4870.12 samples/sec Loss 5.7218 Epoch: 13 Global Step: 66800 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 
01:58:34,767-Speed 4784.02 samples/sec Loss 5.6922 Epoch: 13 Global Step: 66850 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:58:45,646-Speed 4706.53 samples/sec Loss 5.6823 Epoch: 13 Global Step: 66900 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:58:56,545-Speed 4698.18 samples/sec Loss 5.7031 Epoch: 13 Global Step: 66950 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:59:07,784-Speed 4555.54 samples/sec Loss 5.7268 Epoch: 13 Global Step: 67000 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:59:18,622-Speed 4724.40 samples/sec Loss 5.7642 Epoch: 13 Global Step: 67050 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:59:29,390-Speed 4754.88 samples/sec Loss 5.7589 Epoch: 13 Global Step: 67100 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:59:39,978-Speed 4836.10 samples/sec Loss 5.7352 Epoch: 13 Global Step: 67150 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 01:59:50,471-Speed 4879.36 samples/sec Loss 5.7775 Epoch: 13 Global Step: 67200 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:00:01,379-Speed 4694.09 samples/sec Loss 5.8260 Epoch: 13 Global Step: 67250 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:00:12,456-Speed 4622.49 samples/sec Loss 5.8001 Epoch: 13 Global Step: 67300 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:00:23,550-Speed 4615.23 samples/sec Loss 5.7851 Epoch: 13 Global Step: 67350 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:00:34,091-Speed 4857.48 samples/sec Loss 5.8120 Epoch: 13 Global Step: 67400 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:00:44,540-Speed 4899.93 samples/sec Loss 5.7978 Epoch: 13 Global Step: 67450 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:00:55,166-Speed 4819.00 samples/sec Loss 5.7783 Epoch: 13 Global Step: 67500 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 
02:01:06,053-Speed 4702.94 samples/sec Loss 5.7451 Epoch: 13 Global Step: 67550 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:01:16,543-Speed 4880.87 samples/sec Loss 5.7976 Epoch: 13 Global Step: 67600 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:01:27,243-Speed 4785.44 samples/sec Loss 5.8482 Epoch: 13 Global Step: 67650 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:01:37,927-Speed 4792.20 samples/sec Loss 5.8109 Epoch: 13 Global Step: 67700 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:01:48,656-Speed 4772.60 samples/sec Loss 5.8582 Epoch: 13 Global Step: 67750 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:01:59,420-Speed 4756.45 samples/sec Loss 5.7729 Epoch: 13 Global Step: 67800 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:02:09,969-Speed 4853.73 samples/sec Loss 5.8270 Epoch: 13 Global Step: 67850 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:02:20,381-Speed 4917.98 samples/sec Loss 5.7858 Epoch: 13 Global Step: 67900 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:02:31,060-Speed 4794.55 samples/sec Loss 5.8221 Epoch: 13 Global Step: 67950 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:02:41,595-Speed 4860.31 samples/sec Loss 5.8360 Epoch: 13 Global Step: 68000 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:03:04,767-[lfw][68000]XNorm: 22.529664 Training: 2021-03-17 02:03:04,768-[lfw][68000]Accuracy-Flip: 0.99700+-0.00245 Training: 2021-03-17 02:03:04,768-[lfw][68000]Accuracy-Highest: 0.99700 Training: 2021-03-17 02:03:31,472-[cfp_fp][68000]XNorm: 19.634579 Training: 2021-03-17 02:03:31,473-[cfp_fp][68000]Accuracy-Flip: 0.97386+-0.00593 Training: 2021-03-17 02:03:31,473-[cfp_fp][68000]Accuracy-Highest: 0.97786 Training: 2021-03-17 02:03:54,679-[agedb_30][68000]XNorm: 21.943989 Training: 2021-03-17 02:03:54,679-[agedb_30][68000]Accuracy-Flip: 0.97650+-0.00693 Training: 
2021-03-17 02:03:54,679-[agedb_30][68000]Accuracy-Highest: 0.97650 Training: 2021-03-17 02:04:05,159-Speed 612.70 samples/sec Loss 5.8113 Epoch: 13 Global Step: 68050 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:04:15,630-Speed 4890.03 samples/sec Loss 5.7846 Epoch: 13 Global Step: 68100 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:04:26,161-Speed 4862.16 samples/sec Loss 5.8064 Epoch: 13 Global Step: 68150 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:04:36,820-Speed 4803.57 samples/sec Loss 5.8499 Epoch: 13 Global Step: 68200 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:04:47,481-Speed 4802.60 samples/sec Loss 5.8066 Epoch: 13 Global Step: 68250 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:04:58,108-Speed 4818.49 samples/sec Loss 5.7898 Epoch: 13 Global Step: 68300 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:05:08,621-Speed 4870.11 samples/sec Loss 5.8368 Epoch: 13 Global Step: 68350 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:05:19,065-Speed 4902.62 samples/sec Loss 5.7856 Epoch: 13 Global Step: 68400 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:05:29,499-Speed 4907.46 samples/sec Loss 5.8814 Epoch: 13 Global Step: 68450 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:05:39,924-Speed 4911.45 samples/sec Loss 5.8241 Epoch: 13 Global Step: 68500 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:05:50,671-Speed 4764.36 samples/sec Loss 5.8256 Epoch: 13 Global Step: 68550 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:06:01,412-Speed 4766.79 samples/sec Loss 5.8756 Epoch: 13 Global Step: 68600 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:06:11,916-Speed 4874.66 samples/sec Loss 5.7892 Epoch: 13 Global Step: 68650 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:06:22,658-Speed 4766.64 samples/sec Loss 5.8138 Epoch: 13 
Global Step: 68700 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:06:33,257-Speed 4830.47 samples/sec Loss 5.8091 Epoch: 13 Global Step: 68750 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:06:43,745-Speed 4882.36 samples/sec Loss 5.8470 Epoch: 13 Global Step: 68800 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:06:54,425-Speed 4793.96 samples/sec Loss 5.8441 Epoch: 13 Global Step: 68850 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:07:04,972-Speed 4854.85 samples/sec Loss 5.8276 Epoch: 13 Global Step: 68900 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:07:15,652-Speed 4794.14 samples/sec Loss 5.8454 Epoch: 13 Global Step: 68950 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:07:26,098-Speed 4901.76 samples/sec Loss 5.8600 Epoch: 13 Global Step: 69000 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:07:36,820-Speed 4775.20 samples/sec Loss 5.7739 Epoch: 13 Global Step: 69050 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:07:47,573-Speed 4761.56 samples/sec Loss 5.8425 Epoch: 13 Global Step: 69100 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:07:58,529-Speed 4673.46 samples/sec Loss 5.8642 Epoch: 13 Global Step: 69150 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:08:09,207-Speed 4795.04 samples/sec Loss 5.8304 Epoch: 13 Global Step: 69200 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:08:19,881-Speed 4797.14 samples/sec Loss 5.9055 Epoch: 13 Global Step: 69250 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:08:30,582-Speed 4784.68 samples/sec Loss 5.8063 Epoch: 13 Global Step: 69300 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:08:41,624-Speed 4637.05 samples/sec Loss 5.8940 Epoch: 13 Global Step: 69350 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:08:52,267-Speed 4811.07 samples/sec Loss 5.8744 Epoch: 13 Global 
Step: 69400 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:09:03,285-Speed 4647.07 samples/sec Loss 5.8116 Epoch: 13 Global Step: 69450 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:09:14,198-Speed 4691.79 samples/sec Loss 5.8240 Epoch: 13 Global Step: 69500 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:09:24,939-Speed 4766.92 samples/sec Loss 5.8968 Epoch: 13 Global Step: 69550 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:09:36,086-Speed 4593.38 samples/sec Loss 5.8280 Epoch: 13 Global Step: 69600 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:09:46,693-Speed 4827.17 samples/sec Loss 5.8447 Epoch: 13 Global Step: 69650 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:09:57,328-Speed 4814.71 samples/sec Loss 5.8813 Epoch: 13 Global Step: 69700 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:10:10,771-Speed 3808.84 samples/sec Loss 5.8132 Epoch: 14 Global Step: 69750 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:10:21,769-Speed 4655.85 samples/sec Loss 5.0320 Epoch: 14 Global Step: 69800 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:10:32,551-Speed 4749.09 samples/sec Loss 5.0869 Epoch: 14 Global Step: 69850 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:10:43,436-Speed 4703.89 samples/sec Loss 5.0367 Epoch: 14 Global Step: 69900 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:10:54,041-Speed 4828.45 samples/sec Loss 5.1137 Epoch: 14 Global Step: 69950 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:11:04,715-Speed 4797.08 samples/sec Loss 5.1257 Epoch: 14 Global Step: 70000 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:11:28,073-[lfw][70000]XNorm: 23.155374 Training: 2021-03-17 02:11:28,073-[lfw][70000]Accuracy-Flip: 0.99650+-0.00252 Training: 2021-03-17 02:11:28,073-[lfw][70000]Accuracy-Highest: 0.99700 Training: 2021-03-17 
02:11:54,937-[cfp_fp][70000]XNorm: 20.151771 Training: 2021-03-17 02:11:54,938-[cfp_fp][70000]Accuracy-Flip: 0.97629+-0.00765 Training: 2021-03-17 02:11:54,938-[cfp_fp][70000]Accuracy-Highest: 0.97786 Training: 2021-03-17 02:12:18,038-[agedb_30][70000]XNorm: 22.541694 Training: 2021-03-17 02:12:18,039-[agedb_30][70000]Accuracy-Flip: 0.97633+-0.00777 Training: 2021-03-17 02:12:18,039-[agedb_30][70000]Accuracy-Highest: 0.97650 Training: 2021-03-17 02:12:28,635-Speed 610.11 samples/sec Loss 5.2791 Epoch: 14 Global Step: 70050 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:12:39,396-Speed 4758.02 samples/sec Loss 5.1750 Epoch: 14 Global Step: 70100 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:12:49,983-Speed 4836.32 samples/sec Loss 5.2639 Epoch: 14 Global Step: 70150 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:13:00,650-Speed 4800.02 samples/sec Loss 5.3026 Epoch: 14 Global Step: 70200 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:13:11,239-Speed 4835.21 samples/sec Loss 5.3419 Epoch: 14 Global Step: 70250 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:13:21,978-Speed 4767.92 samples/sec Loss 5.3918 Epoch: 14 Global Step: 70300 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:13:32,605-Speed 4818.08 samples/sec Loss 5.3889 Epoch: 14 Global Step: 70350 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:13:43,070-Speed 4893.08 samples/sec Loss 5.4597 Epoch: 14 Global Step: 70400 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:13:54,115-Speed 4635.69 samples/sec Loss 5.3534 Epoch: 14 Global Step: 70450 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:14:04,873-Speed 4759.20 samples/sec Loss 5.4422 Epoch: 14 Global Step: 70500 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:14:15,711-Speed 4724.61 samples/sec Loss 5.4386 Epoch: 14 Global Step: 70550 Fp16 Grad Scale: 16384 Required: 4 hours Training: 
2021-03-17 02:14:26,791-Speed 4621.03 samples/sec Loss 5.4742 Epoch: 14 Global Step: 70600 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:14:37,369-Speed 4840.27 samples/sec Loss 5.4746 Epoch: 14 Global Step: 70650 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:14:48,003-Speed 4815.07 samples/sec Loss 5.5480 Epoch: 14 Global Step: 70700 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:14:58,637-Speed 4814.76 samples/sec Loss 5.4880 Epoch: 14 Global Step: 70750 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:15:09,149-Speed 4871.15 samples/sec Loss 5.5214 Epoch: 14 Global Step: 70800 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:15:19,882-Speed 4770.57 samples/sec Loss 5.5766 Epoch: 14 Global Step: 70850 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:15:30,648-Speed 4755.64 samples/sec Loss 5.5435 Epoch: 14 Global Step: 70900 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:15:41,155-Speed 4873.48 samples/sec Loss 5.6097 Epoch: 14 Global Step: 70950 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:15:51,830-Speed 4796.36 samples/sec Loss 5.5964 Epoch: 14 Global Step: 71000 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:16:02,359-Speed 4862.85 samples/sec Loss 5.6379 Epoch: 14 Global Step: 71050 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:16:13,245-Speed 4703.74 samples/sec Loss 5.6005 Epoch: 14 Global Step: 71100 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:16:23,970-Speed 4774.13 samples/sec Loss 5.6297 Epoch: 14 Global Step: 71150 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:16:34,771-Speed 4740.23 samples/sec Loss 5.6548 Epoch: 14 Global Step: 71200 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:16:45,392-Speed 4820.96 samples/sec Loss 5.6509 Epoch: 14 Global Step: 71250 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 
02:16:56,121-Speed 4772.28 samples/sec Loss 5.6537 Epoch: 14 Global Step: 71300 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:17:06,753-Speed 4815.97 samples/sec Loss 5.6994 Epoch: 14 Global Step: 71350 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:17:17,530-Speed 4750.88 samples/sec Loss 5.6932 Epoch: 14 Global Step: 71400 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:17:28,103-Speed 4842.96 samples/sec Loss 5.6554 Epoch: 14 Global Step: 71450 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:17:38,780-Speed 4795.38 samples/sec Loss 5.6891 Epoch: 14 Global Step: 71500 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:17:49,512-Speed 4770.90 samples/sec Loss 5.6545 Epoch: 14 Global Step: 71550 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:18:00,129-Speed 4822.76 samples/sec Loss 5.6613 Epoch: 14 Global Step: 71600 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:18:11,322-Speed 4574.46 samples/sec Loss 5.6304 Epoch: 14 Global Step: 71650 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:18:22,166-Speed 4721.91 samples/sec Loss 5.6794 Epoch: 14 Global Step: 71700 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:18:33,344-Speed 4580.55 samples/sec Loss 5.7678 Epoch: 14 Global Step: 71750 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:18:44,104-Speed 4758.28 samples/sec Loss 5.6653 Epoch: 14 Global Step: 71800 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:18:54,925-Speed 4731.73 samples/sec Loss 5.6874 Epoch: 14 Global Step: 71850 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:19:05,723-Speed 4741.84 samples/sec Loss 5.7496 Epoch: 14 Global Step: 71900 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:19:16,268-Speed 4855.86 samples/sec Loss 5.7621 Epoch: 14 Global Step: 71950 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 
02:19:26,871-Speed 4829.01 samples/sec Loss 5.6599 Epoch: 14 Global Step: 72000 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:19:49,863-[lfw][72000]XNorm: 22.447765 Training: 2021-03-17 02:19:49,863-[lfw][72000]Accuracy-Flip: 0.99683+-0.00252 Training: 2021-03-17 02:19:49,863-[lfw][72000]Accuracy-Highest: 0.99700 Training: 2021-03-17 02:20:16,748-[cfp_fp][72000]XNorm: 19.567505 Training: 2021-03-17 02:20:16,748-[cfp_fp][72000]Accuracy-Flip: 0.97714+-0.00639 Training: 2021-03-17 02:20:16,748-[cfp_fp][72000]Accuracy-Highest: 0.97786 Training: 2021-03-17 02:20:39,865-[agedb_30][72000]XNorm: 21.815916 Training: 2021-03-17 02:20:39,865-[agedb_30][72000]Accuracy-Flip: 0.97433+-0.00638 Training: 2021-03-17 02:20:39,865-[agedb_30][72000]Accuracy-Highest: 0.97650 Training: 2021-03-17 02:20:50,372-Speed 613.17 samples/sec Loss 5.7435 Epoch: 14 Global Step: 72050 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:21:01,028-Speed 4805.04 samples/sec Loss 5.7323 Epoch: 14 Global Step: 72100 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:21:11,610-Speed 4838.25 samples/sec Loss 5.6991 Epoch: 14 Global Step: 72150 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:21:22,188-Speed 4840.82 samples/sec Loss 5.7464 Epoch: 14 Global Step: 72200 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:21:32,680-Speed 4879.96 samples/sec Loss 5.7527 Epoch: 14 Global Step: 72250 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:21:43,392-Speed 4779.94 samples/sec Loss 5.7790 Epoch: 14 Global Step: 72300 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:21:54,209-Speed 4733.44 samples/sec Loss 5.7410 Epoch: 14 Global Step: 72350 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:22:04,983-Speed 4752.22 samples/sec Loss 5.7790 Epoch: 14 Global Step: 72400 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:22:15,560-Speed 4841.02 samples/sec Loss 5.7314 Epoch: 
14 Global Step: 72450 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:22:26,028-Speed 4891.39 samples/sec Loss 5.7978 Epoch: 14 Global Step: 72500 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:22:36,578-Speed 4853.45 samples/sec Loss 5.7147 Epoch: 14 Global Step: 72550 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:22:47,151-Speed 4842.57 samples/sec Loss 5.8133 Epoch: 14 Global Step: 72600 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:22:57,628-Speed 4887.22 samples/sec Loss 5.8050 Epoch: 14 Global Step: 72650 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:23:08,552-Speed 4686.89 samples/sec Loss 5.7881 Epoch: 14 Global Step: 72700 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:23:19,249-Speed 4786.79 samples/sec Loss 5.7404 Epoch: 14 Global Step: 72750 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:23:30,007-Speed 4759.61 samples/sec Loss 5.7765 Epoch: 14 Global Step: 72800 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:23:40,629-Speed 4820.09 samples/sec Loss 5.8010 Epoch: 14 Global Step: 72850 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:23:51,398-Speed 4754.77 samples/sec Loss 5.7401 Epoch: 14 Global Step: 72900 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:24:02,123-Speed 4773.95 samples/sec Loss 5.7754 Epoch: 14 Global Step: 72950 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:24:12,797-Speed 4797.18 samples/sec Loss 5.8262 Epoch: 14 Global Step: 73000 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:24:23,321-Speed 4865.44 samples/sec Loss 5.7763 Epoch: 14 Global Step: 73050 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:24:33,791-Speed 4890.01 samples/sec Loss 5.7922 Epoch: 14 Global Step: 73100 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:24:44,359-Speed 4845.13 samples/sec Loss 5.7605 Epoch: 14 Global 
Step: 73150 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:24:55,216-Speed 4716.24 samples/sec Loss 5.7551 Epoch: 14 Global Step: 73200 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:25:05,893-Speed 4795.26 samples/sec Loss 5.8158 Epoch: 14 Global Step: 73250 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:25:16,547-Speed 4805.92 samples/sec Loss 5.8057 Epoch: 14 Global Step: 73300 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:25:27,297-Speed 4763.27 samples/sec Loss 5.7625 Epoch: 14 Global Step: 73350 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:25:37,966-Speed 4799.13 samples/sec Loss 5.8367 Epoch: 14 Global Step: 73400 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:25:48,621-Speed 4805.25 samples/sec Loss 5.7785 Epoch: 14 Global Step: 73450 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:25:59,285-Speed 4801.36 samples/sec Loss 5.7807 Epoch: 14 Global Step: 73500 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:26:09,755-Speed 4890.48 samples/sec Loss 5.8379 Epoch: 14 Global Step: 73550 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:26:20,830-Speed 4623.01 samples/sec Loss 5.8127 Epoch: 14 Global Step: 73600 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:26:31,567-Speed 4768.89 samples/sec Loss 5.7902 Epoch: 14 Global Step: 73650 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:26:42,651-Speed 4619.44 samples/sec Loss 5.7924 Epoch: 14 Global Step: 73700 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2021-03-17 02:26:53,269-Speed 4822.43 samples/sec Loss 5.7838 Epoch: 14 Global Step: 73750 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:27:04,166-Speed 4698.92 samples/sec Loss 5.7871 Epoch: 14 Global Step: 73800 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:27:14,735-Speed 4844.53 samples/sec Loss 5.7899 Epoch: 14 Global Step: 73850 
Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:27:25,819-Speed 4619.39 samples/sec Loss 5.8257 Epoch: 14 Global Step: 73900 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:27:36,468-Speed 4808.02 samples/sec Loss 5.7668 Epoch: 14 Global Step: 73950 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:27:47,238-Speed 4754.29 samples/sec Loss 5.7938 Epoch: 14 Global Step: 74000 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:28:10,465-[lfw][74000]XNorm: 22.491556 Training: 2021-03-17 02:28:10,465-[lfw][74000]Accuracy-Flip: 0.99650+-0.00345 Training: 2021-03-17 02:28:10,465-[lfw][74000]Accuracy-Highest: 0.99700 Training: 2021-03-17 02:28:37,274-[cfp_fp][74000]XNorm: 19.615752 Training: 2021-03-17 02:28:37,274-[cfp_fp][74000]Accuracy-Flip: 0.97429+-0.00623 Training: 2021-03-17 02:28:37,274-[cfp_fp][74000]Accuracy-Highest: 0.97786 Training: 2021-03-17 02:29:00,474-[agedb_30][74000]XNorm: 21.909766 Training: 2021-03-17 02:29:00,475-[agedb_30][74000]Accuracy-Flip: 0.97283+-0.00719 Training: 2021-03-17 02:29:00,475-[agedb_30][74000]Accuracy-Highest: 0.97650 Training: 2021-03-17 02:29:11,216-Speed 609.69 samples/sec Loss 5.7907 Epoch: 14 Global Step: 74050 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:29:21,715-Speed 4876.89 samples/sec Loss 5.7717 Epoch: 14 Global Step: 74100 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:29:32,383-Speed 4799.57 samples/sec Loss 5.8183 Epoch: 14 Global Step: 74150 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:29:42,871-Speed 4882.18 samples/sec Loss 5.7962 Epoch: 14 Global Step: 74200 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:29:53,404-Speed 4860.82 samples/sec Loss 5.8128 Epoch: 14 Global Step: 74250 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:30:03,862-Speed 4895.95 samples/sec Loss 5.7577 Epoch: 14 Global Step: 74300 Fp16 Grad Scale: 16384 Required: 3 hours Training: 
2021-03-17 02:30:14,546-Speed 4792.45 samples/sec Loss 5.8611 Epoch: 14 Global Step: 74350 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:30:25,126-Speed 4839.68 samples/sec Loss 5.7600 Epoch: 14 Global Step: 74400 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:30:35,728-Speed 4829.59 samples/sec Loss 5.8346 Epoch: 14 Global Step: 74450 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:30:46,351-Speed 4819.60 samples/sec Loss 5.7469 Epoch: 14 Global Step: 74500 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:30:56,906-Speed 4851.36 samples/sec Loss 5.7570 Epoch: 14 Global Step: 74550 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:31:07,765-Speed 4714.81 samples/sec Loss 5.7547 Epoch: 14 Global Step: 74600 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:31:18,352-Speed 4836.47 samples/sec Loss 5.8169 Epoch: 14 Global Step: 74650 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:31:29,024-Speed 4797.72 samples/sec Loss 5.7559 Epoch: 14 Global Step: 74700 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:31:43,319-Speed 3581.91 samples/sec Loss 5.5094 Epoch: 15 Global Step: 74750 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:31:54,326-Speed 4652.03 samples/sec Loss 4.9757 Epoch: 15 Global Step: 74800 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:32:05,092-Speed 4756.00 samples/sec Loss 5.0022 Epoch: 15 Global Step: 74850 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:32:15,754-Speed 4802.71 samples/sec Loss 5.0844 Epoch: 15 Global Step: 74900 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:32:26,339-Speed 4837.19 samples/sec Loss 5.1237 Epoch: 15 Global Step: 74950 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:32:36,870-Speed 4862.28 samples/sec Loss 5.1351 Epoch: 15 Global Step: 75000 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 
02:32:47,586-Speed 4778.22 samples/sec Loss 5.1398 Epoch: 15 Global Step: 75050 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:32:58,128-Speed 4857.11 samples/sec Loss 5.2033 Epoch: 15 Global Step: 75100 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:33:08,678-Speed 4853.10 samples/sec Loss 5.2535 Epoch: 15 Global Step: 75150 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:33:19,155-Speed 4887.32 samples/sec Loss 5.2679 Epoch: 15 Global Step: 75200 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:33:29,688-Speed 4861.07 samples/sec Loss 5.2454 Epoch: 15 Global Step: 75250 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:33:40,222-Speed 4860.52 samples/sec Loss 5.3285 Epoch: 15 Global Step: 75300 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:33:50,800-Speed 4840.18 samples/sec Loss 5.3705 Epoch: 15 Global Step: 75350 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:34:01,594-Speed 4743.81 samples/sec Loss 5.3441 Epoch: 15 Global Step: 75400 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:34:12,194-Speed 4830.52 samples/sec Loss 5.3515 Epoch: 15 Global Step: 75450 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:34:23,082-Speed 4702.88 samples/sec Loss 5.4138 Epoch: 15 Global Step: 75500 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:34:33,848-Speed 4755.53 samples/sec Loss 5.4129 Epoch: 15 Global Step: 75550 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:34:44,522-Speed 4797.26 samples/sec Loss 5.4357 Epoch: 15 Global Step: 75600 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:34:55,130-Speed 4826.52 samples/sec Loss 5.4937 Epoch: 15 Global Step: 75650 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:35:05,981-Speed 4718.75 samples/sec Loss 5.5106 Epoch: 15 Global Step: 75700 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 
02:35:16,545-Speed 4847.05 samples/sec Loss 5.4828 Epoch: 15 Global Step: 75750 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:35:27,135-Speed 4834.77 samples/sec Loss 5.5234 Epoch: 15 Global Step: 75800 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:35:38,028-Speed 4700.27 samples/sec Loss 5.5145 Epoch: 15 Global Step: 75850 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:35:48,991-Speed 4670.50 samples/sec Loss 5.5180 Epoch: 15 Global Step: 75900 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:35:59,699-Speed 4781.53 samples/sec Loss 5.5696 Epoch: 15 Global Step: 75950 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:36:10,322-Speed 4820.03 samples/sec Loss 5.5260 Epoch: 15 Global Step: 76000 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:36:33,443-[lfw][76000]XNorm: 22.552039 Training: 2021-03-17 02:36:33,443-[lfw][76000]Accuracy-Flip: 0.99717+-0.00183 Training: 2021-03-17 02:36:33,443-[lfw][76000]Accuracy-Highest: 0.99717 Training: 2021-03-17 02:37:00,349-[cfp_fp][76000]XNorm: 19.462452 Training: 2021-03-17 02:37:00,349-[cfp_fp][76000]Accuracy-Flip: 0.97614+-0.00642 Training: 2021-03-17 02:37:00,349-[cfp_fp][76000]Accuracy-Highest: 0.97786 Training: 2021-03-17 02:37:23,844-[agedb_30][76000]XNorm: 22.096909 Training: 2021-03-17 02:37:23,844-[agedb_30][76000]Accuracy-Flip: 0.97633+-0.00614 Training: 2021-03-17 02:37:23,844-[agedb_30][76000]Accuracy-Highest: 0.97650 Training: 2021-03-17 02:37:34,573-Speed 607.71 samples/sec Loss 5.5401 Epoch: 15 Global Step: 76050 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:37:45,375-Speed 4739.77 samples/sec Loss 5.5956 Epoch: 15 Global Step: 76100 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:37:56,467-Speed 4616.18 samples/sec Loss 5.6314 Epoch: 15 Global Step: 76150 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:38:06,762-Speed 4973.57 samples/sec Loss 5.5386 Epoch: 
15 Global Step: 76200 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:38:17,501-Speed 4767.81 samples/sec Loss 5.6081 Epoch: 15 Global Step: 76250 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:38:28,170-Speed 4799.40 samples/sec Loss 5.6349 Epoch: 15 Global Step: 76300 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:38:38,833-Speed 4801.55 samples/sec Loss 5.5869 Epoch: 15 Global Step: 76350 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:38:49,345-Speed 4871.19 samples/sec Loss 5.5773 Epoch: 15 Global Step: 76400 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:39:00,118-Speed 4752.69 samples/sec Loss 5.6335 Epoch: 15 Global Step: 76450 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:39:10,613-Speed 4878.54 samples/sec Loss 5.6145 Epoch: 15 Global Step: 76500 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:39:21,164-Speed 4853.00 samples/sec Loss 5.6300 Epoch: 15 Global Step: 76550 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:39:31,878-Speed 4778.79 samples/sec Loss 5.6611 Epoch: 15 Global Step: 76600 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:39:42,385-Speed 4873.13 samples/sec Loss 5.5871 Epoch: 15 Global Step: 76650 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:39:53,071-Speed 4791.49 samples/sec Loss 5.6690 Epoch: 15 Global Step: 76700 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:40:03,837-Speed 4756.04 samples/sec Loss 5.6940 Epoch: 15 Global Step: 76750 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:40:14,879-Speed 4637.25 samples/sec Loss 5.7119 Epoch: 15 Global Step: 76800 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:40:25,566-Speed 4791.03 samples/sec Loss 5.6277 Epoch: 15 Global Step: 76850 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:40:36,224-Speed 4803.97 samples/sec Loss 5.7075 Epoch: 15 Global 
Step: 76900 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:40:46,924-Speed 4784.99 samples/sec Loss 5.6751 Epoch: 15 Global Step: 76950 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:40:57,540-Speed 4823.31 samples/sec Loss 5.6705 Epoch: 15 Global Step: 77000 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:41:08,100-Speed 4848.57 samples/sec Loss 5.7159 Epoch: 15 Global Step: 77050 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:41:18,644-Speed 4856.08 samples/sec Loss 5.6874 Epoch: 15 Global Step: 77100 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:41:29,326-Speed 4793.40 samples/sec Loss 5.6992 Epoch: 15 Global Step: 77150 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:41:39,991-Speed 4800.94 samples/sec Loss 5.6475 Epoch: 15 Global Step: 77200 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:41:50,884-Speed 4700.53 samples/sec Loss 5.6553 Epoch: 15 Global Step: 77250 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:42:01,521-Speed 4813.26 samples/sec Loss 5.6898 Epoch: 15 Global Step: 77300 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:42:12,016-Speed 4878.73 samples/sec Loss 5.6970 Epoch: 15 Global Step: 77350 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:42:22,580-Speed 4847.00 samples/sec Loss 5.7212 Epoch: 15 Global Step: 77400 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:42:33,104-Speed 4865.40 samples/sec Loss 5.6938 Epoch: 15 Global Step: 77450 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:42:43,743-Speed 4812.47 samples/sec Loss 5.7112 Epoch: 15 Global Step: 77500 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:42:54,696-Speed 4674.76 samples/sec Loss 5.7155 Epoch: 15 Global Step: 77550 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:43:05,361-Speed 4801.33 samples/sec Loss 5.7267 Epoch: 15 Global Step: 77600 
Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:43:16,052-Speed 4789.01 samples/sec Loss 5.7103 Epoch: 15 Global Step: 77650 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:43:26,541-Speed 4881.69 samples/sec Loss 5.6943 Epoch: 15 Global Step: 77700 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:43:37,190-Speed 4807.98 samples/sec Loss 5.7551 Epoch: 15 Global Step: 77750 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:43:47,805-Speed 4823.86 samples/sec Loss 5.7871 Epoch: 15 Global Step: 77800 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:43:58,358-Speed 4851.67 samples/sec Loss 5.7289 Epoch: 15 Global Step: 77850 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:44:08,872-Speed 4869.79 samples/sec Loss 5.7446 Epoch: 15 Global Step: 77900 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:44:19,768-Speed 4699.36 samples/sec Loss 5.7608 Epoch: 15 Global Step: 77950 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:44:30,589-Speed 4731.76 samples/sec Loss 5.7480 Epoch: 15 Global Step: 78000 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:44:53,692-[lfw][78000]XNorm: 22.154883 Training: 2021-03-17 02:44:53,692-[lfw][78000]Accuracy-Flip: 0.99683+-0.00263 Training: 2021-03-17 02:44:53,692-[lfw][78000]Accuracy-Highest: 0.99717 Training: 2021-03-17 02:45:20,484-[cfp_fp][78000]XNorm: 19.309715 Training: 2021-03-17 02:45:20,485-[cfp_fp][78000]Accuracy-Flip: 0.97586+-0.00601 Training: 2021-03-17 02:45:20,485-[cfp_fp][78000]Accuracy-Highest: 0.97786 Training: 2021-03-17 02:45:43,585-[agedb_30][78000]XNorm: 21.610187 Training: 2021-03-17 02:45:43,585-[agedb_30][78000]Accuracy-Flip: 0.97533+-0.00488 Training: 2021-03-17 02:45:43,585-[agedb_30][78000]Accuracy-Highest: 0.97650 Training: 2021-03-17 02:45:53,915-Speed 614.45 samples/sec Loss 5.7474 Epoch: 15 Global Step: 78050 Fp16 Grad Scale: 16384 Required: 3 hours Training: 
2021-03-17 02:46:04,619-Speed 4783.76 samples/sec Loss 5.7626 Epoch: 15 Global Step: 78100 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:46:15,423-Speed 4739.20 samples/sec Loss 5.7742 Epoch: 15 Global Step: 78150 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:46:26,027-Speed 4828.18 samples/sec Loss 5.8045 Epoch: 15 Global Step: 78200 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:46:36,884-Speed 4716.33 samples/sec Loss 5.7159 Epoch: 15 Global Step: 78250 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:46:47,743-Speed 4715.20 samples/sec Loss 5.7520 Epoch: 15 Global Step: 78300 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:46:58,307-Speed 4846.63 samples/sec Loss 5.7609 Epoch: 15 Global Step: 78350 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:47:09,104-Speed 4742.40 samples/sec Loss 5.7291 Epoch: 15 Global Step: 78400 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:47:19,754-Speed 4807.57 samples/sec Loss 5.7470 Epoch: 15 Global Step: 78450 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:47:30,264-Speed 4871.72 samples/sec Loss 5.7492 Epoch: 15 Global Step: 78500 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:47:41,075-Speed 4736.36 samples/sec Loss 5.7549 Epoch: 15 Global Step: 78550 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:47:51,808-Speed 4770.29 samples/sec Loss 5.7730 Epoch: 15 Global Step: 78600 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:48:02,577-Speed 4754.69 samples/sec Loss 5.7662 Epoch: 15 Global Step: 78650 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:48:13,200-Speed 4819.73 samples/sec Loss 5.7651 Epoch: 15 Global Step: 78700 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:48:23,658-Speed 4896.01 samples/sec Loss 5.7481 Epoch: 15 Global Step: 78750 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 
02:48:34,395-Speed 4768.74 samples/sec Loss 5.7079 Epoch: 15 Global Step: 78800 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:48:44,957-Speed 4847.87 samples/sec Loss 5.7699 Epoch: 15 Global Step: 78850 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:48:55,586-Speed 4817.26 samples/sec Loss 5.7946 Epoch: 15 Global Step: 78900 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:49:06,114-Speed 4863.32 samples/sec Loss 5.7256 Epoch: 15 Global Step: 78950 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:49:16,746-Speed 4815.98 samples/sec Loss 5.7577 Epoch: 15 Global Step: 79000 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:49:27,705-Speed 4672.13 samples/sec Loss 5.7807 Epoch: 15 Global Step: 79050 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:49:38,216-Speed 4871.19 samples/sec Loss 5.7295 Epoch: 15 Global Step: 79100 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:49:48,854-Speed 4812.98 samples/sec Loss 5.7585 Epoch: 15 Global Step: 79150 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:49:59,518-Speed 4801.80 samples/sec Loss 5.7148 Epoch: 15 Global Step: 79200 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:50:10,029-Speed 4871.42 samples/sec Loss 5.7503 Epoch: 15 Global Step: 79250 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:50:20,813-Speed 4747.73 samples/sec Loss 5.7564 Epoch: 15 Global Step: 79300 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:50:31,454-Speed 4811.95 samples/sec Loss 5.8077 Epoch: 15 Global Step: 79350 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:50:41,922-Speed 4891.27 samples/sec Loss 5.7551 Epoch: 15 Global Step: 79400 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:50:52,673-Speed 4762.54 samples/sec Loss 5.7550 Epoch: 15 Global Step: 79450 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 
02:51:03,659-Speed 4660.85 samples/sec Loss 5.7871 Epoch: 15 Global Step: 79500 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:51:14,139-Speed 4885.36 samples/sec Loss 5.7672 Epoch: 15 Global Step: 79550 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:51:24,657-Speed 4868.28 samples/sec Loss 5.7585 Epoch: 15 Global Step: 79600 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:51:35,470-Speed 4735.25 samples/sec Loss 5.7901 Epoch: 15 Global Step: 79650 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:51:46,295-Speed 4730.12 samples/sec Loss 5.6813 Epoch: 15 Global Step: 79700 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:52:00,040-Speed 3725.12 samples/sec Loss 4.9532 Epoch: 16 Global Step: 79750 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:52:10,563-Speed 4866.01 samples/sec Loss 4.3889 Epoch: 16 Global Step: 79800 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:52:21,095-Speed 4861.50 samples/sec Loss 4.3447 Epoch: 16 Global Step: 79850 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:52:31,668-Speed 4843.01 samples/sec Loss 4.1589 Epoch: 16 Global Step: 79900 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:52:42,372-Speed 4783.69 samples/sec Loss 4.1712 Epoch: 16 Global Step: 79950 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:52:52,982-Speed 4825.75 samples/sec Loss 4.1254 Epoch: 16 Global Step: 80000 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:53:16,186-[lfw][80000]XNorm: 22.824345 Training: 2021-03-17 02:53:16,187-[lfw][80000]Accuracy-Flip: 0.99783+-0.00269 Training: 2021-03-17 02:53:16,187-[lfw][80000]Accuracy-Highest: 0.99783 Training: 2021-03-17 02:53:42,943-[cfp_fp][80000]XNorm: 20.035646 Training: 2021-03-17 02:53:42,943-[cfp_fp][80000]Accuracy-Flip: 0.98086+-0.00395 Training: 2021-03-17 02:53:42,943-[cfp_fp][80000]Accuracy-Highest: 0.98086 Training: 2021-03-17 
02:54:06,104-[agedb_30][80000]XNorm: 22.319754 Training: 2021-03-17 02:54:06,105-[agedb_30][80000]Accuracy-Flip: 0.97733+-0.00490 Training: 2021-03-17 02:54:06,105-[agedb_30][80000]Accuracy-Highest: 0.97733 Training: 2021-03-17 02:54:16,835-Speed 610.59 samples/sec Loss 4.0988 Epoch: 16 Global Step: 80050 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:54:27,489-Speed 4805.95 samples/sec Loss 4.0909 Epoch: 16 Global Step: 80100 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:54:38,105-Speed 4823.09 samples/sec Loss 4.0584 Epoch: 16 Global Step: 80150 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:54:48,621-Speed 4869.24 samples/sec Loss 4.0251 Epoch: 16 Global Step: 80200 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:54:59,419-Speed 4741.75 samples/sec Loss 3.9517 Epoch: 16 Global Step: 80250 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:55:09,950-Speed 4862.21 samples/sec Loss 3.9827 Epoch: 16 Global Step: 80300 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:55:20,661-Speed 4780.30 samples/sec Loss 3.9771 Epoch: 16 Global Step: 80350 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:55:31,387-Speed 4773.75 samples/sec Loss 3.8739 Epoch: 16 Global Step: 80400 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:55:42,572-Speed 4577.77 samples/sec Loss 3.9360 Epoch: 16 Global Step: 80450 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:55:53,474-Speed 4696.52 samples/sec Loss 3.8718 Epoch: 16 Global Step: 80500 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:56:04,566-Speed 4615.93 samples/sec Loss 3.9011 Epoch: 16 Global Step: 80550 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:56:15,232-Speed 4800.44 samples/sec Loss 3.8279 Epoch: 16 Global Step: 80600 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:56:25,845-Speed 4824.70 samples/sec Loss 3.8613 Epoch: 16 Global 
Step: 80650 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:56:36,421-Speed 4841.39 samples/sec Loss 3.8250 Epoch: 16 Global Step: 80700 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:56:46,989-Speed 4844.89 samples/sec Loss 3.7567 Epoch: 16 Global Step: 80750 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:56:57,755-Speed 4755.86 samples/sec Loss 3.7661 Epoch: 16 Global Step: 80800 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:57:08,337-Speed 4838.70 samples/sec Loss 3.8047 Epoch: 16 Global Step: 80850 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:57:18,965-Speed 4817.56 samples/sec Loss 3.7340 Epoch: 16 Global Step: 80900 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:57:29,711-Speed 4764.68 samples/sec Loss 3.7386 Epoch: 16 Global Step: 80950 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:57:40,188-Speed 4887.32 samples/sec Loss 3.7058 Epoch: 16 Global Step: 81000 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:57:50,852-Speed 4801.22 samples/sec Loss 3.7136 Epoch: 16 Global Step: 81050 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:58:01,424-Speed 4843.47 samples/sec Loss 3.7221 Epoch: 16 Global Step: 81100 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:58:12,246-Speed 4731.19 samples/sec Loss 3.7092 Epoch: 16 Global Step: 81150 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:58:22,891-Speed 4809.85 samples/sec Loss 3.6895 Epoch: 16 Global Step: 81200 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:58:33,466-Speed 4841.61 samples/sec Loss 3.7406 Epoch: 16 Global Step: 81250 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:58:44,135-Speed 4799.11 samples/sec Loss 3.6507 Epoch: 16 Global Step: 81300 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:58:54,665-Speed 4862.66 samples/sec Loss 3.6056 Epoch: 16 Global Step: 81350 
Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:59:05,420-Speed 4760.73 samples/sec Loss 3.6388 Epoch: 16 Global Step: 81400 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:59:16,125-Speed 4783.29 samples/sec Loss 3.6767 Epoch: 16 Global Step: 81450 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:59:26,679-Speed 4851.11 samples/sec Loss 3.6239 Epoch: 16 Global Step: 81500 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:59:37,288-Speed 4826.49 samples/sec Loss 3.6193 Epoch: 16 Global Step: 81550 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:59:47,924-Speed 4814.14 samples/sec Loss 3.6706 Epoch: 16 Global Step: 81600 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 02:59:58,482-Speed 4849.35 samples/sec Loss 3.6314 Epoch: 16 Global Step: 81650 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:00:09,504-Speed 4645.72 samples/sec Loss 3.6111 Epoch: 16 Global Step: 81700 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:00:20,261-Speed 4759.97 samples/sec Loss 3.6255 Epoch: 16 Global Step: 81750 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:00:30,797-Speed 4859.54 samples/sec Loss 3.6044 Epoch: 16 Global Step: 81800 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:00:41,324-Speed 4863.92 samples/sec Loss 3.6370 Epoch: 16 Global Step: 81850 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:00:51,996-Speed 4797.59 samples/sec Loss 3.5829 Epoch: 16 Global Step: 81900 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:01:02,681-Speed 4792.17 samples/sec Loss 3.5586 Epoch: 16 Global Step: 81950 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:01:13,193-Speed 4870.60 samples/sec Loss 3.5579 Epoch: 16 Global Step: 82000 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:01:36,448-[lfw][82000]XNorm: 22.695173 Training: 2021-03-17 
03:01:36,448-[lfw][82000]Accuracy-Flip: 0.99733+-0.00271 Training: 2021-03-17 03:01:36,448-[lfw][82000]Accuracy-Highest: 0.99783 Training: 2021-03-17 03:02:03,211-[cfp_fp][82000]XNorm: 20.046530 Training: 2021-03-17 03:02:03,211-[cfp_fp][82000]Accuracy-Flip: 0.98286+-0.00409 Training: 2021-03-17 03:02:03,212-[cfp_fp][82000]Accuracy-Highest: 0.98286 Training: 2021-03-17 03:02:26,320-[agedb_30][82000]XNorm: 22.291754 Training: 2021-03-17 03:02:26,320-[agedb_30][82000]Accuracy-Flip: 0.97700+-0.00618 Training: 2021-03-17 03:02:26,320-[agedb_30][82000]Accuracy-Highest: 0.97733 Training: 2021-03-17 03:02:36,967-Speed 611.17 samples/sec Loss 3.5733 Epoch: 16 Global Step: 82050 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:02:47,511-Speed 4856.28 samples/sec Loss 3.5795 Epoch: 16 Global Step: 82100 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:02:58,392-Speed 4705.22 samples/sec Loss 3.5197 Epoch: 16 Global Step: 82150 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:03:09,080-Speed 4790.62 samples/sec Loss 3.5296 Epoch: 16 Global Step: 82200 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:03:19,776-Speed 4787.32 samples/sec Loss 3.5725 Epoch: 16 Global Step: 82250 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:03:30,208-Speed 4908.20 samples/sec Loss 3.5216 Epoch: 16 Global Step: 82300 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:03:40,943-Speed 4769.35 samples/sec Loss 3.5457 Epoch: 16 Global Step: 82350 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:03:51,779-Speed 4725.25 samples/sec Loss 3.5342 Epoch: 16 Global Step: 82400 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:04:02,460-Speed 4793.99 samples/sec Loss 3.5370 Epoch: 16 Global Step: 82450 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:04:13,082-Speed 4820.12 samples/sec Loss 3.4866 Epoch: 16 Global Step: 82500 Fp16 Grad Scale: 16384 Required: 3 hours 
Training: 2021-03-17 03:04:23,532-Speed 4899.75 samples/sec Loss 3.5023 Epoch: 16 Global Step: 82550 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:04:34,573-Speed 4637.54 samples/sec Loss 3.4961 Epoch: 16 Global Step: 82600 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:04:45,461-Speed 4702.78 samples/sec Loss 3.4574 Epoch: 16 Global Step: 82650 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:04:56,286-Speed 4729.70 samples/sec Loss 3.4998 Epoch: 16 Global Step: 82700 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:05:06,679-Speed 4926.73 samples/sec Loss 3.4423 Epoch: 16 Global Step: 82750 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:05:17,615-Speed 4682.06 samples/sec Loss 3.4656 Epoch: 16 Global Step: 82800 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:05:28,565-Speed 4676.11 samples/sec Loss 3.5048 Epoch: 16 Global Step: 82850 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:05:39,172-Speed 4827.10 samples/sec Loss 3.4497 Epoch: 16 Global Step: 82900 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:05:49,931-Speed 4758.97 samples/sec Loss 3.4219 Epoch: 16 Global Step: 82950 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:06:00,615-Speed 4792.44 samples/sec Loss 3.3988 Epoch: 16 Global Step: 83000 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:06:11,510-Speed 4699.45 samples/sec Loss 3.4882 Epoch: 16 Global Step: 83050 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:06:22,088-Speed 4840.54 samples/sec Loss 3.4530 Epoch: 16 Global Step: 83100 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:06:32,598-Speed 4872.04 samples/sec Loss 3.3961 Epoch: 16 Global Step: 83150 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:06:43,348-Speed 4762.77 samples/sec Loss 3.3758 Epoch: 16 Global Step: 83200 Fp16 Grad Scale: 16384 Required: 3 hours Training: 
2021-03-17 03:06:54,073-Speed 4774.04 samples/sec Loss 3.4836 Epoch: 16 Global Step: 83250 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:07:04,912-Speed 4724.11 samples/sec Loss 3.3969 Epoch: 16 Global Step: 83300 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:07:15,504-Speed 4833.92 samples/sec Loss 3.4085 Epoch: 16 Global Step: 83350 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:07:26,013-Speed 4872.17 samples/sec Loss 3.4201 Epoch: 16 Global Step: 83400 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:07:36,454-Speed 4903.89 samples/sec Loss 3.3780 Epoch: 16 Global Step: 83450 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:07:47,038-Speed 4837.99 samples/sec Loss 3.3335 Epoch: 16 Global Step: 83500 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:07:57,663-Speed 4819.02 samples/sec Loss 3.3292 Epoch: 16 Global Step: 83550 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:08:08,167-Speed 4874.40 samples/sec Loss 3.3918 Epoch: 16 Global Step: 83600 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:08:18,750-Speed 4837.97 samples/sec Loss 3.3756 Epoch: 16 Global Step: 83650 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:08:29,378-Speed 4817.83 samples/sec Loss 3.3624 Epoch: 16 Global Step: 83700 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:08:39,994-Speed 4823.37 samples/sec Loss 3.3506 Epoch: 16 Global Step: 83750 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:08:51,063-Speed 4625.92 samples/sec Loss 3.3654 Epoch: 16 Global Step: 83800 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:09:01,702-Speed 4812.69 samples/sec Loss 3.3587 Epoch: 16 Global Step: 83850 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:09:12,497-Speed 4742.89 samples/sec Loss 3.3726 Epoch: 16 Global Step: 83900 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 
03:09:23,145-Speed 4808.57 samples/sec Loss 3.4175 Epoch: 16 Global Step: 83950 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:09:34,003-Speed 4715.58 samples/sec Loss 3.3441 Epoch: 16 Global Step: 84000 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:09:57,171-[lfw][84000]XNorm: 22.727265 Training: 2021-03-17 03:09:57,171-[lfw][84000]Accuracy-Flip: 0.99783+-0.00248 Training: 2021-03-17 03:09:57,173-[lfw][84000]Accuracy-Highest: 0.99783 Training: 2021-03-17 03:10:24,214-[cfp_fp][84000]XNorm: 20.192567 Training: 2021-03-17 03:10:24,214-[cfp_fp][84000]Accuracy-Flip: 0.98443+-0.00303 Training: 2021-03-17 03:10:24,214-[cfp_fp][84000]Accuracy-Highest: 0.98443 Training: 2021-03-17 03:10:47,244-[agedb_30][84000]XNorm: 22.314457 Training: 2021-03-17 03:10:47,244-[agedb_30][84000]Accuracy-Flip: 0.97683+-0.00713 Training: 2021-03-17 03:10:47,244-[agedb_30][84000]Accuracy-Highest: 0.97733 Training: 2021-03-17 03:10:57,718-Speed 611.60 samples/sec Loss 3.3542 Epoch: 16 Global Step: 84050 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:11:08,187-Speed 4890.75 samples/sec Loss 3.3331 Epoch: 16 Global Step: 84100 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:11:18,709-Speed 4866.21 samples/sec Loss 3.3852 Epoch: 16 Global Step: 84150 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:11:29,461-Speed 4762.14 samples/sec Loss 3.3228 Epoch: 16 Global Step: 84200 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:11:39,963-Speed 4875.47 samples/sec Loss 3.3237 Epoch: 16 Global Step: 84250 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:11:50,703-Speed 4767.64 samples/sec Loss 3.3656 Epoch: 16 Global Step: 84300 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:12:01,360-Speed 4804.59 samples/sec Loss 3.2914 Epoch: 16 Global Step: 84350 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:12:12,106-Speed 4764.39 samples/sec Loss 3.3139 Epoch: 
16 Global Step: 84400 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:12:22,785-Speed 4794.90 samples/sec Loss 3.3243 Epoch: 16 Global Step: 84450 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:12:33,776-Speed 4658.62 samples/sec Loss 3.3270 Epoch: 16 Global Step: 84500 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:12:44,323-Speed 4854.77 samples/sec Loss 3.3256 Epoch: 16 Global Step: 84550 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:12:54,772-Speed 4899.88 samples/sec Loss 3.2864 Epoch: 16 Global Step: 84600 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:13:05,700-Speed 4685.61 samples/sec Loss 3.3277 Epoch: 16 Global Step: 84650 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:13:19,498-Speed 3710.62 samples/sec Loss 3.2358 Epoch: 17 Global Step: 84700 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:13:30,023-Speed 4865.15 samples/sec Loss 2.9239 Epoch: 17 Global Step: 84750 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:13:40,636-Speed 4824.72 samples/sec Loss 2.8624 Epoch: 17 Global Step: 84800 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:13:51,491-Speed 4716.92 samples/sec Loss 2.8832 Epoch: 17 Global Step: 84850 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:14:02,592-Speed 4612.63 samples/sec Loss 2.8768 Epoch: 17 Global Step: 84900 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:14:13,283-Speed 4789.12 samples/sec Loss 2.8848 Epoch: 17 Global Step: 84950 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:14:24,303-Speed 4646.35 samples/sec Loss 2.9107 Epoch: 17 Global Step: 85000 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:14:34,954-Speed 4807.31 samples/sec Loss 2.9002 Epoch: 17 Global Step: 85050 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:14:45,626-Speed 4797.65 samples/sec Loss 2.9033 Epoch: 17 Global 
Step: 85100 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:14:56,329-Speed 4784.05 samples/sec Loss 2.9157 Epoch: 17 Global Step: 85150 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:15:06,856-Speed 4863.74 samples/sec Loss 2.9287 Epoch: 17 Global Step: 85200 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:15:17,277-Speed 4913.52 samples/sec Loss 2.8927 Epoch: 17 Global Step: 85250 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:15:28,347-Speed 4625.43 samples/sec Loss 2.8722 Epoch: 17 Global Step: 85300 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:15:39,015-Speed 4799.43 samples/sec Loss 2.8963 Epoch: 17 Global Step: 85350 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:15:49,845-Speed 4728.14 samples/sec Loss 2.9107 Epoch: 17 Global Step: 85400 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:16:00,582-Speed 4768.56 samples/sec Loss 2.9036 Epoch: 17 Global Step: 85450 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:16:11,110-Speed 4863.34 samples/sec Loss 2.9054 Epoch: 17 Global Step: 85500 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:16:21,649-Speed 4858.47 samples/sec Loss 2.8904 Epoch: 17 Global Step: 85550 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:16:32,311-Speed 4802.51 samples/sec Loss 2.8749 Epoch: 17 Global Step: 85600 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:16:43,268-Speed 4672.90 samples/sec Loss 2.9064 Epoch: 17 Global Step: 85650 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:16:53,931-Speed 4801.85 samples/sec Loss 2.8998 Epoch: 17 Global Step: 85700 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:17:04,649-Speed 4777.08 samples/sec Loss 2.9477 Epoch: 17 Global Step: 85750 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:17:15,237-Speed 4835.80 samples/sec Loss 2.9517 Epoch: 17 Global Step: 85800 
Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:17:25,812-Speed 4841.82 samples/sec Loss 2.8808 Epoch: 17 Global Step: 85850 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:17:36,800-Speed 4660.00 samples/sec Loss 2.8955 Epoch: 17 Global Step: 85900 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:17:47,472-Speed 4797.99 samples/sec Loss 2.9242 Epoch: 17 Global Step: 85950 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:17:58,147-Speed 4796.37 samples/sec Loss 2.9204 Epoch: 17 Global Step: 86000 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:18:21,179-[lfw][86000]XNorm: 22.637417 Training: 2021-03-17 03:18:21,180-[lfw][86000]Accuracy-Flip: 0.99733+-0.00309 Training: 2021-03-17 03:18:21,180-[lfw][86000]Accuracy-Highest: 0.99783 Training: 2021-03-17 03:18:47,784-[cfp_fp][86000]XNorm: 20.113730 Training: 2021-03-17 03:18:47,784-[cfp_fp][86000]Accuracy-Flip: 0.98329+-0.00424 Training: 2021-03-17 03:18:47,784-[cfp_fp][86000]Accuracy-Highest: 0.98443 Training: 2021-03-17 03:19:10,735-[agedb_30][86000]XNorm: 22.265337 Training: 2021-03-17 03:19:10,735-[agedb_30][86000]Accuracy-Flip: 0.97733+-0.00638 Training: 2021-03-17 03:19:10,735-[agedb_30][86000]Accuracy-Highest: 0.97733 Training: 2021-03-17 03:19:21,311-Speed 615.66 samples/sec Loss 2.9068 Epoch: 17 Global Step: 86050 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:19:31,807-Speed 4878.25 samples/sec Loss 2.8986 Epoch: 17 Global Step: 86100 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:19:42,465-Speed 4803.80 samples/sec Loss 2.9220 Epoch: 17 Global Step: 86150 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:19:52,985-Speed 4867.26 samples/sec Loss 2.8510 Epoch: 17 Global Step: 86200 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:20:03,660-Speed 4796.23 samples/sec Loss 2.9184 Epoch: 17 Global Step: 86250 Fp16 Grad Scale: 16384 Required: 3 hours Training: 
2021-03-17 03:20:14,349-Speed 4790.40 samples/sec Loss 2.9337 Epoch: 17 Global Step: 86300 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:20:25,285-Speed 4682.19 samples/sec Loss 2.9271 Epoch: 17 Global Step: 86350 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:20:35,959-Speed 4796.71 samples/sec Loss 2.9410 Epoch: 17 Global Step: 86400 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:20:46,538-Speed 4840.12 samples/sec Loss 2.9400 Epoch: 17 Global Step: 86450 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:20:57,218-Speed 4793.91 samples/sec Loss 2.8780 Epoch: 17 Global Step: 86500 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:21:08,040-Speed 4731.26 samples/sec Loss 2.9253 Epoch: 17 Global Step: 86550 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:21:18,555-Speed 4869.64 samples/sec Loss 2.8986 Epoch: 17 Global Step: 86600 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:21:29,069-Speed 4870.03 samples/sec Loss 2.9716 Epoch: 17 Global Step: 86650 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:21:39,723-Speed 4805.87 samples/sec Loss 2.9058 Epoch: 17 Global Step: 86700 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:21:50,232-Speed 4872.13 samples/sec Loss 2.9203 Epoch: 17 Global Step: 86750 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:22:00,768-Speed 4859.71 samples/sec Loss 2.9205 Epoch: 17 Global Step: 86800 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:22:11,499-Speed 4771.48 samples/sec Loss 2.8909 Epoch: 17 Global Step: 86850 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:22:22,140-Speed 4811.91 samples/sec Loss 2.9155 Epoch: 17 Global Step: 86900 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:22:32,753-Speed 4824.45 samples/sec Loss 2.9387 Epoch: 17 Global Step: 86950 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 
03:22:43,360-Speed 4827.26 samples/sec Loss 2.9193 Epoch: 17 Global Step: 87000 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:22:53,986-Speed 4818.57 samples/sec Loss 2.9163 Epoch: 17 Global Step: 87050 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:23:05,236-Speed 4551.00 samples/sec Loss 2.9164 Epoch: 17 Global Step: 87100 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:23:15,852-Speed 4823.48 samples/sec Loss 2.9072 Epoch: 17 Global Step: 87150 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:23:26,828-Speed 4664.60 samples/sec Loss 2.9052 Epoch: 17 Global Step: 87200 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:23:37,476-Speed 4808.73 samples/sec Loss 2.9134 Epoch: 17 Global Step: 87250 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:23:48,233-Speed 4760.11 samples/sec Loss 2.8948 Epoch: 17 Global Step: 87300 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:23:58,962-Speed 4771.95 samples/sec Loss 2.9134 Epoch: 17 Global Step: 87350 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:24:09,563-Speed 4830.02 samples/sec Loss 2.9223 Epoch: 17 Global Step: 87400 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:24:20,564-Speed 4654.21 samples/sec Loss 2.9197 Epoch: 17 Global Step: 87450 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:24:31,450-Speed 4703.71 samples/sec Loss 2.9088 Epoch: 17 Global Step: 87500 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:24:42,101-Speed 4807.24 samples/sec Loss 2.9205 Epoch: 17 Global Step: 87550 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:24:52,769-Speed 4799.76 samples/sec Loss 2.9322 Epoch: 17 Global Step: 87600 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:25:03,356-Speed 4836.34 samples/sec Loss 2.9369 Epoch: 17 Global Step: 87650 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 
03:25:13,986-Speed 4816.50 samples/sec Loss 2.9189 Epoch: 17 Global Step: 87700 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:25:24,723-Speed 4768.84 samples/sec Loss 2.9329 Epoch: 17 Global Step: 87750 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:25:35,403-Speed 4794.24 samples/sec Loss 2.9023 Epoch: 17 Global Step: 87800 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:25:45,968-Speed 4846.36 samples/sec Loss 2.9273 Epoch: 17 Global Step: 87850 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:25:56,500-Speed 4861.71 samples/sec Loss 2.9506 Epoch: 17 Global Step: 87900 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:26:07,158-Speed 4803.76 samples/sec Loss 2.9003 Epoch: 17 Global Step: 87950 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:26:17,824-Speed 4800.48 samples/sec Loss 2.9278 Epoch: 17 Global Step: 88000 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:26:40,788-[lfw][88000]XNorm: 22.936456 Training: 2021-03-17 03:26:40,789-[lfw][88000]Accuracy-Flip: 0.99767+-0.00238 Training: 2021-03-17 03:26:40,789-[lfw][88000]Accuracy-Highest: 0.99783 Training: 2021-03-17 03:27:07,401-[cfp_fp][88000]XNorm: 20.369972 Training: 2021-03-17 03:27:07,401-[cfp_fp][88000]Accuracy-Flip: 0.98429+-0.00456 Training: 2021-03-17 03:27:07,401-[cfp_fp][88000]Accuracy-Highest: 0.98443 Training: 2021-03-17 03:27:30,344-[agedb_30][88000]XNorm: 22.614373 Training: 2021-03-17 03:27:30,344-[agedb_30][88000]Accuracy-Flip: 0.97733+-0.00544 Training: 2021-03-17 03:27:30,344-[agedb_30][88000]Accuracy-Highest: 0.97733 Training: 2021-03-17 03:27:41,112-Speed 614.74 samples/sec Loss 2.8430 Epoch: 17 Global Step: 88050 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:27:51,791-Speed 4794.41 samples/sec Loss 2.9126 Epoch: 17 Global Step: 88100 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:28:02,476-Speed 4792.22 samples/sec Loss 2.9250 Epoch: 
17 Global Step: 88150 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:28:13,368-Speed 4700.86 samples/sec Loss 2.9373 Epoch: 17 Global Step: 88200 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:28:24,117-Speed 4763.23 samples/sec Loss 2.9128 Epoch: 17 Global Step: 88250 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:28:34,831-Speed 4779.19 samples/sec Loss 2.9024 Epoch: 17 Global Step: 88300 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:28:45,332-Speed 4875.86 samples/sec Loss 2.9203 Epoch: 17 Global Step: 88350 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:28:56,012-Speed 4794.21 samples/sec Loss 2.9230 Epoch: 17 Global Step: 88400 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2021-03-17 03:29:06,695-Speed 4792.79 samples/sec Loss 2.9183 Epoch: 17 Global Step: 88450 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:29:17,608-Speed 4691.81 samples/sec Loss 2.8802 Epoch: 17 Global Step: 88500 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:29:28,285-Speed 4795.63 samples/sec Loss 2.9403 Epoch: 17 Global Step: 88550 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:29:39,131-Speed 4720.79 samples/sec Loss 2.8993 Epoch: 17 Global Step: 88600 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:29:49,952-Speed 4731.84 samples/sec Loss 2.9047 Epoch: 17 Global Step: 88650 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:30:00,733-Speed 4749.17 samples/sec Loss 2.8719 Epoch: 17 Global Step: 88700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:30:11,531-Speed 4741.79 samples/sec Loss 2.9065 Epoch: 17 Global Step: 88750 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:30:22,166-Speed 4814.62 samples/sec Loss 2.8713 Epoch: 17 Global Step: 88800 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:30:32,804-Speed 4812.87 samples/sec Loss 2.9643 Epoch: 17 Global 
Step: 88850 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:30:43,412-Speed 4826.72 samples/sec Loss 2.9181 Epoch: 17 Global Step: 88900 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:30:54,015-Speed 4829.12 samples/sec Loss 2.9250 Epoch: 17 Global Step: 88950 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:31:04,591-Speed 4841.41 samples/sec Loss 2.9218 Epoch: 17 Global Step: 89000 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:31:15,300-Speed 4781.36 samples/sec Loss 2.9158 Epoch: 17 Global Step: 89050 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:31:26,154-Speed 4717.63 samples/sec Loss 2.9327 Epoch: 17 Global Step: 89100 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:31:36,737-Speed 4837.81 samples/sec Loss 2.8585 Epoch: 17 Global Step: 89150 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:31:47,256-Speed 4867.86 samples/sec Loss 2.9002 Epoch: 17 Global Step: 89200 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:31:57,846-Speed 4834.83 samples/sec Loss 2.8655 Epoch: 17 Global Step: 89250 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:32:09,018-Speed 4583.13 samples/sec Loss 2.8942 Epoch: 17 Global Step: 89300 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:32:19,619-Speed 4830.00 samples/sec Loss 2.9173 Epoch: 17 Global Step: 89350 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:32:30,619-Speed 4654.45 samples/sec Loss 2.8878 Epoch: 17 Global Step: 89400 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:32:41,303-Speed 4792.57 samples/sec Loss 2.9026 Epoch: 17 Global Step: 89450 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:32:51,800-Speed 4877.61 samples/sec Loss 2.8881 Epoch: 17 Global Step: 89500 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:33:02,560-Speed 4758.60 samples/sec Loss 2.9260 Epoch: 17 Global Step: 89550 
Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:33:13,607-Speed 4634.90 samples/sec Loss 2.8949 Epoch: 17 Global Step: 89600 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:33:24,275-Speed 4799.82 samples/sec Loss 2.9088 Epoch: 17 Global Step: 89650 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:33:38,210-Speed 3674.29 samples/sec Loss 2.7250 Epoch: 18 Global Step: 89700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:33:49,318-Speed 4609.62 samples/sec Loss 2.4803 Epoch: 18 Global Step: 89750 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:34:00,106-Speed 4746.40 samples/sec Loss 2.5247 Epoch: 18 Global Step: 89800 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:34:10,774-Speed 4799.90 samples/sec Loss 2.5169 Epoch: 18 Global Step: 89850 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:34:21,477-Speed 4783.58 samples/sec Loss 2.5313 Epoch: 18 Global Step: 89900 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:34:32,178-Speed 4785.08 samples/sec Loss 2.5590 Epoch: 18 Global Step: 89950 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:34:42,753-Speed 4841.64 samples/sec Loss 2.5222 Epoch: 18 Global Step: 90000 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:35:05,777-[lfw][90000]XNorm: 22.851073 Training: 2021-03-17 03:35:05,778-[lfw][90000]Accuracy-Flip: 0.99750+-0.00271 Training: 2021-03-17 03:35:05,778-[lfw][90000]Accuracy-Highest: 0.99783 Training: 2021-03-17 03:35:32,390-[cfp_fp][90000]XNorm: 20.396865 Training: 2021-03-17 03:35:32,391-[cfp_fp][90000]Accuracy-Flip: 0.98400+-0.00398 Training: 2021-03-17 03:35:32,391-[cfp_fp][90000]Accuracy-Highest: 0.98443 Training: 2021-03-17 03:35:55,346-[agedb_30][90000]XNorm: 22.618688 Training: 2021-03-17 03:35:55,346-[agedb_30][90000]Accuracy-Flip: 0.97750+-0.00579 Training: 2021-03-17 03:35:55,346-[agedb_30][90000]Accuracy-Highest: 0.97750 Training: 
2021-03-17 03:36:05,761-Speed 616.81 samples/sec Loss 2.5646 Epoch: 18 Global Step: 90050 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:36:16,311-Speed 4853.51 samples/sec Loss 2.5297 Epoch: 18 Global Step: 90100 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:36:26,937-Speed 4818.28 samples/sec Loss 2.5395 Epoch: 18 Global Step: 90150 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:36:37,991-Speed 4632.06 samples/sec Loss 2.5383 Epoch: 18 Global Step: 90200 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:36:48,791-Speed 4741.17 samples/sec Loss 2.5591 Epoch: 18 Global Step: 90250 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:36:59,542-Speed 4762.53 samples/sec Loss 2.5698 Epoch: 18 Global Step: 90300 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:37:10,184-Speed 4811.44 samples/sec Loss 2.5639 Epoch: 18 Global Step: 90350 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:37:20,706-Speed 4866.29 samples/sec Loss 2.5283 Epoch: 18 Global Step: 90400 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:37:31,319-Speed 4824.29 samples/sec Loss 2.6470 Epoch: 18 Global Step: 90450 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:37:42,022-Speed 4783.98 samples/sec Loss 2.5894 Epoch: 18 Global Step: 90500 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:37:52,640-Speed 4822.45 samples/sec Loss 2.5714 Epoch: 18 Global Step: 90550 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:38:03,462-Speed 4731.12 samples/sec Loss 2.5623 Epoch: 18 Global Step: 90600 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:38:14,322-Speed 4714.52 samples/sec Loss 2.5669 Epoch: 18 Global Step: 90650 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:38:24,911-Speed 4835.78 samples/sec Loss 2.5452 Epoch: 18 Global Step: 90700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 
03:38:35,547-Speed 4813.75 samples/sec Loss 2.5711 Epoch: 18 Global Step: 90750 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:38:46,191-Speed 4810.67 samples/sec Loss 2.6031 Epoch: 18 Global Step: 90800 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:38:56,906-Speed 4778.25 samples/sec Loss 2.5733 Epoch: 18 Global Step: 90850 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:39:07,604-Speed 4786.23 samples/sec Loss 2.6041 Epoch: 18 Global Step: 90900 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:39:18,157-Speed 4852.11 samples/sec Loss 2.6165 Epoch: 18 Global Step: 90950 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:39:28,814-Speed 4804.61 samples/sec Loss 2.6105 Epoch: 18 Global Step: 91000 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:39:39,513-Speed 4785.39 samples/sec Loss 2.5980 Epoch: 18 Global Step: 91050 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:39:50,137-Speed 4819.55 samples/sec Loss 2.6181 Epoch: 18 Global Step: 91100 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:40:00,674-Speed 4859.41 samples/sec Loss 2.5940 Epoch: 18 Global Step: 91150 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:40:11,333-Speed 4803.58 samples/sec Loss 2.6436 Epoch: 18 Global Step: 91200 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:40:21,977-Speed 4810.46 samples/sec Loss 2.5847 Epoch: 18 Global Step: 91250 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:40:32,446-Speed 4890.62 samples/sec Loss 2.6353 Epoch: 18 Global Step: 91300 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:40:43,570-Speed 4602.72 samples/sec Loss 2.6617 Epoch: 18 Global Step: 91350 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:40:53,964-Speed 4926.52 samples/sec Loss 2.5883 Epoch: 18 Global Step: 91400 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 
03:41:04,926-Speed 4670.89 samples/sec Loss 2.5715 Epoch: 18 Global Step: 91450 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:41:16,008-Speed 4620.17 samples/sec Loss 2.6132 Epoch: 18 Global Step: 91500 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:41:26,765-Speed 4759.95 samples/sec Loss 2.6206 Epoch: 18 Global Step: 91550 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:41:37,547-Speed 4748.78 samples/sec Loss 2.6505 Epoch: 18 Global Step: 91600 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:41:48,670-Speed 4603.12 samples/sec Loss 2.5865 Epoch: 18 Global Step: 91650 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:41:59,290-Speed 4821.53 samples/sec Loss 2.6365 Epoch: 18 Global Step: 91700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:42:10,051-Speed 4757.94 samples/sec Loss 2.6363 Epoch: 18 Global Step: 91750 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:42:20,645-Speed 4833.53 samples/sec Loss 2.6238 Epoch: 18 Global Step: 91800 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:42:31,372-Speed 4773.20 samples/sec Loss 2.6307 Epoch: 18 Global Step: 91850 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:42:41,984-Speed 4824.70 samples/sec Loss 2.6566 Epoch: 18 Global Step: 91900 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:42:52,929-Speed 4678.26 samples/sec Loss 2.6569 Epoch: 18 Global Step: 91950 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:43:03,581-Speed 4806.58 samples/sec Loss 2.6058 Epoch: 18 Global Step: 92000 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:43:26,694-[lfw][92000]XNorm: 22.740636 Training: 2021-03-17 03:43:26,694-[lfw][92000]Accuracy-Flip: 0.99767+-0.00281 Training: 2021-03-17 03:43:26,694-[lfw][92000]Accuracy-Highest: 0.99783 Training: 2021-03-17 03:43:53,469-[cfp_fp][92000]XNorm: 20.356921 Training: 2021-03-17 
03:43:53,469-[cfp_fp][92000]Accuracy-Flip: 0.98457+-0.00473 Training: 2021-03-17 03:43:53,469-[cfp_fp][92000]Accuracy-Highest: 0.98457 Training: 2021-03-17 03:44:16,423-[agedb_30][92000]XNorm: 22.460781 Training: 2021-03-17 03:44:16,424-[agedb_30][92000]Accuracy-Flip: 0.97800+-0.00645 Training: 2021-03-17 03:44:16,424-[agedb_30][92000]Accuracy-Highest: 0.97800 Training: 2021-03-17 03:44:26,951-Speed 614.14 samples/sec Loss 2.6229 Epoch: 18 Global Step: 92050 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:44:37,430-Speed 4886.09 samples/sec Loss 2.6411 Epoch: 18 Global Step: 92100 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:44:48,185-Speed 4760.72 samples/sec Loss 2.6342 Epoch: 18 Global Step: 92150 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:44:58,793-Speed 4826.56 samples/sec Loss 2.6079 Epoch: 18 Global Step: 92200 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:45:09,455-Speed 4802.55 samples/sec Loss 2.6565 Epoch: 18 Global Step: 92250 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:45:20,311-Speed 4716.23 samples/sec Loss 2.6578 Epoch: 18 Global Step: 92300 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:45:31,275-Speed 4670.32 samples/sec Loss 2.6429 Epoch: 18 Global Step: 92350 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:45:41,941-Speed 4800.30 samples/sec Loss 2.6554 Epoch: 18 Global Step: 92400 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:45:52,870-Speed 4685.22 samples/sec Loss 2.6905 Epoch: 18 Global Step: 92450 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:46:03,454-Speed 4837.55 samples/sec Loss 2.6822 Epoch: 18 Global Step: 92500 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:46:14,066-Speed 4824.90 samples/sec Loss 2.6384 Epoch: 18 Global Step: 92550 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:46:24,572-Speed 4873.59 samples/sec Loss 2.6769 
Epoch: 18 Global Step: 92600 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:46:35,507-Speed 4682.17 samples/sec Loss 2.6800 Epoch: 18 Global Step: 92650 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:46:46,110-Speed 4829.26 samples/sec Loss 2.6608 Epoch: 18 Global Step: 92700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:46:56,877-Speed 4755.63 samples/sec Loss 2.7097 Epoch: 18 Global Step: 92750 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:47:07,578-Speed 4784.59 samples/sec Loss 2.6986 Epoch: 18 Global Step: 92800 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:47:18,152-Speed 4842.35 samples/sec Loss 2.6634 Epoch: 18 Global Step: 92850 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:47:28,957-Speed 4738.58 samples/sec Loss 2.6727 Epoch: 18 Global Step: 92900 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:47:39,857-Speed 4697.49 samples/sec Loss 2.6286 Epoch: 18 Global Step: 92950 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:47:50,576-Speed 4776.97 samples/sec Loss 2.6570 Epoch: 18 Global Step: 93000 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:48:01,389-Speed 4735.02 samples/sec Loss 2.6543 Epoch: 18 Global Step: 93050 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:48:12,155-Speed 4755.94 samples/sec Loss 2.7257 Epoch: 18 Global Step: 93100 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:48:22,772-Speed 4822.83 samples/sec Loss 2.6556 Epoch: 18 Global Step: 93150 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:48:33,468-Speed 4787.02 samples/sec Loss 2.7113 Epoch: 18 Global Step: 93200 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:48:44,137-Speed 4798.82 samples/sec Loss 2.6809 Epoch: 18 Global Step: 93250 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:48:54,780-Speed 4811.10 samples/sec Loss 2.6914 Epoch: 18 
Global Step: 93300 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:49:05,624-Speed 4721.67 samples/sec Loss 2.6851 Epoch: 18 Global Step: 93350 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:49:16,193-Speed 4844.41 samples/sec Loss 2.7028 Epoch: 18 Global Step: 93400 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:49:26,803-Speed 4825.90 samples/sec Loss 2.6788 Epoch: 18 Global Step: 93450 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:49:37,724-Speed 4688.79 samples/sec Loss 2.6948 Epoch: 18 Global Step: 93500 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:49:48,646-Speed 4687.62 samples/sec Loss 2.6874 Epoch: 18 Global Step: 93550 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:49:59,256-Speed 4826.05 samples/sec Loss 2.6833 Epoch: 18 Global Step: 93600 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:50:10,032-Speed 4751.56 samples/sec Loss 2.7135 Epoch: 18 Global Step: 93650 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:50:20,822-Speed 4745.35 samples/sec Loss 2.7383 Epoch: 18 Global Step: 93700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:50:32,253-Speed 4479.24 samples/sec Loss 2.7036 Epoch: 18 Global Step: 93750 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:50:42,924-Speed 4798.28 samples/sec Loss 2.6735 Epoch: 18 Global Step: 93800 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:50:53,587-Speed 4801.88 samples/sec Loss 2.7465 Epoch: 18 Global Step: 93850 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:51:04,870-Speed 4537.96 samples/sec Loss 2.7217 Epoch: 18 Global Step: 93900 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:51:15,596-Speed 4773.43 samples/sec Loss 2.6768 Epoch: 18 Global Step: 93950 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:51:26,275-Speed 4794.67 samples/sec Loss 2.7166 Epoch: 18 Global 
Step: 94000 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:51:49,330-[lfw][94000]XNorm: 22.561804 Training: 2021-03-17 03:51:49,331-[lfw][94000]Accuracy-Flip: 0.99800+-0.00256 Training: 2021-03-17 03:51:49,331-[lfw][94000]Accuracy-Highest: 0.99800 Training: 2021-03-17 03:52:16,025-[cfp_fp][94000]XNorm: 20.103371 Training: 2021-03-17 03:52:16,025-[cfp_fp][94000]Accuracy-Flip: 0.98300+-0.00431 Training: 2021-03-17 03:52:16,025-[cfp_fp][94000]Accuracy-Highest: 0.98457 Training: 2021-03-17 03:52:39,167-[agedb_30][94000]XNorm: 22.219690 Training: 2021-03-17 03:52:39,168-[agedb_30][94000]Accuracy-Flip: 0.97950+-0.00619 Training: 2021-03-17 03:52:39,168-[agedb_30][94000]Accuracy-Highest: 0.97950 Training: 2021-03-17 03:52:49,730-Speed 613.51 samples/sec Loss 2.7085 Epoch: 18 Global Step: 94050 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:53:00,334-Speed 4828.38 samples/sec Loss 2.6901 Epoch: 18 Global Step: 94100 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:53:10,846-Speed 4870.70 samples/sec Loss 2.7422 Epoch: 18 Global Step: 94150 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:53:21,609-Speed 4757.42 samples/sec Loss 2.6853 Epoch: 18 Global Step: 94200 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:53:32,195-Speed 4836.68 samples/sec Loss 2.7193 Epoch: 18 Global Step: 94250 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:53:42,730-Speed 4860.38 samples/sec Loss 2.7310 Epoch: 18 Global Step: 94300 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:53:53,287-Speed 4850.15 samples/sec Loss 2.6841 Epoch: 18 Global Step: 94350 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:54:04,096-Speed 4736.99 samples/sec Loss 2.6619 Epoch: 18 Global Step: 94400 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:54:14,702-Speed 4827.84 samples/sec Loss 2.7332 Epoch: 18 Global Step: 94450 Fp16 Grad Scale: 16384 Required: 2 hours 
Training: 2021-03-17 03:54:25,440-Speed 4768.37 samples/sec Loss 2.7460 Epoch: 18 Global Step: 94500 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:54:35,933-Speed 4879.72 samples/sec Loss 2.7302 Epoch: 18 Global Step: 94550 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:54:46,579-Speed 4809.46 samples/sec Loss 2.7017 Epoch: 18 Global Step: 94600 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:54:57,105-Speed 4864.24 samples/sec Loss 2.7334 Epoch: 18 Global Step: 94650 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:55:10,828-Speed 3731.32 samples/sec Loss 2.3717 Epoch: 19 Global Step: 94700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:55:21,991-Speed 4586.88 samples/sec Loss 2.3914 Epoch: 19 Global Step: 94750 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:55:32,418-Speed 4910.58 samples/sec Loss 2.3217 Epoch: 19 Global Step: 94800 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:55:43,063-Speed 4810.10 samples/sec Loss 2.3339 Epoch: 19 Global Step: 94850 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:55:53,764-Speed 4784.77 samples/sec Loss 2.3640 Epoch: 19 Global Step: 94900 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:56:04,292-Speed 4863.51 samples/sec Loss 2.3574 Epoch: 19 Global Step: 94950 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:56:14,984-Speed 4788.98 samples/sec Loss 2.3316 Epoch: 19 Global Step: 95000 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:56:25,580-Speed 4832.08 samples/sec Loss 2.3819 Epoch: 19 Global Step: 95050 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:56:36,321-Speed 4767.19 samples/sec Loss 2.3530 Epoch: 19 Global Step: 95100 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:56:46,969-Speed 4808.33 samples/sec Loss 2.3467 Epoch: 19 Global Step: 95150 Fp16 Grad Scale: 16384 Required: 2 hours Training: 
2021-03-17 03:56:57,793-Speed 4730.74 samples/sec Loss 2.3505 Epoch: 19 Global Step: 95200 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:57:08,510-Speed 4777.42 samples/sec Loss 2.4173 Epoch: 19 Global Step: 95250 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:57:19,037-Speed 4864.21 samples/sec Loss 2.3921 Epoch: 19 Global Step: 95300 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:57:29,773-Speed 4769.05 samples/sec Loss 2.3496 Epoch: 19 Global Step: 95350 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:57:40,261-Speed 4882.26 samples/sec Loss 2.3781 Epoch: 19 Global Step: 95400 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:57:50,857-Speed 4832.18 samples/sec Loss 2.4062 Epoch: 19 Global Step: 95450 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:58:01,604-Speed 4764.07 samples/sec Loss 2.4277 Epoch: 19 Global Step: 95500 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:58:12,287-Speed 4792.96 samples/sec Loss 2.3736 Epoch: 19 Global Step: 95550 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:58:22,883-Speed 4832.25 samples/sec Loss 2.3849 Epoch: 19 Global Step: 95600 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:58:33,628-Speed 4765.35 samples/sec Loss 2.4068 Epoch: 19 Global Step: 95650 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:58:44,425-Speed 4742.40 samples/sec Loss 2.3885 Epoch: 19 Global Step: 95700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:58:55,026-Speed 4829.89 samples/sec Loss 2.4550 Epoch: 19 Global Step: 95750 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:59:06,291-Speed 4544.95 samples/sec Loss 2.3642 Epoch: 19 Global Step: 95800 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:59:16,841-Speed 4853.66 samples/sec Loss 2.4675 Epoch: 19 Global Step: 95850 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 
03:59:27,467-Speed 4818.26 samples/sec Loss 2.3956 Epoch: 19 Global Step: 95900 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:59:38,319-Speed 4718.54 samples/sec Loss 2.4560 Epoch: 19 Global Step: 95950 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 03:59:49,189-Speed 4710.34 samples/sec Loss 2.4062 Epoch: 19 Global Step: 96000 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:00:12,177-[lfw][96000]XNorm: 22.626751 Training: 2021-03-17 04:00:12,177-[lfw][96000]Accuracy-Flip: 0.99783+-0.00248 Training: 2021-03-17 04:00:12,177-[lfw][96000]Accuracy-Highest: 0.99800 Training: 2021-03-17 04:00:38,823-[cfp_fp][96000]XNorm: 20.430128 Training: 2021-03-17 04:00:38,824-[cfp_fp][96000]Accuracy-Flip: 0.98471+-0.00470 Training: 2021-03-17 04:00:38,824-[cfp_fp][96000]Accuracy-Highest: 0.98471 Training: 2021-03-17 04:01:01,915-[agedb_30][96000]XNorm: 22.486176 Training: 2021-03-17 04:01:01,915-[agedb_30][96000]Accuracy-Flip: 0.97967+-0.00614 Training: 2021-03-17 04:01:01,915-[agedb_30][96000]Accuracy-Highest: 0.97967 Training: 2021-03-17 04:01:12,464-Speed 614.83 samples/sec Loss 2.4297 Epoch: 19 Global Step: 96050 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:01:23,301-Speed 4724.72 samples/sec Loss 2.4438 Epoch: 19 Global Step: 96100 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:01:34,040-Speed 4768.19 samples/sec Loss 2.4987 Epoch: 19 Global Step: 96150 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:01:44,696-Speed 4804.91 samples/sec Loss 2.4681 Epoch: 19 Global Step: 96200 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:01:55,233-Speed 4858.97 samples/sec Loss 2.4272 Epoch: 19 Global Step: 96250 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:02:05,586-Speed 4945.99 samples/sec Loss 2.4515 Epoch: 19 Global Step: 96300 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:02:16,234-Speed 4808.21 samples/sec Loss 2.5005 Epoch: 
19 Global Step: 96350 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:02:27,257-Speed 4645.08 samples/sec Loss 2.4190 Epoch: 19 Global Step: 96400 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:02:38,050-Speed 4743.95 samples/sec Loss 2.4559 Epoch: 19 Global Step: 96450 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:02:48,547-Speed 4877.82 samples/sec Loss 2.4950 Epoch: 19 Global Step: 96500 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:02:59,306-Speed 4759.22 samples/sec Loss 2.4583 Epoch: 19 Global Step: 96550 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:03:09,958-Speed 4807.02 samples/sec Loss 2.4558 Epoch: 19 Global Step: 96600 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:03:20,608-Speed 4807.55 samples/sec Loss 2.4279 Epoch: 19 Global Step: 96650 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:03:31,531-Speed 4687.67 samples/sec Loss 2.4605 Epoch: 19 Global Step: 96700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:03:42,216-Speed 4791.95 samples/sec Loss 2.5136 Epoch: 19 Global Step: 96750 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:03:52,780-Speed 4846.77 samples/sec Loss 2.4911 Epoch: 19 Global Step: 96800 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:04:03,428-Speed 4808.66 samples/sec Loss 2.4537 Epoch: 19 Global Step: 96850 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:04:14,388-Speed 4671.52 samples/sec Loss 2.5078 Epoch: 19 Global Step: 96900 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:04:25,150-Speed 4757.92 samples/sec Loss 2.5014 Epoch: 19 Global Step: 96950 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:04:35,975-Speed 4729.89 samples/sec Loss 2.4914 Epoch: 19 Global Step: 97000 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:04:46,933-Speed 4672.59 samples/sec Loss 2.4926 Epoch: 19 Global 
Step: 97050 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:04:57,666-Speed 4770.45 samples/sec Loss 2.5070 Epoch: 19 Global Step: 97100 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:05:08,206-Speed 4858.03 samples/sec Loss 2.5275 Epoch: 19 Global Step: 97150 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:05:18,898-Speed 4788.61 samples/sec Loss 2.5088 Epoch: 19 Global Step: 97200 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:05:29,480-Speed 4838.83 samples/sec Loss 2.5088 Epoch: 19 Global Step: 97250 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:05:40,048-Speed 4845.00 samples/sec Loss 2.5026 Epoch: 19 Global Step: 97300 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:05:50,731-Speed 4792.74 samples/sec Loss 2.5114 Epoch: 19 Global Step: 97350 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:06:01,224-Speed 4879.60 samples/sec Loss 2.4940 Epoch: 19 Global Step: 97400 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:06:11,839-Speed 4823.44 samples/sec Loss 2.5046 Epoch: 19 Global Step: 97450 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:06:22,503-Speed 4801.58 samples/sec Loss 2.5476 Epoch: 19 Global Step: 97500 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:06:33,102-Speed 4830.83 samples/sec Loss 2.5117 Epoch: 19 Global Step: 97550 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:06:43,662-Speed 4848.66 samples/sec Loss 2.5187 Epoch: 19 Global Step: 97600 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:06:54,404-Speed 4766.47 samples/sec Loss 2.5242 Epoch: 19 Global Step: 97650 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:07:05,117-Speed 4779.38 samples/sec Loss 2.5659 Epoch: 19 Global Step: 97700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:07:15,736-Speed 4821.94 samples/sec Loss 2.5344 Epoch: 19 Global Step: 97750 
Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:07:26,308-Speed 4843.31 samples/sec Loss 2.5178 Epoch: 19 Global Step: 97800 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:07:36,930-Speed 4820.23 samples/sec Loss 2.5453 Epoch: 19 Global Step: 97850 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:07:47,412-Speed 4884.86 samples/sec Loss 2.5304 Epoch: 19 Global Step: 97900 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:07:58,240-Speed 4728.61 samples/sec Loss 2.5716 Epoch: 19 Global Step: 97950 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:08:08,731-Speed 4880.69 samples/sec Loss 2.5571 Epoch: 19 Global Step: 98000 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:08:31,784-[lfw][98000]XNorm: 22.529079 Training: 2021-03-17 04:08:31,784-[lfw][98000]Accuracy-Flip: 0.99750+-0.00291 Training: 2021-03-17 04:08:31,784-[lfw][98000]Accuracy-Highest: 0.99800 Training: 2021-03-17 04:08:58,319-[cfp_fp][98000]XNorm: 20.264704 Training: 2021-03-17 04:08:58,319-[cfp_fp][98000]Accuracy-Flip: 0.98643+-0.00357 Training: 2021-03-17 04:08:58,319-[cfp_fp][98000]Accuracy-Highest: 0.98643 Training: 2021-03-17 04:09:21,284-[agedb_30][98000]XNorm: 22.342156 Training: 2021-03-17 04:09:21,284-[agedb_30][98000]Accuracy-Flip: 0.97850+-0.00630 Training: 2021-03-17 04:09:21,284-[agedb_30][98000]Accuracy-Highest: 0.97967 Training: 2021-03-17 04:09:31,995-Speed 614.91 samples/sec Loss 2.5933 Epoch: 19 Global Step: 98050 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:09:42,601-Speed 4827.75 samples/sec Loss 2.5648 Epoch: 19 Global Step: 98100 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:09:53,280-Speed 4794.54 samples/sec Loss 2.5666 Epoch: 19 Global Step: 98150 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:10:04,100-Speed 4732.49 samples/sec Loss 2.5445 Epoch: 19 Global Step: 98200 Fp16 Grad Scale: 16384 Required: 2 hours Training: 
2021-03-17 04:10:14,998-Speed 4698.25 samples/sec Loss 2.5393 Epoch: 19 Global Step: 98250 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:10:25,771-Speed 4752.76 samples/sec Loss 2.5429 Epoch: 19 Global Step: 98300 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:10:36,536-Speed 4756.23 samples/sec Loss 2.5659 Epoch: 19 Global Step: 98350 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:10:47,167-Speed 4816.13 samples/sec Loss 2.5578 Epoch: 19 Global Step: 98400 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:10:57,773-Speed 4827.73 samples/sec Loss 2.5717 Epoch: 19 Global Step: 98450 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:11:08,576-Speed 4739.79 samples/sec Loss 2.5502 Epoch: 19 Global Step: 98500 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:11:19,038-Speed 4894.08 samples/sec Loss 2.5691 Epoch: 19 Global Step: 98550 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:11:29,897-Speed 4714.95 samples/sec Loss 2.5540 Epoch: 19 Global Step: 98600 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:11:41,218-Speed 4522.86 samples/sec Loss 2.5783 Epoch: 19 Global Step: 98650 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:11:51,954-Speed 4769.35 samples/sec Loss 2.5910 Epoch: 19 Global Step: 98700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:12:02,507-Speed 4851.93 samples/sec Loss 2.5831 Epoch: 19 Global Step: 98750 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:12:13,103-Speed 4832.14 samples/sec Loss 2.5925 Epoch: 19 Global Step: 98800 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:12:23,744-Speed 4811.57 samples/sec Loss 2.5739 Epoch: 19 Global Step: 98850 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:12:34,549-Speed 4739.04 samples/sec Loss 2.5854 Epoch: 19 Global Step: 98900 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 
04:12:45,157-Speed 4826.75 samples/sec Loss 2.5901 Epoch: 19 Global Step: 98950 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:12:55,949-Speed 4744.13 samples/sec Loss 2.5988 Epoch: 19 Global Step: 99000 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:13:06,716-Speed 4755.65 samples/sec Loss 2.5752 Epoch: 19 Global Step: 99050 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:13:17,368-Speed 4806.85 samples/sec Loss 2.6090 Epoch: 19 Global Step: 99100 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:13:27,860-Speed 4880.16 samples/sec Loss 2.5851 Epoch: 19 Global Step: 99150 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:13:38,332-Speed 4889.61 samples/sec Loss 2.5868 Epoch: 19 Global Step: 99200 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:13:49,002-Speed 4798.66 samples/sec Loss 2.6070 Epoch: 19 Global Step: 99250 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:13:59,585-Speed 4838.22 samples/sec Loss 2.5725 Epoch: 19 Global Step: 99300 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:14:10,214-Speed 4817.09 samples/sec Loss 2.6518 Epoch: 19 Global Step: 99350 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:14:20,770-Speed 4850.56 samples/sec Loss 2.6235 Epoch: 19 Global Step: 99400 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:14:31,299-Speed 4863.00 samples/sec Loss 2.6117 Epoch: 19 Global Step: 99450 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:14:42,056-Speed 4759.94 samples/sec Loss 2.5806 Epoch: 19 Global Step: 99500 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:14:52,881-Speed 4729.73 samples/sec Loss 2.6174 Epoch: 19 Global Step: 99550 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:15:03,686-Speed 4738.96 samples/sec Loss 2.5819 Epoch: 19 Global Step: 99600 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 
04:15:17,461-Speed 3717.10 samples/sec Loss 2.4798 Epoch: 20 Global Step: 99650 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:15:28,246-Speed 4747.44 samples/sec Loss 2.1842 Epoch: 20 Global Step: 99700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:15:39,015-Speed 4755.02 samples/sec Loss 2.2070 Epoch: 20 Global Step: 99750 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:15:49,761-Speed 4764.66 samples/sec Loss 2.2056 Epoch: 20 Global Step: 99800 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:16:00,473-Speed 4779.98 samples/sec Loss 2.2182 Epoch: 20 Global Step: 99850 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:16:11,008-Speed 4860.09 samples/sec Loss 2.2148 Epoch: 20 Global Step: 99900 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:16:21,688-Speed 4794.58 samples/sec Loss 2.2608 Epoch: 20 Global Step: 99950 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:16:32,280-Speed 4833.99 samples/sec Loss 2.2589 Epoch: 20 Global Step: 100000 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:16:55,379-[lfw][100000]XNorm: 22.779139 Training: 2021-03-17 04:16:55,379-[lfw][100000]Accuracy-Flip: 0.99750+-0.00318 Training: 2021-03-17 04:16:55,380-[lfw][100000]Accuracy-Highest: 0.99800 Training: 2021-03-17 04:17:22,076-[cfp_fp][100000]XNorm: 20.366591 Training: 2021-03-17 04:17:22,076-[cfp_fp][100000]Accuracy-Flip: 0.98543+-0.00418 Training: 2021-03-17 04:17:22,076-[cfp_fp][100000]Accuracy-Highest: 0.98643 Training: 2021-03-17 04:17:45,288-[agedb_30][100000]XNorm: 22.482973 Training: 2021-03-17 04:17:45,288-[agedb_30][100000]Accuracy-Flip: 0.97950+-0.00543 Training: 2021-03-17 04:17:45,288-[agedb_30][100000]Accuracy-Highest: 0.97967 Training: 2021-03-17 04:17:55,886-Speed 612.40 samples/sec Loss 2.2131 Epoch: 20 Global Step: 100050 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:18:06,490-Speed 4828.57 samples/sec Loss 
2.2464 Epoch: 20 Global Step: 100100 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:18:17,197-Speed 4781.91 samples/sec Loss 2.2420 Epoch: 20 Global Step: 100150 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:18:28,091-Speed 4700.05 samples/sec Loss 2.2674 Epoch: 20 Global Step: 100200 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:18:39,316-Speed 4561.40 samples/sec Loss 2.2471 Epoch: 20 Global Step: 100250 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:18:49,946-Speed 4817.08 samples/sec Loss 2.2995 Epoch: 20 Global Step: 100300 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:19:00,405-Speed 4895.39 samples/sec Loss 2.2919 Epoch: 20 Global Step: 100350 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:19:11,215-Speed 4736.75 samples/sec Loss 2.2457 Epoch: 20 Global Step: 100400 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:19:22,098-Speed 4704.68 samples/sec Loss 2.2634 Epoch: 20 Global Step: 100450 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:19:32,983-Speed 4703.85 samples/sec Loss 2.3090 Epoch: 20 Global Step: 100500 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:19:44,087-Speed 4611.01 samples/sec Loss 2.2818 Epoch: 20 Global Step: 100550 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:19:54,784-Speed 4787.04 samples/sec Loss 2.2966 Epoch: 20 Global Step: 100600 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:20:05,339-Speed 4850.72 samples/sec Loss 2.3021 Epoch: 20 Global Step: 100650 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:20:16,197-Speed 4715.81 samples/sec Loss 2.2563 Epoch: 20 Global Step: 100700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:20:27,255-Speed 4630.03 samples/sec Loss 2.2882 Epoch: 20 Global Step: 100750 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:20:38,107-Speed 4718.53 samples/sec 
Loss 2.3158 Epoch: 20 Global Step: 100800 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:20:49,050-Speed 4678.88 samples/sec Loss 2.3557 Epoch: 20 Global Step: 100850 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:20:59,552-Speed 4875.47 samples/sec Loss 2.3082 Epoch: 20 Global Step: 100900 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:21:10,212-Speed 4803.13 samples/sec Loss 2.3443 Epoch: 20 Global Step: 100950 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:21:20,921-Speed 4781.48 samples/sec Loss 2.2995 Epoch: 20 Global Step: 101000 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:21:31,552-Speed 4816.02 samples/sec Loss 2.3384 Epoch: 20 Global Step: 101050 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:21:42,183-Speed 4816.26 samples/sec Loss 2.3174 Epoch: 20 Global Step: 101100 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:21:52,951-Speed 4755.41 samples/sec Loss 2.3426 Epoch: 20 Global Step: 101150 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:22:03,370-Speed 4914.26 samples/sec Loss 2.3893 Epoch: 20 Global Step: 101200 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:22:14,451-Speed 4620.69 samples/sec Loss 2.3531 Epoch: 20 Global Step: 101250 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:22:25,070-Speed 4821.60 samples/sec Loss 2.3744 Epoch: 20 Global Step: 101300 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:22:35,613-Speed 4856.70 samples/sec Loss 2.3539 Epoch: 20 Global Step: 101350 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:22:46,083-Speed 4890.30 samples/sec Loss 2.3578 Epoch: 20 Global Step: 101400 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:22:57,035-Speed 4674.91 samples/sec Loss 2.3486 Epoch: 20 Global Step: 101450 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:23:07,813-Speed 4750.90 
samples/sec Loss 2.3537 Epoch: 20 Global Step: 101500 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:23:18,512-Speed 4785.69 samples/sec Loss 2.3708 Epoch: 20 Global Step: 101550 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:23:29,070-Speed 4849.42 samples/sec Loss 2.4157 Epoch: 20 Global Step: 101600 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:23:39,859-Speed 4745.82 samples/sec Loss 2.4058 Epoch: 20 Global Step: 101650 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:23:50,455-Speed 4832.49 samples/sec Loss 2.3686 Epoch: 20 Global Step: 101700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:24:01,094-Speed 4812.67 samples/sec Loss 2.3732 Epoch: 20 Global Step: 101750 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:24:11,979-Speed 4703.88 samples/sec Loss 2.4002 Epoch: 20 Global Step: 101800 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:24:22,764-Speed 4747.65 samples/sec Loss 2.3868 Epoch: 20 Global Step: 101850 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:24:33,444-Speed 4793.93 samples/sec Loss 2.3873 Epoch: 20 Global Step: 101900 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:24:44,123-Speed 4794.96 samples/sec Loss 2.4354 Epoch: 20 Global Step: 101950 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:24:54,822-Speed 4785.52 samples/sec Loss 2.4277 Epoch: 20 Global Step: 102000 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:25:17,983-[lfw][102000]XNorm: 22.388305 Training: 2021-03-17 04:25:17,983-[lfw][102000]Accuracy-Flip: 0.99733+-0.00249 Training: 2021-03-17 04:25:17,983-[lfw][102000]Accuracy-Highest: 0.99800 Training: 2021-03-17 04:25:44,591-[cfp_fp][102000]XNorm: 20.087866 Training: 2021-03-17 04:25:44,591-[cfp_fp][102000]Accuracy-Flip: 0.98600+-0.00398 Training: 2021-03-17 04:25:44,591-[cfp_fp][102000]Accuracy-Highest: 0.98643 Training: 2021-03-17 
04:26:07,573-[agedb_30][102000]XNorm: 22.109594 Training: 2021-03-17 04:26:07,573-[agedb_30][102000]Accuracy-Flip: 0.98017+-0.00608 Training: 2021-03-17 04:26:07,574-[agedb_30][102000]Accuracy-Highest: 0.98017 Training: 2021-03-17 04:26:18,161-Speed 614.36 samples/sec Loss 2.3887 Epoch: 20 Global Step: 102050 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:26:28,773-Speed 4825.30 samples/sec Loss 2.4261 Epoch: 20 Global Step: 102100 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:26:39,520-Speed 4764.16 samples/sec Loss 2.4490 Epoch: 20 Global Step: 102150 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:26:50,041-Speed 4866.93 samples/sec Loss 2.4147 Epoch: 20 Global Step: 102200 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:27:00,708-Speed 4799.76 samples/sec Loss 2.4306 Epoch: 20 Global Step: 102250 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:27:11,435-Speed 4773.46 samples/sec Loss 2.4437 Epoch: 20 Global Step: 102300 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:27:22,127-Speed 4788.48 samples/sec Loss 2.4083 Epoch: 20 Global Step: 102350 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:27:32,786-Speed 4803.93 samples/sec Loss 2.4485 Epoch: 20 Global Step: 102400 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:27:43,298-Speed 4870.59 samples/sec Loss 2.4642 Epoch: 20 Global Step: 102450 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:27:53,950-Speed 4806.85 samples/sec Loss 2.4277 Epoch: 20 Global Step: 102500 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:28:04,912-Speed 4670.93 samples/sec Loss 2.4576 Epoch: 20 Global Step: 102550 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:28:15,462-Speed 4853.12 samples/sec Loss 2.4359 Epoch: 20 Global Step: 102600 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:28:26,145-Speed 4792.89 samples/sec Loss 2.4243 
Epoch: 20 Global Step: 102650 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:28:36,946-Speed 4740.61 samples/sec Loss 2.4630 Epoch: 20 Global Step: 102700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:28:47,685-Speed 4767.85 samples/sec Loss 2.4213 Epoch: 20 Global Step: 102750 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:28:58,752-Speed 4626.47 samples/sec Loss 2.4428 Epoch: 20 Global Step: 102800 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:29:09,485-Speed 4770.59 samples/sec Loss 2.4787 Epoch: 20 Global Step: 102850 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2021-03-17 04:29:20,150-Speed 4800.81 samples/sec Loss 2.4736 Epoch: 20 Global Step: 102900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:29:30,784-Speed 4815.16 samples/sec Loss 2.4724 Epoch: 20 Global Step: 102950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:29:41,314-Speed 4862.59 samples/sec Loss 2.4384 Epoch: 20 Global Step: 103000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:29:52,012-Speed 4786.16 samples/sec Loss 2.4681 Epoch: 20 Global Step: 103050 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:30:02,580-Speed 4845.04 samples/sec Loss 2.4868 Epoch: 20 Global Step: 103100 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:30:13,447-Speed 4711.72 samples/sec Loss 2.4594 Epoch: 20 Global Step: 103150 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:30:24,079-Speed 4815.57 samples/sec Loss 2.4352 Epoch: 20 Global Step: 103200 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:30:34,752-Speed 4797.42 samples/sec Loss 2.4912 Epoch: 20 Global Step: 103250 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:30:45,742-Speed 4659.04 samples/sec Loss 2.5065 Epoch: 20 Global Step: 103300 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:30:56,272-Speed 4862.33 samples/sec Loss 
2.5255 Epoch: 20 Global Step: 103350 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:31:06,704-Speed 4908.26 samples/sec Loss 2.4636 Epoch: 20 Global Step: 103400 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:31:17,240-Speed 4859.64 samples/sec Loss 2.4820 Epoch: 20 Global Step: 103450 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:31:27,928-Speed 4790.59 samples/sec Loss 2.4916 Epoch: 20 Global Step: 103500 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:31:38,522-Speed 4833.03 samples/sec Loss 2.5172 Epoch: 20 Global Step: 103550 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:31:49,223-Speed 4785.19 samples/sec Loss 2.4990 Epoch: 20 Global Step: 103600 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:31:59,835-Speed 4824.59 samples/sec Loss 2.4775 Epoch: 20 Global Step: 103650 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:32:10,455-Speed 4821.36 samples/sec Loss 2.5276 Epoch: 20 Global Step: 103700 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:32:21,089-Speed 4815.11 samples/sec Loss 2.5123 Epoch: 20 Global Step: 103750 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:32:31,757-Speed 4799.50 samples/sec Loss 2.5481 Epoch: 20 Global Step: 103800 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:32:42,416-Speed 4803.62 samples/sec Loss 2.5313 Epoch: 20 Global Step: 103850 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:32:53,144-Speed 4773.01 samples/sec Loss 2.5453 Epoch: 20 Global Step: 103900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:33:03,825-Speed 4793.76 samples/sec Loss 2.4953 Epoch: 20 Global Step: 103950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:33:14,563-Speed 4768.00 samples/sec Loss 2.5219 Epoch: 20 Global Step: 104000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:33:37,581-[lfw][104000]XNorm: 
22.611018 Training: 2021-03-17 04:33:37,582-[lfw][104000]Accuracy-Flip: 0.99800+-0.00233 Training: 2021-03-17 04:33:37,582-[lfw][104000]Accuracy-Highest: 0.99800 Training: 2021-03-17 04:34:04,183-[cfp_fp][104000]XNorm: 20.350014 Training: 2021-03-17 04:34:04,184-[cfp_fp][104000]Accuracy-Flip: 0.98400+-0.00486 Training: 2021-03-17 04:34:04,184-[cfp_fp][104000]Accuracy-Highest: 0.98643 Training: 2021-03-17 04:34:27,199-[agedb_30][104000]XNorm: 22.293615 Training: 2021-03-17 04:34:27,199-[agedb_30][104000]Accuracy-Flip: 0.97950+-0.00553 Training: 2021-03-17 04:34:27,199-[agedb_30][104000]Accuracy-Highest: 0.98017 Training: 2021-03-17 04:34:37,775-Speed 615.30 samples/sec Loss 2.5154 Epoch: 20 Global Step: 104050 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:34:48,449-Speed 4796.91 samples/sec Loss 2.5314 Epoch: 20 Global Step: 104100 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:34:59,144-Speed 4787.32 samples/sec Loss 2.5038 Epoch: 20 Global Step: 104150 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:35:09,774-Speed 4816.93 samples/sec Loss 2.5055 Epoch: 20 Global Step: 104200 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:35:20,853-Speed 4621.50 samples/sec Loss 2.5085 Epoch: 20 Global Step: 104250 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:35:31,418-Speed 4846.43 samples/sec Loss 2.5206 Epoch: 20 Global Step: 104300 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:35:42,023-Speed 4828.21 samples/sec Loss 2.5605 Epoch: 20 Global Step: 104350 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:35:52,520-Speed 4877.36 samples/sec Loss 2.5173 Epoch: 20 Global Step: 104400 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:36:03,112-Speed 4834.11 samples/sec Loss 2.5674 Epoch: 20 Global Step: 104450 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:36:13,729-Speed 4822.78 samples/sec Loss 2.5779 Epoch: 20 Global Step: 
104500 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:36:24,436-Speed 4782.05 samples/sec Loss 2.5193 Epoch: 20 Global Step: 104550 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:36:35,055-Speed 4821.75 samples/sec Loss 2.5304 Epoch: 20 Global Step: 104600 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:36:48,733-Speed 3743.48 samples/sec Loss 2.3086 Epoch: 21 Global Step: 104650 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:36:59,313-Speed 4839.80 samples/sec Loss 2.0714 Epoch: 21 Global Step: 104700 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:37:09,970-Speed 4804.47 samples/sec Loss 2.0461 Epoch: 21 Global Step: 104750 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:37:20,728-Speed 4759.64 samples/sec Loss 2.0527 Epoch: 21 Global Step: 104800 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:37:31,354-Speed 4818.47 samples/sec Loss 2.0380 Epoch: 21 Global Step: 104850 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:37:42,543-Speed 4576.28 samples/sec Loss 2.0485 Epoch: 21 Global Step: 104900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:37:53,896-Speed 4510.13 samples/sec Loss 2.0803 Epoch: 21 Global Step: 104950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:38:05,002-Speed 4610.33 samples/sec Loss 2.0645 Epoch: 21 Global Step: 105000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:38:15,693-Speed 4789.29 samples/sec Loss 2.0458 Epoch: 21 Global Step: 105050 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:38:26,453-Speed 4758.31 samples/sec Loss 2.0342 Epoch: 21 Global Step: 105100 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:38:37,018-Speed 4846.73 samples/sec Loss 2.0602 Epoch: 21 Global Step: 105150 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:38:47,728-Speed 4780.80 samples/sec Loss 2.0033 Epoch: 21 Global 
Step: 105200 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:38:58,363-Speed 4814.26 samples/sec Loss 2.0334 Epoch: 21 Global Step: 105250 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:39:09,172-Speed 4737.18 samples/sec Loss 2.0630 Epoch: 21 Global Step: 105300 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:39:20,078-Speed 4694.72 samples/sec Loss 2.0261 Epoch: 21 Global Step: 105350 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:39:31,070-Speed 4658.20 samples/sec Loss 2.0410 Epoch: 21 Global Step: 105400 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:39:41,647-Speed 4841.06 samples/sec Loss 2.0321 Epoch: 21 Global Step: 105450 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:39:52,415-Speed 4754.77 samples/sec Loss 2.0438 Epoch: 21 Global Step: 105500 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:40:03,020-Speed 4828.29 samples/sec Loss 2.0391 Epoch: 21 Global Step: 105550 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:40:13,676-Speed 4804.92 samples/sec Loss 2.0464 Epoch: 21 Global Step: 105600 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:40:24,370-Speed 4787.96 samples/sec Loss 2.0512 Epoch: 21 Global Step: 105650 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:40:34,962-Speed 4833.74 samples/sec Loss 2.0012 Epoch: 21 Global Step: 105700 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:40:45,432-Speed 4890.66 samples/sec Loss 2.0287 Epoch: 21 Global Step: 105750 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:40:55,888-Speed 4896.83 samples/sec Loss 2.0467 Epoch: 21 Global Step: 105800 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:41:06,431-Speed 4856.32 samples/sec Loss 2.0013 Epoch: 21 Global Step: 105850 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:41:17,061-Speed 4817.09 samples/sec Loss 2.0152 Epoch: 21 
Global Step: 105900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:41:27,867-Speed 4737.95 samples/sec Loss 2.0549 Epoch: 21 Global Step: 105950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:41:38,466-Speed 4830.99 samples/sec Loss 2.0292 Epoch: 21 Global Step: 106000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:42:01,601-[lfw][106000]XNorm: 22.653726 Training: 2021-03-17 04:42:01,601-[lfw][106000]Accuracy-Flip: 0.99750+-0.00291 Training: 2021-03-17 04:42:01,601-[lfw][106000]Accuracy-Highest: 0.99800 Training: 2021-03-17 04:42:28,239-[cfp_fp][106000]XNorm: 20.444330 Training: 2021-03-17 04:42:28,240-[cfp_fp][106000]Accuracy-Flip: 0.98586+-0.00386 Training: 2021-03-17 04:42:28,240-[cfp_fp][106000]Accuracy-Highest: 0.98643 Training: 2021-03-17 04:42:51,383-[agedb_30][106000]XNorm: 22.365402 Training: 2021-03-17 04:42:51,383-[agedb_30][106000]Accuracy-Flip: 0.97917+-0.00569 Training: 2021-03-17 04:42:51,384-[agedb_30][106000]Accuracy-Highest: 0.98017 Training: 2021-03-17 04:43:02,004-Speed 612.90 samples/sec Loss 2.0626 Epoch: 21 Global Step: 106050 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:43:12,588-Speed 4837.40 samples/sec Loss 2.0854 Epoch: 21 Global Step: 106100 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:43:23,255-Speed 4799.99 samples/sec Loss 2.0138 Epoch: 21 Global Step: 106150 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:43:34,007-Speed 4762.16 samples/sec Loss 2.0290 Epoch: 21 Global Step: 106200 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:43:44,382-Speed 4935.34 samples/sec Loss 2.0400 Epoch: 21 Global Step: 106250 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:43:55,047-Speed 4800.74 samples/sec Loss 2.0330 Epoch: 21 Global Step: 106300 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:44:05,734-Speed 4791.25 samples/sec Loss 2.0358 Epoch: 21 Global Step: 106350 Fp16 Grad Scale: 
16384 Required: 1 hours Training: 2021-03-17 04:44:16,670-Speed 4681.70 samples/sec Loss 2.0427 Epoch: 21 Global Step: 106400 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:44:27,093-Speed 4912.64 samples/sec Loss 2.0311 Epoch: 21 Global Step: 106450 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:44:37,788-Speed 4787.61 samples/sec Loss 2.0330 Epoch: 21 Global Step: 106500 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:44:48,346-Speed 4849.36 samples/sec Loss 2.0117 Epoch: 21 Global Step: 106550 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:44:58,922-Speed 4841.54 samples/sec Loss 2.0481 Epoch: 21 Global Step: 106600 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:45:09,608-Speed 4791.50 samples/sec Loss 2.0329 Epoch: 21 Global Step: 106650 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:45:20,278-Speed 4798.49 samples/sec Loss 2.0381 Epoch: 21 Global Step: 106700 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:45:31,282-Speed 4653.10 samples/sec Loss 2.0433 Epoch: 21 Global Step: 106750 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:45:42,042-Speed 4758.64 samples/sec Loss 2.0196 Epoch: 21 Global Step: 106800 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:45:52,762-Speed 4776.36 samples/sec Loss 2.0565 Epoch: 21 Global Step: 106850 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:46:03,434-Speed 4797.67 samples/sec Loss 2.0352 Epoch: 21 Global Step: 106900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:46:14,096-Speed 4802.24 samples/sec Loss 2.0518 Epoch: 21 Global Step: 106950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:46:24,674-Speed 4840.36 samples/sec Loss 2.0629 Epoch: 21 Global Step: 107000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:46:35,499-Speed 4730.22 samples/sec Loss 2.0025 Epoch: 21 Global Step: 107050 Fp16 Grad 
Scale: 16384 Required: 1 hours Training: 2021-03-17 04:46:46,359-Speed 4714.54 samples/sec Loss 2.0183 Epoch: 21 Global Step: 107100 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:46:56,982-Speed 4820.03 samples/sec Loss 2.0428 Epoch: 21 Global Step: 107150 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:47:08,578-Speed 4415.56 samples/sec Loss 2.0521 Epoch: 21 Global Step: 107200 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:47:19,045-Speed 4891.40 samples/sec Loss 2.0045 Epoch: 21 Global Step: 107250 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:47:29,702-Speed 4804.63 samples/sec Loss 2.0392 Epoch: 21 Global Step: 107300 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:47:40,284-Speed 4838.95 samples/sec Loss 2.0170 Epoch: 21 Global Step: 107350 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:47:50,774-Speed 4881.19 samples/sec Loss 2.0206 Epoch: 21 Global Step: 107400 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:48:01,274-Speed 4876.40 samples/sec Loss 2.0153 Epoch: 21 Global Step: 107450 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:48:12,198-Speed 4687.03 samples/sec Loss 2.0349 Epoch: 21 Global Step: 107500 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:48:23,156-Speed 4672.70 samples/sec Loss 2.0384 Epoch: 21 Global Step: 107550 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:48:33,685-Speed 4863.08 samples/sec Loss 2.0319 Epoch: 21 Global Step: 107600 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:48:44,555-Speed 4710.41 samples/sec Loss 2.0478 Epoch: 21 Global Step: 107650 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:48:55,210-Speed 4805.43 samples/sec Loss 2.0240 Epoch: 21 Global Step: 107700 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:49:06,135-Speed 4686.77 samples/sec Loss 2.0677 Epoch: 21 Global Step: 107750 Fp16 
Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:49:16,963-Speed 4728.66 samples/sec Loss 2.0206 Epoch: 21 Global Step: 107800 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:49:27,760-Speed 4742.27 samples/sec Loss 2.0226 Epoch: 21 Global Step: 107850 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:49:38,525-Speed 4756.35 samples/sec Loss 2.0292 Epoch: 21 Global Step: 107900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:49:49,322-Speed 4742.24 samples/sec Loss 2.0209 Epoch: 21 Global Step: 107950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:50:00,033-Speed 4780.25 samples/sec Loss 1.9903 Epoch: 21 Global Step: 108000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:50:23,275-[lfw][108000]XNorm: 22.702981 Training: 2021-03-17 04:50:23,275-[lfw][108000]Accuracy-Flip: 0.99750+-0.00291 Training: 2021-03-17 04:50:23,275-[lfw][108000]Accuracy-Highest: 0.99800 Training: 2021-03-17 04:50:49,955-[cfp_fp][108000]XNorm: 20.471198 Training: 2021-03-17 04:50:49,955-[cfp_fp][108000]Accuracy-Flip: 0.98543+-0.00446 Training: 2021-03-17 04:50:49,955-[cfp_fp][108000]Accuracy-Highest: 0.98643 Training: 2021-03-17 04:51:12,875-[agedb_30][108000]XNorm: 22.383333 Training: 2021-03-17 04:51:12,875-[agedb_30][108000]Accuracy-Flip: 0.98100+-0.00461 Training: 2021-03-17 04:51:12,875-[agedb_30][108000]Accuracy-Highest: 0.98100 Training: 2021-03-17 04:51:23,376-Speed 614.33 samples/sec Loss 1.9752 Epoch: 21 Global Step: 108050 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:51:33,974-Speed 4831.40 samples/sec Loss 2.0177 Epoch: 21 Global Step: 108100 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:51:44,598-Speed 4819.64 samples/sec Loss 2.0480 Epoch: 21 Global Step: 108150 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:51:55,366-Speed 4754.91 samples/sec Loss 2.0163 Epoch: 21 Global Step: 108200 Fp16 Grad Scale: 16384 Required: 1 hours 
Training: 2021-03-17 04:52:05,840-Speed 4888.61 samples/sec Loss 2.0443 Epoch: 21 Global Step: 108250 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:52:16,328-Speed 4881.94 samples/sec Loss 2.0330 Epoch: 21 Global Step: 108300 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:52:27,010-Speed 4793.17 samples/sec Loss 2.0435 Epoch: 21 Global Step: 108350 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:52:37,719-Speed 4781.34 samples/sec Loss 2.0261 Epoch: 21 Global Step: 108400 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:52:48,158-Speed 4904.58 samples/sec Loss 2.0346 Epoch: 21 Global Step: 108450 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:52:58,767-Speed 4826.71 samples/sec Loss 2.0761 Epoch: 21 Global Step: 108500 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:53:10,125-Speed 4508.02 samples/sec Loss 2.0033 Epoch: 21 Global Step: 108550 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:53:20,854-Speed 4772.08 samples/sec Loss 2.0120 Epoch: 21 Global Step: 108600 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:53:31,459-Speed 4828.00 samples/sec Loss 2.0095 Epoch: 21 Global Step: 108650 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:53:42,262-Speed 4739.70 samples/sec Loss 2.0558 Epoch: 21 Global Step: 108700 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:53:53,078-Speed 4733.75 samples/sec Loss 2.0591 Epoch: 21 Global Step: 108750 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:54:03,603-Speed 4864.77 samples/sec Loss 2.0408 Epoch: 21 Global Step: 108800 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:54:14,065-Speed 4894.17 samples/sec Loss 2.0218 Epoch: 21 Global Step: 108850 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:54:24,563-Speed 4877.69 samples/sec Loss 2.0360 Epoch: 21 Global Step: 108900 Fp16 Grad Scale: 16384 Required: 1 
hours Training: 2021-03-17 04:54:35,071-Speed 4872.57 samples/sec Loss 2.0247 Epoch: 21 Global Step: 108950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:54:45,802-Speed 4771.40 samples/sec Loss 2.0445 Epoch: 21 Global Step: 109000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:54:56,463-Speed 4802.47 samples/sec Loss 2.0412 Epoch: 21 Global Step: 109050 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:55:07,176-Speed 4779.72 samples/sec Loss 2.0569 Epoch: 21 Global Step: 109100 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:55:18,101-Speed 4686.73 samples/sec Loss 2.0284 Epoch: 21 Global Step: 109150 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:55:28,683-Speed 4838.62 samples/sec Loss 2.0635 Epoch: 21 Global Step: 109200 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:55:39,209-Speed 4864.40 samples/sec Loss 2.0530 Epoch: 21 Global Step: 109250 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:55:49,956-Speed 4764.31 samples/sec Loss 2.0291 Epoch: 21 Global Step: 109300 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:56:01,003-Speed 4634.71 samples/sec Loss 2.0526 Epoch: 21 Global Step: 109350 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:56:12,377-Speed 4501.71 samples/sec Loss 2.0217 Epoch: 21 Global Step: 109400 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:56:22,872-Speed 4878.90 samples/sec Loss 2.0413 Epoch: 21 Global Step: 109450 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:56:33,741-Speed 4710.71 samples/sec Loss 2.0459 Epoch: 21 Global Step: 109500 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:56:44,331-Speed 4835.09 samples/sec Loss 2.0373 Epoch: 21 Global Step: 109550 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:56:55,076-Speed 4765.18 samples/sec Loss 2.0583 Epoch: 21 Global Step: 109600 Fp16 Grad Scale: 16384 Required: 
1 hours Training: 2021-03-17 04:57:08,516-Speed 3809.81 samples/sec Loss 1.9454 Epoch: 22 Global Step: 109650 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:57:19,119-Speed 4829.20 samples/sec Loss 1.9691 Epoch: 22 Global Step: 109700 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:57:30,012-Speed 4700.63 samples/sec Loss 1.9510 Epoch: 22 Global Step: 109750 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:57:41,285-Speed 4541.93 samples/sec Loss 1.9535 Epoch: 22 Global Step: 109800 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:57:52,009-Speed 4774.69 samples/sec Loss 1.9987 Epoch: 22 Global Step: 109850 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:58:02,629-Speed 4821.70 samples/sec Loss 1.9849 Epoch: 22 Global Step: 109900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:58:13,347-Speed 4777.35 samples/sec Loss 1.9902 Epoch: 22 Global Step: 109950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:58:24,026-Speed 4794.62 samples/sec Loss 1.9583 Epoch: 22 Global Step: 110000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 04:58:47,042-[lfw][110000]XNorm: 22.667868 Training: 2021-03-17 04:58:47,042-[lfw][110000]Accuracy-Flip: 0.99767+-0.00281 Training: 2021-03-17 04:58:47,042-[lfw][110000]Accuracy-Highest: 0.99800 Training: 2021-03-17 04:59:13,592-[cfp_fp][110000]XNorm: 20.454211 Training: 2021-03-17 04:59:13,593-[cfp_fp][110000]Accuracy-Flip: 0.98443+-0.00525 Training: 2021-03-17 04:59:13,593-[cfp_fp][110000]Accuracy-Highest: 0.98643 Training: 2021-03-17 04:59:36,579-[agedb_30][110000]XNorm: 22.380643 Training: 2021-03-17 04:59:36,579-[agedb_30][110000]Accuracy-Flip: 0.97900+-0.00549 Training: 2021-03-17 04:59:36,579-[agedb_30][110000]Accuracy-Highest: 0.98100 Training: 2021-03-17 04:59:47,159-Speed 615.88 samples/sec Loss 1.9665 Epoch: 22 Global Step: 110050 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 
04:59:57,734-Speed 4841.69 samples/sec Loss 1.9983 Epoch: 22 Global Step: 110100 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:00:08,568-Speed 4725.92 samples/sec Loss 2.0020 Epoch: 22 Global Step: 110150 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:00:19,111-Speed 4856.61 samples/sec Loss 2.0026 Epoch: 22 Global Step: 110200 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:00:29,814-Speed 4783.70 samples/sec Loss 2.0065 Epoch: 22 Global Step: 110250 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:00:40,473-Speed 4803.97 samples/sec Loss 1.9652 Epoch: 22 Global Step: 110300 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:00:50,976-Speed 4874.82 samples/sec Loss 1.9580 Epoch: 22 Global Step: 110350 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:01:01,536-Speed 4849.00 samples/sec Loss 1.9865 Epoch: 22 Global Step: 110400 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:01:12,117-Speed 4839.10 samples/sec Loss 1.9825 Epoch: 22 Global Step: 110450 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:01:22,753-Speed 4814.15 samples/sec Loss 2.0018 Epoch: 22 Global Step: 110500 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:01:33,573-Speed 4731.89 samples/sec Loss 1.9750 Epoch: 22 Global Step: 110550 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:01:44,346-Speed 4752.66 samples/sec Loss 1.9948 Epoch: 22 Global Step: 110600 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:01:55,205-Speed 4715.58 samples/sec Loss 2.0050 Epoch: 22 Global Step: 110650 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:02:05,812-Speed 4827.12 samples/sec Loss 1.9614 Epoch: 22 Global Step: 110700 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:02:16,353-Speed 4857.52 samples/sec Loss 1.9416 Epoch: 22 Global Step: 110750 Fp16 Grad Scale: 16384 Required: 1 hours Training: 
2021-03-17 05:02:27,145-Speed 4744.23 samples/sec Loss 1.9460 Epoch: 22 Global Step: 110800 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:02:37,722-Speed 4841.12 samples/sec Loss 1.9872 Epoch: 22 Global Step: 110850 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:02:48,444-Speed 4775.20 samples/sec Loss 1.9364 Epoch: 22 Global Step: 110900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:02:59,130-Speed 4791.47 samples/sec Loss 1.9730 Epoch: 22 Global Step: 110950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:03:09,787-Speed 4804.48 samples/sec Loss 1.9886 Epoch: 22 Global Step: 111000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:03:20,316-Speed 4863.02 samples/sec Loss 1.9815 Epoch: 22 Global Step: 111050 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:03:30,852-Speed 4859.55 samples/sec Loss 1.9685 Epoch: 22 Global Step: 111100 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:03:41,561-Speed 4781.42 samples/sec Loss 1.9869 Epoch: 22 Global Step: 111150 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:03:52,151-Speed 4835.16 samples/sec Loss 1.9785 Epoch: 22 Global Step: 111200 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:04:03,195-Speed 4635.91 samples/sec Loss 1.9801 Epoch: 22 Global Step: 111250 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:04:13,768-Speed 4842.85 samples/sec Loss 1.9982 Epoch: 22 Global Step: 111300 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:04:24,649-Speed 4705.87 samples/sec Loss 1.9357 Epoch: 22 Global Step: 111350 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:04:35,379-Speed 4771.47 samples/sec Loss 1.9771 Epoch: 22 Global Step: 111400 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:04:46,188-Speed 4737.36 samples/sec Loss 1.9843 Epoch: 22 Global Step: 111450 Fp16 Grad Scale: 16384 Required: 1 hours 
Training: 2021-03-17 05:04:57,048-Speed 4714.39 samples/sec Loss 1.9816 Epoch: 22 Global Step: 111500 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:05:08,001-Speed 4674.84 samples/sec Loss 1.9744 Epoch: 22 Global Step: 111550 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:05:19,043-Speed 4636.96 samples/sec Loss 1.9999 Epoch: 22 Global Step: 111600 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:05:30,260-Speed 4564.86 samples/sec Loss 1.9923 Epoch: 22 Global Step: 111650 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:05:40,835-Speed 4841.75 samples/sec Loss 1.9950 Epoch: 22 Global Step: 111700 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:05:51,465-Speed 4816.48 samples/sec Loss 1.9766 Epoch: 22 Global Step: 111750 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:06:02,067-Speed 4829.54 samples/sec Loss 2.0011 Epoch: 22 Global Step: 111800 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:06:12,725-Speed 4804.10 samples/sec Loss 2.0278 Epoch: 22 Global Step: 111850 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:06:23,399-Speed 4797.11 samples/sec Loss 1.9662 Epoch: 22 Global Step: 111900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:06:33,986-Speed 4836.20 samples/sec Loss 1.9734 Epoch: 22 Global Step: 111950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:06:45,083-Speed 4613.92 samples/sec Loss 1.9767 Epoch: 22 Global Step: 112000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:07:08,225-[lfw][112000]XNorm: 22.681202 Training: 2021-03-17 05:07:08,226-[lfw][112000]Accuracy-Flip: 0.99767+-0.00281 Training: 2021-03-17 05:07:08,226-[lfw][112000]Accuracy-Highest: 0.99800 Training: 2021-03-17 05:07:35,047-[cfp_fp][112000]XNorm: 20.489280 Training: 2021-03-17 05:07:35,047-[cfp_fp][112000]Accuracy-Flip: 0.98429+-0.00542 Training: 2021-03-17 
05:07:35,047-[cfp_fp][112000]Accuracy-Highest: 0.98643 Training: 2021-03-17 05:07:58,217-[agedb_30][112000]XNorm: 22.406748 Training: 2021-03-17 05:07:58,217-[agedb_30][112000]Accuracy-Flip: 0.97850+-0.00575 Training: 2021-03-17 05:07:58,217-[agedb_30][112000]Accuracy-Highest: 0.98100 Training: 2021-03-17 05:08:08,701-Speed 612.32 samples/sec Loss 2.0221 Epoch: 22 Global Step: 112050 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:08:19,242-Speed 4857.26 samples/sec Loss 1.9784 Epoch: 22 Global Step: 112100 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:08:29,751-Speed 4872.46 samples/sec Loss 2.0089 Epoch: 22 Global Step: 112150 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:08:40,372-Speed 4820.64 samples/sec Loss 2.0101 Epoch: 22 Global Step: 112200 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:08:50,884-Speed 4871.21 samples/sec Loss 2.0027 Epoch: 22 Global Step: 112250 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:09:01,452-Speed 4844.91 samples/sec Loss 1.9414 Epoch: 22 Global Step: 112300 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:09:12,147-Speed 4787.39 samples/sec Loss 2.0091 Epoch: 22 Global Step: 112350 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:09:22,621-Speed 4888.81 samples/sec Loss 1.9778 Epoch: 22 Global Step: 112400 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:09:33,258-Speed 4813.45 samples/sec Loss 2.0051 Epoch: 22 Global Step: 112450 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:09:43,840-Speed 4838.37 samples/sec Loss 1.9934 Epoch: 22 Global Step: 112500 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:09:54,602-Speed 4757.98 samples/sec Loss 2.0005 Epoch: 22 Global Step: 112550 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:10:05,066-Speed 4893.17 samples/sec Loss 1.9867 Epoch: 22 Global Step: 112600 Fp16 Grad Scale: 16384 Required: 1 
hours Training: 2021-03-17 05:10:15,943-Speed 4707.19 samples/sec Loss 1.9884 Epoch: 22 Global Step: 112650 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:10:26,804-Speed 4714.10 samples/sec Loss 2.0306 Epoch: 22 Global Step: 112700 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:10:37,470-Speed 4800.61 samples/sec Loss 1.9977 Epoch: 22 Global Step: 112750 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:10:48,368-Speed 4698.43 samples/sec Loss 2.0234 Epoch: 22 Global Step: 112800 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:10:59,105-Speed 4768.86 samples/sec Loss 2.0144 Epoch: 22 Global Step: 112850 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:11:09,754-Speed 4807.80 samples/sec Loss 2.0339 Epoch: 22 Global Step: 112900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:11:20,628-Speed 4708.85 samples/sec Loss 2.0341 Epoch: 22 Global Step: 112950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:11:31,502-Speed 4708.74 samples/sec Loss 1.9809 Epoch: 22 Global Step: 113000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:11:42,172-Speed 4798.37 samples/sec Loss 1.9741 Epoch: 22 Global Step: 113050 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:11:52,812-Speed 4812.63 samples/sec Loss 2.0411 Epoch: 22 Global Step: 113100 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:12:03,497-Speed 4791.91 samples/sec Loss 1.9961 Epoch: 22 Global Step: 113150 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:12:14,228-Speed 4771.22 samples/sec Loss 1.9820 Epoch: 22 Global Step: 113200 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:12:24,948-Speed 4776.33 samples/sec Loss 2.0061 Epoch: 22 Global Step: 113250 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:12:35,736-Speed 4746.22 samples/sec Loss 2.0261 Epoch: 22 Global Step: 113300 Fp16 Grad Scale: 16384 Required: 
1 hours Training: 2021-03-17 05:12:46,387-Speed 4807.53 samples/sec Loss 1.9901 Epoch: 22 Global Step: 113350 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:12:57,235-Speed 4720.00 samples/sec Loss 2.0018 Epoch: 22 Global Step: 113400 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:13:07,961-Speed 4773.54 samples/sec Loss 1.9860 Epoch: 22 Global Step: 113450 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:13:18,748-Speed 4746.63 samples/sec Loss 2.0191 Epoch: 22 Global Step: 113500 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:13:29,375-Speed 4817.95 samples/sec Loss 1.9866 Epoch: 22 Global Step: 113550 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:13:39,934-Speed 4849.38 samples/sec Loss 1.9598 Epoch: 22 Global Step: 113600 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:13:50,748-Speed 4734.87 samples/sec Loss 1.9928 Epoch: 22 Global Step: 113650 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:14:01,299-Speed 4852.48 samples/sec Loss 2.0160 Epoch: 22 Global Step: 113700 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:14:12,213-Speed 4691.46 samples/sec Loss 2.0133 Epoch: 22 Global Step: 113750 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:14:23,355-Speed 4595.69 samples/sec Loss 1.9988 Epoch: 22 Global Step: 113800 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:14:34,160-Speed 4738.66 samples/sec Loss 1.9944 Epoch: 22 Global Step: 113850 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:14:45,111-Speed 4675.35 samples/sec Loss 2.0164 Epoch: 22 Global Step: 113900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:14:55,821-Speed 4780.88 samples/sec Loss 2.0293 Epoch: 22 Global Step: 113950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:15:06,546-Speed 4774.24 samples/sec Loss 1.9729 Epoch: 22 Global Step: 114000 Fp16 Grad Scale: 16384 
Required: 1 hours Training: 2021-03-17 05:15:29,808-[lfw][114000]XNorm: 22.637199 Training: 2021-03-17 05:15:29,808-[lfw][114000]Accuracy-Flip: 0.99767+-0.00281 Training: 2021-03-17 05:15:29,808-[lfw][114000]Accuracy-Highest: 0.99800 Training: 2021-03-17 05:15:56,687-[cfp_fp][114000]XNorm: 20.430640 Training: 2021-03-17 05:15:56,687-[cfp_fp][114000]Accuracy-Flip: 0.98671+-0.00362 Training: 2021-03-17 05:15:56,687-[cfp_fp][114000]Accuracy-Highest: 0.98671 Training: 2021-03-17 05:16:19,946-[agedb_30][114000]XNorm: 22.369600 Training: 2021-03-17 05:16:19,946-[agedb_30][114000]Accuracy-Flip: 0.97800+-0.00526 Training: 2021-03-17 05:16:19,946-[agedb_30][114000]Accuracy-Highest: 0.98100 Training: 2021-03-17 05:16:30,362-Speed 610.86 samples/sec Loss 2.0190 Epoch: 22 Global Step: 114050 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:16:40,867-Speed 4874.10 samples/sec Loss 2.0064 Epoch: 22 Global Step: 114100 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:16:51,604-Speed 4768.54 samples/sec Loss 2.0014 Epoch: 22 Global Step: 114150 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:17:02,591-Speed 4660.35 samples/sec Loss 2.0070 Epoch: 22 Global Step: 114200 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:17:12,968-Speed 4934.45 samples/sec Loss 2.0147 Epoch: 22 Global Step: 114250 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:17:23,577-Speed 4826.14 samples/sec Loss 1.9932 Epoch: 22 Global Step: 114300 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:17:34,376-Speed 4741.22 samples/sec Loss 2.0151 Epoch: 22 Global Step: 114350 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:17:45,071-Speed 4787.59 samples/sec Loss 1.9612 Epoch: 22 Global Step: 114400 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:17:55,857-Speed 4747.21 samples/sec Loss 2.0506 Epoch: 22 Global Step: 114450 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 
05:18:06,380-Speed 4865.52 samples/sec Loss 2.0551 Epoch: 22 Global Step: 114500 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:18:16,919-Speed 4858.36 samples/sec Loss 2.0005 Epoch: 22 Global Step: 114550 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:18:30,687-Speed 3718.94 samples/sec Loss 1.9982 Epoch: 23 Global Step: 114600 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:18:41,437-Speed 4763.46 samples/sec Loss 1.9368 Epoch: 23 Global Step: 114650 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:18:52,216-Speed 4750.36 samples/sec Loss 1.9674 Epoch: 23 Global Step: 114700 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:19:03,046-Speed 4727.85 samples/sec Loss 1.9519 Epoch: 23 Global Step: 114750 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:19:13,879-Speed 4726.72 samples/sec Loss 1.9780 Epoch: 23 Global Step: 114800 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:19:24,651-Speed 4753.35 samples/sec Loss 1.9448 Epoch: 23 Global Step: 114850 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:19:35,327-Speed 4795.94 samples/sec Loss 1.9290 Epoch: 23 Global Step: 114900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:19:46,205-Speed 4706.99 samples/sec Loss 1.9767 Epoch: 23 Global Step: 114950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:19:57,322-Speed 4605.88 samples/sec Loss 1.9656 Epoch: 23 Global Step: 115000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:20:07,995-Speed 4797.21 samples/sec Loss 1.9348 Epoch: 23 Global Step: 115050 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:20:18,651-Speed 4804.85 samples/sec Loss 1.9528 Epoch: 23 Global Step: 115100 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:20:29,496-Speed 4721.59 samples/sec Loss 1.9725 Epoch: 23 Global Step: 115150 Fp16 Grad Scale: 16384 Required: 1 hours Training: 
2021-03-17 05:20:40,198-Speed 4783.90 samples/sec Loss 1.9345 Epoch: 23 Global Step: 115200 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:20:51,010-Speed 4735.94 samples/sec Loss 1.9473 Epoch: 23 Global Step: 115250 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:21:01,577-Speed 4845.59 samples/sec Loss 1.9452 Epoch: 23 Global Step: 115300 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:21:12,020-Speed 4903.04 samples/sec Loss 1.9515 Epoch: 23 Global Step: 115350 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:21:22,748-Speed 4772.76 samples/sec Loss 1.9633 Epoch: 23 Global Step: 115400 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:21:33,516-Speed 4754.80 samples/sec Loss 1.9556 Epoch: 23 Global Step: 115450 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:21:44,110-Speed 4833.32 samples/sec Loss 1.9380 Epoch: 23 Global Step: 115500 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:21:55,121-Speed 4650.10 samples/sec Loss 1.9721 Epoch: 23 Global Step: 115550 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:22:05,723-Speed 4829.43 samples/sec Loss 1.9930 Epoch: 23 Global Step: 115600 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:22:16,253-Speed 4862.64 samples/sec Loss 1.9660 Epoch: 23 Global Step: 115650 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:22:26,996-Speed 4766.04 samples/sec Loss 1.9719 Epoch: 23 Global Step: 115700 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:22:37,704-Speed 4781.68 samples/sec Loss 1.9328 Epoch: 23 Global Step: 115750 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:22:48,317-Speed 4824.55 samples/sec Loss 1.9642 Epoch: 23 Global Step: 115800 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:22:59,170-Speed 4717.53 samples/sec Loss 1.9541 Epoch: 23 Global Step: 115850 Fp16 Grad Scale: 16384 Required: 1 hours 
Training: 2021-03-17 05:23:09,890-Speed 4776.48 samples/sec Loss 1.9241 Epoch: 23 Global Step: 115900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:23:20,550-Speed 4803.17 samples/sec Loss 1.9747 Epoch: 23 Global Step: 115950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:23:31,713-Speed 4586.64 samples/sec Loss 1.9757 Epoch: 23 Global Step: 116000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:23:54,905-[lfw][116000]XNorm: 22.712532 Training: 2021-03-17 05:23:54,905-[lfw][116000]Accuracy-Flip: 0.99767+-0.00281 Training: 2021-03-17 05:23:54,905-[lfw][116000]Accuracy-Highest: 0.99800 Training: 2021-03-17 05:24:21,687-[cfp_fp][116000]XNorm: 20.512749 Training: 2021-03-17 05:24:21,687-[cfp_fp][116000]Accuracy-Flip: 0.98557+-0.00488 Training: 2021-03-17 05:24:21,687-[cfp_fp][116000]Accuracy-Highest: 0.98671 Training: 2021-03-17 05:24:44,807-[agedb_30][116000]XNorm: 22.466284 Training: 2021-03-17 05:24:44,807-[agedb_30][116000]Accuracy-Flip: 0.97933+-0.00528 Training: 2021-03-17 05:24:44,808-[agedb_30][116000]Accuracy-Highest: 0.98100 Training: 2021-03-17 05:24:55,529-Speed 610.87 samples/sec Loss 1.9787 Epoch: 23 Global Step: 116050 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:25:06,869-Speed 4515.03 samples/sec Loss 1.9345 Epoch: 23 Global Step: 116100 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:25:17,559-Speed 4789.85 samples/sec Loss 1.9755 Epoch: 23 Global Step: 116150 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:25:28,161-Speed 4829.56 samples/sec Loss 1.9500 Epoch: 23 Global Step: 116200 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:25:38,816-Speed 4805.28 samples/sec Loss 1.9506 Epoch: 23 Global Step: 116250 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:25:49,407-Speed 4834.49 samples/sec Loss 1.9495 Epoch: 23 Global Step: 116300 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:26:00,230-Speed 
4731.10 samples/sec Loss 1.9435 Epoch: 23 Global Step: 116350 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:26:11,108-Speed 4706.98 samples/sec Loss 1.9422 Epoch: 23 Global Step: 116400 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:26:21,726-Speed 4822.03 samples/sec Loss 1.9310 Epoch: 23 Global Step: 116450 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:26:32,388-Speed 4802.34 samples/sec Loss 1.9256 Epoch: 23 Global Step: 116500 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:26:43,091-Speed 4783.65 samples/sec Loss 1.9828 Epoch: 23 Global Step: 116550 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:26:53,595-Speed 4874.65 samples/sec Loss 1.9833 Epoch: 23 Global Step: 116600 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:27:04,104-Speed 4872.23 samples/sec Loss 1.9783 Epoch: 23 Global Step: 116650 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:27:14,692-Speed 4836.18 samples/sec Loss 1.9541 Epoch: 23 Global Step: 116700 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:27:25,061-Speed 4938.03 samples/sec Loss 1.9581 Epoch: 23 Global Step: 116750 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:27:35,810-Speed 4763.30 samples/sec Loss 1.9738 Epoch: 23 Global Step: 116800 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:27:46,559-Speed 4763.32 samples/sec Loss 1.9389 Epoch: 23 Global Step: 116850 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:27:57,357-Speed 4742.04 samples/sec Loss 1.9787 Epoch: 23 Global Step: 116900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:28:08,106-Speed 4763.32 samples/sec Loss 1.9313 Epoch: 23 Global Step: 116950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:28:18,792-Speed 4791.48 samples/sec Loss 1.9597 Epoch: 23 Global Step: 117000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 
05:28:29,908-Speed 4606.05 samples/sec Loss 1.9593 Epoch: 23 Global Step: 117050 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:28:40,437-Speed 4863.10 samples/sec Loss 1.9779 Epoch: 23 Global Step: 117100 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:28:51,177-Speed 4767.45 samples/sec Loss 1.9757 Epoch: 23 Global Step: 117150 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:29:01,981-Speed 4739.18 samples/sec Loss 1.9569 Epoch: 23 Global Step: 117200 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:29:12,843-Speed 4713.87 samples/sec Loss 1.9949 Epoch: 23 Global Step: 117250 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:29:23,478-Speed 4814.77 samples/sec Loss 1.9497 Epoch: 23 Global Step: 117300 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2021-03-17 05:29:34,445-Speed 4668.60 samples/sec Loss 1.9564 Epoch: 23 Global Step: 117350 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:29:45,063-Speed 4822.03 samples/sec Loss 1.9686 Epoch: 23 Global Step: 117400 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:29:56,126-Speed 4628.58 samples/sec Loss 1.9817 Epoch: 23 Global Step: 117450 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:30:06,901-Speed 4751.82 samples/sec Loss 2.0172 Epoch: 23 Global Step: 117500 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:30:17,638-Speed 4768.84 samples/sec Loss 1.9728 Epoch: 23 Global Step: 117550 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:30:28,490-Speed 4717.88 samples/sec Loss 1.9693 Epoch: 23 Global Step: 117600 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:30:39,225-Speed 4769.78 samples/sec Loss 1.9846 Epoch: 23 Global Step: 117650 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:30:49,788-Speed 4847.52 samples/sec Loss 1.9745 Epoch: 23 Global Step: 117700 Fp16 Grad Scale: 16384 Required: 0 hours Training: 
2021-03-17 05:31:00,413-Speed 4818.79 samples/sec Loss 1.9837 Epoch: 23 Global Step: 117750 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:31:11,039-Speed 4818.40 samples/sec Loss 1.9973 Epoch: 23 Global Step: 117800 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:31:21,658-Speed 4821.94 samples/sec Loss 1.9397 Epoch: 23 Global Step: 117850 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:31:32,359-Speed 4784.82 samples/sec Loss 1.9616 Epoch: 23 Global Step: 117900 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:31:43,020-Speed 4802.59 samples/sec Loss 1.9758 Epoch: 23 Global Step: 117950 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:31:53,551-Speed 4862.19 samples/sec Loss 1.9547 Epoch: 23 Global Step: 118000 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:32:16,693-[lfw][118000]XNorm: 22.604514 Training: 2021-03-17 05:32:16,693-[lfw][118000]Accuracy-Flip: 0.99767+-0.00281 Training: 2021-03-17 05:32:16,693-[lfw][118000]Accuracy-Highest: 0.99800 Training: 2021-03-17 05:32:43,379-[cfp_fp][118000]XNorm: 20.423045 Training: 2021-03-17 05:32:43,379-[cfp_fp][118000]Accuracy-Flip: 0.98614+-0.00429 Training: 2021-03-17 05:32:43,379-[cfp_fp][118000]Accuracy-Highest: 0.98671 Training: 2021-03-17 05:33:06,526-[agedb_30][118000]XNorm: 22.363591 Training: 2021-03-17 05:33:06,526-[agedb_30][118000]Accuracy-Flip: 0.97867+-0.00572 Training: 2021-03-17 05:33:06,526-[agedb_30][118000]Accuracy-Highest: 0.98100 Training: 2021-03-17 05:33:17,000-Speed 613.55 samples/sec Loss 1.9798 Epoch: 23 Global Step: 118050 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:33:27,574-Speed 4842.41 samples/sec Loss 1.9782 Epoch: 23 Global Step: 118100 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:33:38,560-Speed 4660.53 samples/sec Loss 1.9675 Epoch: 23 Global Step: 118150 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:33:49,125-Speed 4846.26 
samples/sec Loss 1.9875 Epoch: 23 Global Step: 118200 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:34:00,170-Speed 4635.99 samples/sec Loss 1.9806 Epoch: 23 Global Step: 118250 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:34:11,284-Speed 4607.13 samples/sec Loss 1.9510 Epoch: 23 Global Step: 118300 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:34:21,674-Speed 4928.04 samples/sec Loss 1.9628 Epoch: 23 Global Step: 118350 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:34:32,143-Speed 4890.72 samples/sec Loss 1.9703 Epoch: 23 Global Step: 118400 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:34:42,535-Speed 4927.02 samples/sec Loss 1.9763 Epoch: 23 Global Step: 118450 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:34:53,202-Speed 4800.34 samples/sec Loss 1.9562 Epoch: 23 Global Step: 118500 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:35:04,273-Speed 4625.12 samples/sec Loss 1.9723 Epoch: 23 Global Step: 118550 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:35:15,273-Speed 4654.63 samples/sec Loss 1.9760 Epoch: 23 Global Step: 118600 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:35:25,767-Speed 4879.23 samples/sec Loss 1.9751 Epoch: 23 Global Step: 118650 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:35:36,324-Speed 4849.86 samples/sec Loss 1.9836 Epoch: 23 Global Step: 118700 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:35:46,983-Speed 4803.64 samples/sec Loss 1.9679 Epoch: 23 Global Step: 118750 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:35:57,696-Speed 4779.47 samples/sec Loss 2.0075 Epoch: 23 Global Step: 118800 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:36:08,363-Speed 4800.43 samples/sec Loss 1.9874 Epoch: 23 Global Step: 118850 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:36:19,126-Speed 
4756.93 samples/sec Loss 1.9812 Epoch: 23 Global Step: 118900 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:36:29,717-Speed 4834.54 samples/sec Loss 1.9826 Epoch: 23 Global Step: 118950 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:36:40,469-Speed 4762.41 samples/sec Loss 1.9905 Epoch: 23 Global Step: 119000 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:36:51,279-Speed 4736.41 samples/sec Loss 1.9904 Epoch: 23 Global Step: 119050 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:37:02,092-Speed 4735.41 samples/sec Loss 1.9785 Epoch: 23 Global Step: 119100 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:37:12,741-Speed 4807.73 samples/sec Loss 1.9715 Epoch: 23 Global Step: 119150 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:37:23,542-Speed 4740.52 samples/sec Loss 1.9847 Epoch: 23 Global Step: 119200 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:37:34,255-Speed 4779.52 samples/sec Loss 1.9823 Epoch: 23 Global Step: 119250 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:37:44,695-Speed 4904.61 samples/sec Loss 1.9909 Epoch: 23 Global Step: 119300 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:37:55,246-Speed 4852.74 samples/sec Loss 1.9737 Epoch: 23 Global Step: 119350 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:38:05,810-Speed 4846.90 samples/sec Loss 1.9534 Epoch: 23 Global Step: 119400 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:38:16,523-Speed 4779.61 samples/sec Loss 2.0070 Epoch: 23 Global Step: 119450 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:38:27,043-Speed 4866.83 samples/sec Loss 1.9548 Epoch: 23 Global Step: 119500 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:38:37,807-Speed 4757.01 samples/sec Loss 1.9912 Epoch: 23 Global Step: 119550 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 
05:38:51,421-Speed 3760.77 samples/sec Loss 1.9543 Epoch: 24 Global Step: 119600 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:39:02,485-Speed 4628.19 samples/sec Loss 1.9422 Epoch: 24 Global Step: 119650 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:39:13,292-Speed 4738.05 samples/sec Loss 1.9302 Epoch: 24 Global Step: 119700 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:39:24,141-Speed 4719.43 samples/sec Loss 1.9252 Epoch: 24 Global Step: 119750 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:39:34,847-Speed 4782.53 samples/sec Loss 1.9315 Epoch: 24 Global Step: 119800 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:39:45,331-Speed 4884.06 samples/sec Loss 1.9234 Epoch: 24 Global Step: 119850 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:39:56,242-Speed 4692.62 samples/sec Loss 1.8958 Epoch: 24 Global Step: 119900 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:40:06,763-Speed 4866.91 samples/sec Loss 1.9654 Epoch: 24 Global Step: 119950 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:40:17,593-Speed 4727.87 samples/sec Loss 1.9697 Epoch: 24 Global Step: 120000 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:40:40,773-[lfw][120000]XNorm: 22.654629 Training: 2021-03-17 05:40:40,774-[lfw][120000]Accuracy-Flip: 0.99767+-0.00281 Training: 2021-03-17 05:40:40,774-[lfw][120000]Accuracy-Highest: 0.99800 Training: 2021-03-17 05:41:07,540-[cfp_fp][120000]XNorm: 20.439318 Training: 2021-03-17 05:41:07,540-[cfp_fp][120000]Accuracy-Flip: 0.98643+-0.00420 Training: 2021-03-17 05:41:07,541-[cfp_fp][120000]Accuracy-Highest: 0.98671 Training: 2021-03-17 05:41:30,583-[agedb_30][120000]XNorm: 22.369045 Training: 2021-03-17 05:41:30,583-[agedb_30][120000]Accuracy-Flip: 0.97883+-0.00568 Training: 2021-03-17 05:41:30,583-[agedb_30][120000]Accuracy-Highest: 0.98100 Training: 2021-03-17 05:41:40,996-Speed 613.88 samples/sec 
Loss 1.9566 Epoch: 24 Global Step: 120050 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:41:51,589-Speed 4834.02 samples/sec Loss 1.8966 Epoch: 24 Global Step: 120100 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:42:02,217-Speed 4817.58 samples/sec Loss 1.9169 Epoch: 24 Global Step: 120150 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:42:13,044-Speed 4729.40 samples/sec Loss 1.9345 Epoch: 24 Global Step: 120200 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:42:23,678-Speed 4814.58 samples/sec Loss 1.8902 Epoch: 24 Global Step: 120250 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:42:34,233-Speed 4851.30 samples/sec Loss 1.9339 Epoch: 24 Global Step: 120300 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:42:44,604-Speed 4936.80 samples/sec Loss 1.8959 Epoch: 24 Global Step: 120350 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:42:55,262-Speed 4804.11 samples/sec Loss 1.9346 Epoch: 24 Global Step: 120400 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:43:07,086-Speed 4330.51 samples/sec Loss 1.9018 Epoch: 24 Global Step: 120450 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:43:18,087-Speed 4654.27 samples/sec Loss 1.9359 Epoch: 24 Global Step: 120500 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:43:29,387-Speed 4531.17 samples/sec Loss 1.9514 Epoch: 24 Global Step: 120550 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:43:40,146-Speed 4758.89 samples/sec Loss 1.9397 Epoch: 24 Global Step: 120600 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:43:50,898-Speed 4762.06 samples/sec Loss 1.9272 Epoch: 24 Global Step: 120650 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:44:01,527-Speed 4817.50 samples/sec Loss 1.9555 Epoch: 24 Global Step: 120700 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:44:12,304-Speed 4750.89 
samples/sec Loss 1.9088 Epoch: 24 Global Step: 120750 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:44:23,087-Speed 4748.52 samples/sec Loss 1.9428 Epoch: 24 Global Step: 120800 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:44:33,652-Speed 4846.44 samples/sec Loss 1.9654 Epoch: 24 Global Step: 120850 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:44:44,860-Speed 4568.35 samples/sec Loss 1.9261 Epoch: 24 Global Step: 120900 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:44:55,385-Speed 4864.48 samples/sec Loss 1.9526 Epoch: 24 Global Step: 120950 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:45:06,088-Speed 4784.14 samples/sec Loss 1.9168 Epoch: 24 Global Step: 121000 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:45:16,806-Speed 4776.99 samples/sec Loss 1.9342 Epoch: 24 Global Step: 121050 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:45:27,317-Speed 4871.39 samples/sec Loss 1.9474 Epoch: 24 Global Step: 121100 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:45:38,151-Speed 4725.98 samples/sec Loss 1.9402 Epoch: 24 Global Step: 121150 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:45:48,965-Speed 4734.97 samples/sec Loss 1.9523 Epoch: 24 Global Step: 121200 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:45:59,535-Speed 4843.85 samples/sec Loss 1.9234 Epoch: 24 Global Step: 121250 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:46:10,694-Speed 4588.73 samples/sec Loss 1.9209 Epoch: 24 Global Step: 121300 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:46:21,394-Speed 4785.01 samples/sec Loss 1.9561 Epoch: 24 Global Step: 121350 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:46:32,024-Speed 4816.89 samples/sec Loss 1.9686 Epoch: 24 Global Step: 121400 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:46:42,722-Speed 
4786.13 samples/sec Loss 1.9254 Epoch: 24 Global Step: 121450 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:46:52,984-Speed 4989.40 samples/sec Loss 1.9380 Epoch: 24 Global Step: 121500 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:47:03,517-Speed 4861.02 samples/sec Loss 1.9243 Epoch: 24 Global Step: 121550 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:47:14,002-Speed 4883.60 samples/sec Loss 1.9657 Epoch: 24 Global Step: 121600 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:47:24,848-Speed 4720.64 samples/sec Loss 1.9375 Epoch: 24 Global Step: 121650 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:47:35,540-Speed 4788.97 samples/sec Loss 1.9409 Epoch: 24 Global Step: 121700 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:47:46,409-Speed 4710.75 samples/sec Loss 1.9827 Epoch: 24 Global Step: 121750 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:47:56,854-Speed 4902.11 samples/sec Loss 1.9353 Epoch: 24 Global Step: 121800 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:48:07,359-Speed 4874.22 samples/sec Loss 1.9266 Epoch: 24 Global Step: 121850 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:48:18,199-Speed 4723.29 samples/sec Loss 1.9449 Epoch: 24 Global Step: 121900 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:48:28,707-Speed 4872.90 samples/sec Loss 1.9619 Epoch: 24 Global Step: 121950 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:48:39,461-Speed 4761.25 samples/sec Loss 1.9388 Epoch: 24 Global Step: 122000 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:49:02,691-[lfw][122000]XNorm: 22.647780 Training: 2021-03-17 05:49:02,692-[lfw][122000]Accuracy-Flip: 0.99783+-0.00248 Training: 2021-03-17 05:49:02,692-[lfw][122000]Accuracy-Highest: 0.99800 Training: 2021-03-17 05:49:29,495-[cfp_fp][122000]XNorm: 20.475877 Training: 2021-03-17 
05:49:29,495-[cfp_fp][122000]Accuracy-Flip: 0.98543+-0.00423 Training: 2021-03-17 05:49:29,496-[cfp_fp][122000]Accuracy-Highest: 0.98671 Training: 2021-03-17 05:49:52,653-[agedb_30][122000]XNorm: 22.374388 Training: 2021-03-17 05:49:52,653-[agedb_30][122000]Accuracy-Flip: 0.97900+-0.00528 Training: 2021-03-17 05:49:52,653-[agedb_30][122000]Accuracy-Highest: 0.98100 Training: 2021-03-17 05:50:03,298-Speed 610.71 samples/sec Loss 1.9515 Epoch: 24 Global Step: 122050 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:50:13,862-Speed 4846.78 samples/sec Loss 1.9739 Epoch: 24 Global Step: 122100 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:50:24,430-Speed 4844.78 samples/sec Loss 1.9511 Epoch: 24 Global Step: 122150 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:50:35,127-Speed 4786.78 samples/sec Loss 1.9520 Epoch: 24 Global Step: 122200 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:50:45,635-Speed 4872.79 samples/sec Loss 1.9516 Epoch: 24 Global Step: 122250 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:50:56,420-Speed 4747.55 samples/sec Loss 1.9306 Epoch: 24 Global Step: 122300 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:51:07,322-Speed 4696.51 samples/sec Loss 1.9687 Epoch: 24 Global Step: 122350 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:51:17,970-Speed 4808.42 samples/sec Loss 1.9607 Epoch: 24 Global Step: 122400 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:51:28,450-Speed 4885.86 samples/sec Loss 1.9402 Epoch: 24 Global Step: 122450 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:51:39,093-Speed 4810.93 samples/sec Loss 1.9849 Epoch: 24 Global Step: 122500 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:51:49,696-Speed 4828.78 samples/sec Loss 1.9612 Epoch: 24 Global Step: 122550 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:52:00,519-Speed 4730.82 samples/sec 
Loss 1.9216 Epoch: 24 Global Step: 122600 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:52:11,104-Speed 4837.52 samples/sec Loss 1.9400 Epoch: 24 Global Step: 122650 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:52:22,352-Speed 4552.06 samples/sec Loss 1.9524 Epoch: 24 Global Step: 122700 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:52:33,052-Speed 4785.23 samples/sec Loss 1.9730 Epoch: 24 Global Step: 122750 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:52:44,001-Speed 4676.51 samples/sec Loss 1.9446 Epoch: 24 Global Step: 122800 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:52:54,805-Speed 4739.01 samples/sec Loss 1.9404 Epoch: 24 Global Step: 122850 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:53:05,720-Speed 4691.11 samples/sec Loss 1.9262 Epoch: 24 Global Step: 122900 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:53:16,400-Speed 4794.20 samples/sec Loss 1.9396 Epoch: 24 Global Step: 122950 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:53:26,990-Speed 4834.99 samples/sec Loss 2.0033 Epoch: 24 Global Step: 123000 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:53:37,702-Speed 4779.56 samples/sec Loss 1.9521 Epoch: 24 Global Step: 123050 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:53:48,135-Speed 4907.93 samples/sec Loss 1.9221 Epoch: 24 Global Step: 123100 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:53:58,500-Speed 4939.84 samples/sec Loss 1.9605 Epoch: 24 Global Step: 123150 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:54:09,123-Speed 4819.89 samples/sec Loss 1.9008 Epoch: 24 Global Step: 123200 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:54:19,930-Speed 4737.79 samples/sec Loss 1.9607 Epoch: 24 Global Step: 123250 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:54:30,499-Speed 4844.85 
samples/sec Loss 1.9756 Epoch: 24 Global Step: 123300 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:54:41,095-Speed 4832.03 samples/sec Loss 1.9777 Epoch: 24 Global Step: 123350 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:54:51,808-Speed 4779.76 samples/sec Loss 1.9639 Epoch: 24 Global Step: 123400 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:55:02,635-Speed 4728.88 samples/sec Loss 1.9175 Epoch: 24 Global Step: 123450 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:55:13,426-Speed 4745.10 samples/sec Loss 1.9512 Epoch: 24 Global Step: 123500 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:55:24,091-Speed 4800.61 samples/sec Loss 1.9727 Epoch: 24 Global Step: 123550 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:55:34,700-Speed 4826.42 samples/sec Loss 1.9639 Epoch: 24 Global Step: 123600 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:55:45,504-Speed 4739.19 samples/sec Loss 1.9411 Epoch: 24 Global Step: 123650 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:55:56,028-Speed 4865.28 samples/sec Loss 1.9402 Epoch: 24 Global Step: 123700 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:56:06,242-Speed 5013.30 samples/sec Loss 1.9743 Epoch: 24 Global Step: 123750 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:56:16,972-Speed 4771.53 samples/sec Loss 1.9429 Epoch: 24 Global Step: 123800 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:56:27,591-Speed 4822.10 samples/sec Loss 1.9655 Epoch: 24 Global Step: 123850 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:56:38,291-Speed 4785.28 samples/sec Loss 1.9777 Epoch: 24 Global Step: 123900 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:56:48,864-Speed 4842.72 samples/sec Loss 1.9552 Epoch: 24 Global Step: 123950 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:56:59,719-Speed 
4716.77 samples/sec Loss 1.9704 Epoch: 24 Global Step: 124000 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:57:22,763-[lfw][124000]XNorm: 22.665653 Training: 2021-03-17 05:57:22,763-[lfw][124000]Accuracy-Flip: 0.99767+-0.00281 Training: 2021-03-17 05:57:22,764-[lfw][124000]Accuracy-Highest: 0.99800 Training: 2021-03-17 05:57:49,730-[cfp_fp][124000]XNorm: 20.483496 Training: 2021-03-17 05:57:49,730-[cfp_fp][124000]Accuracy-Flip: 0.98443+-0.00426 Training: 2021-03-17 05:57:49,730-[cfp_fp][124000]Accuracy-Highest: 0.98671 Training: 2021-03-17 05:58:12,779-[agedb_30][124000]XNorm: 22.408847 Training: 2021-03-17 05:58:12,779-[agedb_30][124000]Accuracy-Flip: 0.97883+-0.00597 Training: 2021-03-17 05:58:12,779-[agedb_30][124000]Accuracy-Highest: 0.98100 Training: 2021-03-17 05:58:23,550-Speed 610.75 samples/sec Loss 1.9885 Epoch: 24 Global Step: 124050 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:58:34,283-Speed 4770.46 samples/sec Loss 1.9707 Epoch: 24 Global Step: 124100 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:58:45,059-Speed 4751.39 samples/sec Loss 1.9566 Epoch: 24 Global Step: 124150 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:58:55,711-Speed 4806.92 samples/sec Loss 2.0010 Epoch: 24 Global Step: 124200 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:59:06,534-Speed 4731.07 samples/sec Loss 1.9570 Epoch: 24 Global Step: 124250 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:59:16,976-Speed 4903.20 samples/sec Loss 1.9929 Epoch: 24 Global Step: 124300 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:59:27,558-Speed 4838.85 samples/sec Loss 1.9454 Epoch: 24 Global Step: 124350 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:59:38,302-Speed 4765.61 samples/sec Loss 1.9672 Epoch: 24 Global Step: 124400 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:59:49,090-Speed 4746.10 samples/sec Loss 1.9835 Epoch: 24 
Global Step: 124450 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 05:59:59,788-Speed 4786.21 samples/sec Loss 1.9439 Epoch: 24 Global Step: 124500 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2021-03-17 06:00:10,265-Speed 4887.25 samples/sec Loss 1.9919 Epoch: 24 Global Step: 124550 Fp16 Grad Scale: 16384 Required: -0 hours